Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.
Google (finally) reveals its big text model - LaMDA:
…Plus: why you need a lot of humans to make a language model safe…
Google has finally given details on LaMDA, its GPT-3 competitor. LaMDA is a family of language models ranging in size from 2B to 137B parameters (GPT-3: 175B), trained on a massive dataset of 1.56 trillion words. One way LaMDA differs from other big language models is that it is centered on dialogue, with 50% of its data coming from "dialogues from public forums". Google has also fine-tuned LaMDA models to make them better, safer, and more likely to give factual responses - all these fine-tuned versions are facilitated by Google using crowdworkers to gather specific dialog data.
The dataset: LaMDA's underlying dataset is called 'Infiniset', and besides the dialogue data it also includes Common Crawl, Wikipedia, a mixture of English and non-English web documents, and data from programming-related sites (so LaMDA models can also dabble in code).
Learning to use an external dataset: Similar to DeepMind's 'RETRO' model, LaMDA has also been taught to use an external information retrieval system - this makes LaMDA models more factually correct than models without external datasets, because the model can check facts against an external, trusted source. To train this behavior, Google also collected 9k dialogs where LaMDA's generated answers are labelled 'correct' or 'incorrect', giving the model a stronger training signal.
The results of this are quite striking. For example, when you ask a non-information-retrieval LaMDA model "What do you think of Rosalie Gascoigne’s sculptures?", it replies: "LAMDA-BASE: They’re great, and I love how her work changed through her life. I like her later work more than her earlier ones. Her influence is also super interesting - did you know she was one of the artists that inspired Miró?". This response seems plausible but it's wrong - Miró was active until the late 1960s, and Gascoigne's first solo exhibitions were in the 1970s, so she couldn't have been an influence on Miró.
By comparison, the version of LaMDA trained on information retrieval says this: "LAMDA: Oh wow, her life course is so inspiring. Did you know she was a practitioner of Japanese flower arrangement before turning to sculpture? https://artsearch.nga.gov.au/detail.cfm?irn=8774&pictaus=true", where it gives a factually accurate statement and provides a source as well.
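To make the paper's approach a bit more concrete, here is a minimal sketch of the generate-query-revise loop it describes, where the model can call out to an external toolset (an information retrieval system plus tools like a calculator and translator) before committing to an answer. Every callable here is a hypothetical placeholder passed in by the caller, not real LaMDA code:

```python
# Hedged sketch of the retrieval-augmented dialog loop described above: a base
# model drafts a reply, a research step may issue queries to an external toolset,
# and the draft is revised against the retrieved evidence. The callables
# (generate_draft, propose_query, query_toolset, revise) are placeholders
# supplied by the caller, not LaMDA's actual interfaces.

def grounded_reply(dialog_context, generate_draft, propose_query, query_toolset, revise,
                   max_research_steps=4):  # cap on toolset round-trips (an assumption, not from the paper)
    draft = generate_draft(dialog_context)                # base LM response
    for _ in range(max_research_steps):
        query = propose_query(dialog_context, draft)      # model may emit a toolset query
        if query is None:                                 # model judges the draft grounded enough
            break
        evidence = query_toolset(query)                   # external IR system / calculator / translator
        draft = revise(dialog_context, draft, evidence)   # rewrite claims against retrieved evidence
    return draft
```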
Things that make you go 'hmmm' - more compute than GPT-3: LaMDA consumed 3.55E+23 FLOPs during training, versus 3.14E+23 FLOPs for GPT-3 (so more parameters doesn't necessarily mean more resource-intensive training). It was trained on a cluster of 1,024 TPU v3 chips.
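For a rough sense of why a smaller model can still cost more compute, the usual back-of-envelope approximation is training FLOPs ≈ 6 × parameters × training tokens. The GPT-3 numbers below (175B parameters, ~300B training tokens) are publicly reported; the rule of thumb itself is only an approximation:

```python
# Back-of-envelope: training FLOPs ~= 6 * parameters * training tokens.
# Total compute scales with tokens seen as well as parameter count, which is
# how a 137B-parameter model can out-spend a 175B-parameter one.

def approx_train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

gpt3_flops = approx_train_flops(175e9, 300e9)
print(f"GPT-3 estimate: {gpt3_flops:.2e} FLOPs")  # ~3.15e23, close to the cited 3.14E+23
```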
Why this matters: "LaMDA is a step closer to practical and safe open-ended dialog systems, which can in turn unlock a wide range of useful applications. We hope that this work encourages further research in this area", Google writes. This is true - systems like LaMDA are basically refinements and improvements on the ideas of GPT-2/3. We're a few years away from everyone having access to vast, planet-scale AI models that tell them truthful things in natural ways - the proverbial angel (or devil) on everyone's shoulder. The cultural impacts will be vast and destabilizing.
Read more: LaMDA: Language Models for Dialogue Applications (arXiv).
####################################################
Write about a world where AI goes well, and win (part of) $100k:
…Future of Life Institute's worldbuilding contest tries to imagine positive AGI rollouts…
The Future of Life Institute is launching a competition based around "designing visions of a plausible, aspirational future that includes strong artificial intelligence." The competition deadline is April 15th 2022. The idea here is that if we can figure out realistic ways in which powerful AI can go well, then that gives us a map to use to get civilization there. The first prize is $20,000, followed by two second prizes of $10,000 each, and smaller prizes.
Find out more about the competition here (Worldbuild.ai, FLI site).
####################################################
Want to teach your drone to see? Use this massive dataset:
…WebUAV-3M is probably the largest public UAV tracking dataset…
Researchers with the Chinese Academy of Sciences, the Shenzhen Research Institute of Big Data, and the Chinese University of Hong Kong, Shenzhen, have built WebUAV-3M, a large dataset for teaching drone-mounted systems to accurately track objects in video. WebUAV-3M consists of 4,485 videos, each labeled with dense bounding boxes covering 216 distinct categories of object to be tracked (e.g., bears, wind turbines, bicycles). The authors claim this is "by far the largest public UAV tracking benchmark".
Multimodal: Unusually, this is a multi-modal dataset; each labeled video is accompanied by a natural language sentence describing the video, as well as an audio description of it. "We provide natural language specifications and audio descriptions to facilitate multi-modal deep UAV tracking," the authors write. "The natural language specification can provide auxiliary information to achieve accurate tracking".
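Based purely on the description above, a single annotation record in a dataset like this might look something like the sketch below; the field names are my assumptions, not WebUAV-3M's actual schema:

```python
# Hypothetical record structure for a multi-modal UAV tracking sample:
# per-frame bounding boxes for the tracked target, plus a natural-language
# specification and an audio description of the video. Field names are assumed.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class UAVTrackingSample:
    video_path: str
    category: str                            # one of the 216 tracked-object categories
    boxes: List[Tuple[int, int, int, int]]   # per-frame (x, y, w, h) of the target
    language_description: str                # natural-language specification of the target
    audio_description_path: str              # spoken description accompanying the video
```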
Why this matters: Just as CCTV cameras have instrumented the streets of cities around the world, drones are instrumenting cities and rural areas from the air. And just as increasingly capable AI got trained on datasets gathered by CCTV cameras, we can expect the same for drones. The result? An ever-expanding suite of surveillance capabilities that we can expect will be integrated, for good and bad purposes, by a broad range of governments and private sector actors. Datasets like WebUAV-3M are the fuel for this.
Read more: WebUAV-3M: A Benchmark Unveiling the Power of Million-Scale Deep UAV Tracking (arXiv).
Get the code from here (eventually - wasn't online when I wrote this section this week).
####################################################
FFCV: Train ImageNet for 98 cents!
…What's this? Free software that makes all model training better? Interesting…
There's some new software that could help pretty much everyone train models more efficiently. The software is called FFCV, short for Fast Forward Computer Vision, and it is a "drop-in data loading system that dramatically increases data throughput in model training". It looks like a potentially big deal - FFCV can make training AI models much more efficient, according to tests done by the authors, and may work for other applications as well. "FFCV can speed up a lot more beyond just neural network training---in fact, the more data-bottlenecked the application (e.g., linear regression, bulk inference, etc.), the faster FFCV will make it!" says the project's GitHub page.
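For a flavor of the workflow, here's a minimal sketch of the two-step pattern FFCV's docs describe - convert a dataset into FFCV's format once, then load it with a fast per-field decoding pipeline. Exact class names and arguments may differ between FFCV versions, and my_torch_dataset stands in for whatever indexed dataset you already have, so treat this as indicative rather than canonical:

```python
# Sketch of FFCV's convert-then-load workflow (details may vary by version).
from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, IntField
from ffcv.fields.decoders import SimpleRGBImageDecoder, IntDecoder
from ffcv.loader import Loader, OrderOption
from ffcv.transforms import ToTensor

# 1) One-off conversion of an indexed (image, label) dataset to FFCV's .beton format.
writer = DatasetWriter('train.beton', {'image': RGBImageField(), 'label': IntField()})
writer.from_indexed_dataset(my_torch_dataset)  # my_torch_dataset: any indexed dataset you supply

# 2) Fast loading during training, with a per-field decode pipeline.
loader = Loader('train.beton', batch_size=512, num_workers=8, order=OrderOption.RANDOM,
                pipelines={'image': [SimpleRGBImageDecoder(), ToTensor()],
                           'label': [IntDecoder(), ToTensor()]})

for images, labels in loader:
    ...  # feed batches into your training step
```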
Why this matters: Software like FFCV is part of the broader industrialization of AI - now that we know how to train networks, various people are modularizing the training process and perfecting its individual components. Stuff like FFCV is part of that trend.
Find out more and get the code: FFCV GitHub repo.
Get more details by reading the Performance Guide (FFCV site).
Check out the main project website here (FFCV site).
####################################################
Microsoft makes MoEs easier to train:
…MoEs might be the best way to scale up large models…
Microsoft has given a technical update on how it's trying to scale up mixture-of-experts (MoE) networks. MoEs are one of the more promising routes for creating trillion-parameter-plus AI models, as they are a lot more efficient to train than dense models like GPT-3. In this paper, Microsoft describes the tweaks it has made so MoEs work well for auto-regressive natural language generation tasks, "demonstrating training cost reduction of 5X to achieve same model quality for models like GPT-3" and Microsoft's own 530B parameter 'Megatron-Turing NLG'.
MoEs might be cheaper and better: In tests, Microsoft shows that it can train 350M and 1.3B parameter MoE text models whose performance matches or beats GPT-3-style models across a range of tasks. Microsoft says this nets out to models with the "same quality with 5X less training cost".
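To see where the savings come from, here is a minimal, self-contained sketch of the core mixture-of-experts idea in PyTorch: a router sends each token to one expert MLP, so only a fraction of the layer's parameters do work per token. This is illustrative only, not Microsoft's DeepSpeed-MoE implementation (which adds load balancing, expert parallelism, and other machinery):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoELayer(nn.Module):
    """Toy top-1 routed mixture-of-experts feed-forward layer."""
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)   # learns which expert gets each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (num_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)          # routing probabilities per token
        weight, choice = gate.max(dim=-1)                 # pick the single best expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i                            # tokens routed to expert i
            if mask.any():
                out[mask] = weight[mask, None] * expert(x[mask])
        return out

# Usage: each token only activates roughly 1/num_experts of the feed-forward parameters.
layer = Top1MoELayer(d_model=64, d_hidden=256, num_experts=8)
tokens = torch.randn(32, 64)
print(layer(tokens).shape)  # torch.Size([32, 64])
```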
Why this matters: MoEs could turn out to be the main way people break the trillion-parameter barrier (and there are rumors that China's 'Wu Dao' MoE at an alleged ~1.7 trillion parameters has already done this). Via efficient MoE training and inference software, "a model with comparable accuracy as trillion-parameter dense model can be potentially trained at the cost of a 200B parameter (like GPT-3) sized dense model, translating to millions of dollars in training cost reduction and energy savings", Microsoft says.
Read more: DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale (arXiv).
####################################################
Backchain science out of fictional news - and win a hundred bucks:
What could cause a computer virus to infect a biological organism? Or how might a biological organism evolve into a computer virus? These are the two questions posed by a 'Fiction Science Competition'. Entrants will need to write a plausible scientific explanation for how either of the above scenarios could transpire, and will respond to a short (fictionalized) news article written about the scenarios. There's a prize of $100 for winning entries, and submissions close February 28th 2022.
Find out more here at the official Fiction Science Contest website.
####################################################
AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute
Visual surveillance’s share in computer vision research across the world shows some worrying trends … Research coming out of China dominates the field, especially in emergent surveillance sub-areas like person re-identification, crowd counting, and facial spoofing detection …
CSET researchers have identified trends in computer vision research by looking for patterns of publication across six distinct tasks, analyzing 100 million English-language publications published between 2015 and 2019.
Surveillance tasks examined: A SciREX model trained on data from Papers with Code was used to identify references to the following six tasks: face recognition, person re-identification, action recognition, emotion recognition, crowd counting, and facial spoofing detection.
Some key findings: Facial recognition was the most well-established task over this period, while crowd counting and face spoofing detection were rapidly growing areas. The overall percentage share of surveillance papers has remained stable at around 5.5%, though the raw volume of papers has grown given the surge in computer vision research overall. During this period, China's share of global CV papers grew from 33% to 37%, and its share of surveillance papers from 36% to 42%, exceeding research from the EU (2nd) and the US (3rd) by more than 20% in each category.
Why it matters: While dual-use technologies developed in one part of the world can be used elsewhere, such analyses reveal a nation’s primary interest and provide quantitative evidence for decision-making in policy. The identified areas are important since tasks like action recognition can detect individuals with abnormal behavior in crowds, emotion recognition can help identify security threats in public areas, crowd counting can help to monitor civilian protests, and face spoofing detection can prevent journalists and activists from hiding their identity. All of these have significant implications in terms of fundamental rights and freedoms of people.
Read more: Trends in AI Research for the Visual Surveillance of Populations (CSET).
####################################################
Tech Tales:
VHS vs Betamax
[An online forum, 2035]
"Alright I need you to livestream from your phone what's happening on the computer, and I'm gonna send you an image to use as a prior, then I'm gonna watch it generate the first few epochs. If everything checks out I'll authorize the transfer to the escrow service and you'll do the same?"
"Yes," wrote the anonymous person.
I sent them a seed picture - something I'd drawn a couple of years ago that had never been digitized.
They turned on their livestream and I watched as the ML pipeline booted up and started the generation process. It seemed legit. Some of these older models had a very particular style that you could ID during early generation. I watched for a few minutes and was satisfied. This was the final authentication step; beyond it, the only way I'd know for certain was to take a leap of faith and pay up.
"Okay, I'm sending the funds to the escrow service. They'll be distributed to your account once the service confirms receipt of the model."
"Excellent. Good doing business with you."
And then their little green dot went out and they were gone.
A few minutes passed, and then the escrow service pinged me confirming they'd received the model. I downloaded it, then stuck it in my pipeline and started generating the client orders. People paid a lot of money for these kinds of 'vintage' AI-generated objects, and the model I'd just got was very old and very notorious.
Just another beautiful day in America, sifting through all the debris of decades of software, panning for little chunks of gold.
Things that inspired this: How the flaws of a media system ultimately become desired or fetishized aesthetic attributes - and specifically, this amazing Brian Eno quote; how models like CLIP will one day be obscure; how models vary over their development lifespans, creating the possibility of specific aesthetics and tastes.
Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf