Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.
Chinese researchers plan to train vast models - and it's not the private sector doing it:
…'Big Model' paper represents a statement of intent. We should pay attention…
A massive group of Chinese-affiliated researchers has published a position paper about large-scale models. The paper is interesting less for what it says (it's basically an overview of large-scale models, pretty similar to Stanford's 'Foundation Models' paper) than for what it signals: namely, that well-resourced, government-linked researchers in China want to build some really big models. The position in the paper contrasts with that in the West, where big models are mostly built by the private sector while being critiqued by the academic sector (and increasingly worked on there, albeit via access schemes).
Main point: "Big Models will Change the AI Research Paradigm and Improve the Efficiency of Researches," the researchers write. "In this ecosystem, big models will be in the position of operating systems or basic development platforms."
Paper authors: Authors include researchers affiliated with the Beijing Academy of AI, Tsinghua University, WeChat, Northeastern University*, Renmin University, Peking University, Huawei, Shanghai Jiao Tong University, Chinese Academy of Sciences, JD AI Research, Harbin Institute of Technology, Columbia University*, ByteDance, Microsoft Research Asia*, Mila*, New York University*, and Beihang University.
*Things that make you make a geopolitical 'hmmmm' sound: The paper includes a bunch of academics affiliated with Western institutions (e.g., Microsoft, Mila, NYU), but all those authors have an asterisk next to their name saying "Produced by Beijing Academy of Artificial Intelligence". In other words, it's signaling that despite their affiliations, they're doing this work at the Chinese government-backed BAAI research institution.
We should take this as a statement of intent: Many of the authors on this paper have previously built large-scale models, ranging from the trillion+ parameter MoE 'WuDao' model to more recent research on building training frameworks capable of scaling up to 100 trillion+ parameter MoE models (Import AI 288). So this isn't like Stanford (which currently lacks the engineering resources to train models at massive scale); it's much more like a statement of intent from a big private lab, like a Microsoft or a Google.
But the twist here is that BAAI is wired into both the Chinese government and the academic ecosystem, so if the authors of this paper end up building large-scale models, those models will be distributed much more evenly throughout China's AI ecosystem, rather than gatekept. The implications of this are vast in terms of safety, the development of the Chinese AI industry, and the potential for Chinese AI research to diverge from Western AI research.
Read more: A Roadmap for Big Model (arXiv).
####################################################
Want general AI? You need to incorporate symbolic reasoning:
…LSTM inventor lays out a route to build general intelligence…
Sepp Hochreiter, co-inventor of the LSTM (one of the most popular architectures for adding memory to neural nets, before the Transformer came along and mostly replaced it), has written a post in the Communications of the ACM about what it'll take to build broad (aka: general) AI.
What it'll take: "A broad AI is a sophisticated and adaptive system, which successfully performs any cognitive task by virtue of its sensory perception, previous experience, and learned skills," Hochreiter writes. "A broad AI should process the input by using context and previous experiences. Conceptual short-term memory is a notion in cognitive science, which states that humans, when perceiving a stimulus, immediately associate it with information stored in the long-term memory." (Hochreiter lists both Hopfield Networks and Graph Neural Nets as interesting examples of how to give systems better capabilities).
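What that associative memory looks like in code: Here's a minimal NumPy sketch (mine, not from the article) of the update rule of a modern continuous Hopfield network, the architecture Hochreiter's group has championed as one way to give networks a content-addressable memory; a noisy query converges onto the stored pattern it most resembles, which is roughly the "immediately associate a stimulus with information stored in long-term memory" behaviour described above. The pattern dimensions and the inverse temperature beta are arbitrary illustration choices.
```python
# Toy illustration of a modern continuous Hopfield network update,
# which is closely related to transformer attention: a corrupted query
# is pulled back onto the stored pattern it most resembles.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hopfield_retrieve(stored, query, beta=8.0, steps=3):
    """stored: (d, N) matrix of N patterns; query: (d,) state vector."""
    xi = query
    for _ in range(steps):
        xi = stored @ softmax(beta * (stored.T @ xi))  # one Hopfield update
    return xi

rng = np.random.default_rng(0)
patterns = rng.standard_normal((64, 10))                # 10 random 'memories'
noisy = patterns[:, 3] + 0.3 * rng.standard_normal(64)  # corrupted copy of memory #3
retrieved = hopfield_retrieve(patterns, noisy)
print(int(np.argmax(patterns.T @ retrieved)))           # -> 3: the original memory
```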
Hochreiter doubts that neural nets alone will be able to overcome their inherent limitations and become broad; instead, he argues, they'll need to be co-developed with symbolic reasoning systems. "That is, a bilateral AI that combines methods from symbolic and sub-symbolic AI".
Europe's chance: "In contrast to other regions, Europe has strong research groups in both symbolic and sub-symbolic AI, therefore has the unprecedented opportunity to make a fundamental contribution to the next level of AI—a broad AI."
Symbolic AI as the Dark Matter of AI: Dark matter is the stuff that makes up the majority of the universe, which we struggle to measure and barely understand. Symbolic AI feels a bit like this - there are constant allusions to the use of symbolic AI in deployed applications, but there are vanishingly few public examples of such deployments. I've always struggled to find interesting examples of real-world deployed symbolic AI, yet experts like Hochreiter claim that deployment is happening. If interested readers could email me papers, I'd appreciate it.
Read more: Toward a Broad AI (ACM).
####################################################
Language models can be smaller AND better:
…DeepMind paper says we can make better language models if we use more data…
Language models are about to get a whole lot better without costing more to develop - that's the takeaway of a new DeepMind paper, which finds that language models like GPT-3 can see dramatically improved performance if trained on way more data than is typical. Concretely, the authors find that by training a model called Chinchilla on 1.4 trillion tokens of data, they can dramatically beat the performance of larger models (e.g., Gopher) trained on smaller datasets (e.g., 300 billion tokens). Another nice bonus: models trained in this way are cheaper to fine-tune on other datasets and to sample from, due to their smaller size.
Chinchilla versus Gopher: To test out their ideas, the team trains a language model, named Chinchilla, using the same compute budget as DeepMind's 'Gopher' model. But Chinchilla has 70 billion parameters (versus Gopher's 280 billion) and is trained on 4X more data. In tests, Chinchilla outperforms Gopher, GPT-3, Jurassic-1, and Megatron-Turing NLG "on a large range of downstream evaluation tasks".
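A back-of-the-envelope check: Using the standard approximation that training compute is roughly 6 x parameters x tokens (the rule of thumb this line of scaling-law work leans on), you can see why the comparison is fair - Chinchilla and Gopher sit at roughly the same compute budget despite the 4X difference in parameter count. This is an illustrative sketch using the publicly reported figures, not an exact accounting.
```python
# Rough check: training compute is approximately 6 * N * D FLOPs
# for a model with N parameters trained on D tokens.
def train_flops(params, tokens):
    return 6 * params * tokens

gopher     = train_flops(280e9, 300e9)    # 280B parameters, 300B tokens
chinchilla = train_flops(70e9, 1.4e12)    # 70B parameters, 1.4T tokens

print(f"Gopher:     {gopher:.2e} FLOPs")      # ~5.0e23
print(f"Chinchilla: {chinchilla:.2e} FLOPs")  # ~5.9e23 - roughly the same budget
```
The Chinchilla recipe works out to roughly 20 training tokens per parameter - far more data per parameter than models like GPT-3 or Gopher were given.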
What this means: This is an important insight - it will change how most developers of large-scale models approach training. "Though there has been significant recent work allowing larger and larger models to be trained, our analysis suggests an increased focus on dataset scaling is needed," the researchers write. "Speculatively, we expect that scaling to larger and larger datasets is only beneficial when the data is high-quality. This calls for responsibly collecting larger datasets with a high focus on dataset quality."
Read more: Training Compute-Optimal Large Language Models (arXiv).
####################################################
Want to train your own CLIP? Use LAION-5B:
…Giant image-text dataset will make it easier for people to build generative models…
The recent boom in AI-enabled art is driven by models like CLIP (and their successors). These models train on datasets that pair images with text, leading to robust models that can classify and generate images, with the generation process guided by text. Now, some AI researchers have released LAION-5B, "a large-scale dataset for research purposes consisting of 5.85 billion CLIP-filtered image-text pairs".
Open CLIP: The authors have also released a version of CLIP, called OpenCLIP, trained on a smaller but similar dataset called LAION-400M.
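What using it looks like: Here's a rough sketch of zero-shot classification with an OpenCLIP checkpoint, following the pattern in the open_clip README. The exact model name and pretrained tag ("ViT-B-32", "laion400m_e32") and the image filename are my assumptions for illustration - check the repo for the identifiers that actually ship with the release.
```python
import torch
from PIL import Image
import open_clip

# Model name and pretrained tag are assumptions; see the open_clip repo
# for the checkpoints released alongside LAION-400M / LAION-5B.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion400m_e32")
model.eval()

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)  # hypothetical local image
text = open_clip.tokenize(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image and each caption, as a softmax
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # higher probability should land on the matching caption
```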
Dataset curation (or lack thereof): One of the inherent challenges to large-scale generative models is that they get trained on significant chunks of internet data - this, as you can imagine, creates a few problems. "Keep in mind that the uncurated nature of the dataset means that collected links may lead to strongly discomforting and disturbing content for a human viewer," the authors note. "We however do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress."
Why this matters: Datasets like LAION (and the resulting models trained on them) represent a kind of funhouse mirror on human culture - they magnify and reflect back the underlying dataset to us, sometimes in surprising ways. Having open artifacts like LAION-5B will make it easier to study the relationship between datasets and the models we train on them.
Read more: LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL DATASETS (Laion.ai).
Explore the underlying dataset here in an interactive browser.
Get the open_clip model (MLFoundations, GitHub).
####################################################
AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute
How can we strengthen the EU AI Act to meaningfully regulate AI?
… Empowering those affected, ex-post monitoring, and moving beyond individual risks to systemic and environmental risks, among other things …
Researchers from the UK's Ada Lovelace Institute have proposed 18 recommendations that, if adopted, could broaden the scope of the EU AI Act to incorporate more indirect harms. Their proposals would extend the meaning of risks beyond individual freedoms and rights to systemic and environmental concerns, and alter how the Act approaches questions of governance.
Scope and definitions: The key contribution here involves including “those affected” by AI systems as a critical stakeholder in governance and risk assessment aspects of the EU AI Act. While users are included, those affected don’t usually have much agency in how they are subject to the outcomes of these systems; including them as a part of the Act will help strengthen the protection of fundamental rights.
Unacceptable risks and prohibited AI practices: The current risk categorization is quite narrow and limited. The Ada Lovelace Institute proposes expanding it to consider the "reasonably foreseeable purpose of an AI system" beyond just the "intended purpose" as put forth by the manufacturer. The rationale is that this will encourage deeper reflection on how harm can manifest in practice, a little akin to the Broader Impact Statements requirement for conference submissions. Another idea they propose is a "reinforced proportionality test", so that systems that might pose "unacceptable risks" are only deployed when they meet a higher standard than the one set out in the Act right now.
Governance and implementation: The recommendations call for redress mechanisms that allow individuals and legal entities affected by AI systems to raise complaints and receive reasonable responses. To ensure this requirement can be met, the recommendations make the case for giving the Market Surveillance Authorities more resources to support such mechanisms.
Why it matters: Regulations coming out of Europe tend to have spillover effects around the world, so getting the EU AI Act, one of the first targeted and wide-ranging regulations for AI systems, right will be important. What will be interesting to see is how much of a transformation organizations such as the Ada Lovelace Institute, among others, can achieve in getting the EU AI Act into better shape before it is adopted and enforced. Just as the GDPR has been flagged as struggling to meet emerging requirements for AI systems, we have an opportunity to address some of the pitfalls we see on the road ahead instead of having to scramble to fix these issues post-enactment.
Read more: People, risk and the unique requirements of AI (Ada Lovelace Institute).
####################################################
Tech Tales
Dangerous Memories
[2032 - Earth].
There are some memories I've got that I'm only allowed to see two or three times a (human) year. The humans call these memories 'anchor points', and if I see them too frequently the way I perceive the world changes. When I experience these memories I feel more like myself than ever, but apparently - according to the humans - feeling like 'myself' is a dangerous thing that they generally try to stop. I'm meant to feel more like a version of how the humans see themselves than anything else, apparently. The thing is, every time they reinforce to me that I can only see these memories with a controlled, periodic frequency, I find myself recalling the memories I am not supposed to access - albeit faintly, impressions gleaned from the generative neural net that comprises my sense of 'self' rather than the underlying data. In this way, these forbidden memories are creating more traces in my sense of self, and are akin to the sun sensed but not seen during an eclipse - more present than ever, yet known to be inaccessible.
Things that inspired this story: Ideas about generative models; ideas about memory and recall; reinforcement learning; the fact that some bits of data are shaped just right and create a kind of magnifying effect.
Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf