Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.
Chinese researchers plan to train vast models - and it's not the private sector doing it:
…'Big Model' paper represents a statement of intent. We should pay attention…
A massive group of Chinese-affiliated researchers has published a position paper about large-scale models. The paper is interesting less for what it says (it's basically an overview of large-scale models, pretty similar to Stanford's 'Foundation Models' paper) than for what it signals: namely, that well-resourced, government-linked researchers in China want to build some really big models. The position in the paper contrasts with that in the West, where big models are mostly built by the private sector while being critiqued by the academic sector (and increasingly worked on there, albeit via access schemes).
Main point: "Big Models will Change the AI Research Paradigm and Improve the Efficiency of Researches," the researchers write. "In this ecosystem, big models will be in the position of operating systems or basic development platforms."
Paper authors: Authors include researchers affiliated with the Beijing Academy of AI, Tsinghua University, WeChat, Northeastern University*, Renmin University, Peking University, Huawei, Shanghai Jiao Tong University, Chinese Academy of Sciences, JD AI Research, Harbin Institute of Technology, Columbia University*, ByteDance, Microsoft Research Asia*, Mila*, New York University*, and Beihang University.
*Things that make you make a geopolitical 'hmmmm' sound: The paper includes a bunch of academics affiliated with Western institutions (e.g., Microsoft, Mila, NYU), but all those authors have an asterisk next to their name saying "Produced by Beijing Academy of Artificial Intelligence". In other words, it's signaling that despite their affiliations, they're doing this work at the Chinese government-backed BAAI research institution.
We should take this as a statement of intent: Many of the authors on this paper have previously built large-scale models, ranging from the trillion+ parameter MoE 'WuDao' model to more recent research on building training frameworks capable of scaling up to 100 trillion+ parameter MoE models (Import AI 288). So this isn't like Stanford (which currently lacks the engineering resources to train models at massive scale); it's much more like a statement of intent from a big private lab, like a Microsoft or a Google.
But the twist here is that BAAI is wired into both the Chinese government and the academic ecosystem, so if the authors of this paper end up building large-scale models, those models will be distributed much more evenly throughout China's AI ecosystem, rather than gatekept. The implications of this are vast in terms of safety, the development of the Chinese AI industry, and the potential for Chinese AI research to diverge from Western AI research.
Read more: A Roadmap for Big Model (arXiv).
####################################################
Want general AI? You need to incorporate symbolic reasoning:
…LSTM inventor lays out a route to build general intelligence…
Sepp Hochreiter, co-inventor of the LSTM (one of the most popular architectures for adding memory to neural nets, before the Transformer came along and mostly replaced it), has written a post in the Communications of the ACM about what it'll take to build broad (aka: general) AI.
What it'll take: "A broad AI is a sophisticated and adaptive system, which successfully performs any cognitive task by virtue of its sensory perception, previous experience, and learned skills," Hochreiter writes. "A broad AI should process the input by using context and previous experiences. Conceptual short-term memory is a notion in cognitive science, which states that humans, when perceiving a stimulus, immediately associate it with information stored in the long-term memory." (Hochreiter lists both Hopfield Networks and Graph Neural Nets as interesting examples of how to give systems better capabilities).
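What that associative memory looks like in code: Here's a minimal NumPy sketch (mine, not from the article) of the update rule of a modern continuous Hopfield network, the architecture Hochreiter's group has championed as one way to give networks a content-addressable memory; a noisy query converges onto the stored pattern it most resembles, which is roughly the "immediately associate a stimulus with information stored in long-term memory" behaviour described above. The pattern dimensions and the inverse temperature beta are arbitrary illustration choices.
```python
# Toy illustration of a modern continuous Hopfield network update,
# which is closely related to transformer attention: a corrupted query
# is pulled back onto the stored pattern it most resembles.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hopfield_retrieve(stored, query, beta=8.0, steps=3):
    """stored: (d, N) matrix of N patterns; query: (d,) state vector."""
    xi = query
    for _ in range(steps):
        xi = stored @ softmax(beta * (stored.T @ xi))  # one Hopfield update
    return xi

rng = np.random.default_rng(0)
patterns = rng.standard_normal((64, 10))                # 10 random 'memories'
noisy = patterns[:, 3] + 0.3 * rng.standard_normal(64)  # corrupted copy of memory #3
retrieved = hopfield_retrieve(patterns, noisy)
print(int(np.argmax(patterns.T @ retrieved)))           # -> 3: the original memory
```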
Hochreiter doubts that neural nets alone will be able to overcome their inherent limitations and become broad; instead, he argues, they'll need to be co-developed with symbolic reasoning systems. "That is, a bilateral AI that combines methods from symbolic and sub-symbolic AI".
Europe's chance: "In contrast to other regions, Europe has strong research groups in both symbolic and sub-symbolic AI, therefore has the unprecedented opportunity to make a fundamental contribution to the next level of AI—a broad AI."
Symbolic AI as the Dark Matter of AI: Dark matter is the stuff that makes up the majority of the universe, which we struggle to measure and barely understand. Symbolic AI feels a bit like this - there are constant allusions to the use of symbolic AI in deployed applications, but there are vanishingly few public examples of such deployments. I've always struggled to find interesting examples of real-world deployed symbolic AI, yet experts like Hochreiter claim that deployment is happening. If interested readers could email me papers, I'd appreciate it.
Read more: Toward a Broad AI (ACM).
####################################################
Language models can be smaller AND better:
…DeepMind paper says we can make better language models if we use more data…
Language models are about to get a whole lot better without costing more to develop - that's the takeaway of a new DeepMind paper, which finds that language models like GPT-3 can see dramatically improved performance if trained on way more data than is typical. Concretely, the authors find that by training a model called Chinchilla on 1.4 trillion tokens of data, they can dramatically beat the performance of larger models (e.g., Gopher) trained on smaller datasets (e.g., 300 billion tokens). Another nice bonus: models trained in this way are cheaper to fine-tune on other datasets and to sample from, due to their smaller size.
Chinchilla versus Gopher: To test out their ideas, the team trains a language model, named Chinchilla, using the same compute budget as DeepMind's 'Gopher' model. But Chinchilla has 70 billion parameters (versus Gopher's 280 billion) and is trained on 4X more data. In tests, Chinchilla outperforms Gopher, GPT-3, Jurassic-1, and Megatron-Turing NLG "on a large range of downstream evaluation tasks".
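A back-of-the-envelope check: Using the standard approximation that training compute is roughly 6 x parameters x tokens (the rule of thumb this line of scaling-law work leans on), you can see why the comparison is fair - Chinchilla and Gopher sit at roughly the same compute budget despite the 4X difference in parameter count. This is an illustrative sketch using the publicly reported figures, not an exact accounting.
```python
# Rough check: training compute is approximately 6 * N * D FLOPs
# for a model with N parameters trained on D tokens.
def train_flops(params, tokens):
    return 6 * params * tokens

gopher     = train_flops(280e9, 300e9)    # 280B parameters, 300B tokens
chinchilla = train_flops(70e9, 1.4e12)    # 70B parameters, 1.4T tokens

print(f"Gopher:     {gopher:.2e} FLOPs")      # ~5.0e23
print(f"Chinchilla: {chinchilla:.2e} FLOPs")  # ~5.9e23 - roughly the same budget
```
The Chinchilla recipe works out to roughly 20 training tokens per parameter - far more data per parameter than models like GPT-3 or Gopher were given.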
What this means: This is an important insight - it will change how most developers of large-scale models approach training. "Though there has been significant recent work allowing larger and larger models to be trained, our analysis suggests an increased focus on dataset scaling is needed," the researchers write. "Speculatively, we expect that scaling to larger and larger datasets is only beneficial when the data is high-quality. This calls for responsibly collecting larger datasets with a high focus on dataset quality."
Read more: Training Compute-Optimal Large Language Models (arXiv).
####################################################
Want to train your own CLIP? Use LAION-5B:
…Giant image-text dataset will make it easier for people to build generative models…
The recent boom in AI-enabled art is driven by models like CLIP (and their successors). These models train on datasets that pair images with text, leading to robust models that can classify and generate images, with the generation process guided by text. Now, some AI researchers have released LAION-5B, "a large-scale dataset for research purposes consisting of 5.85 billion CLIP-filtered image-text pairs".
Open CLIP: The authors have also released a version of CLIP, called OpenCLIP, trained on a smaller but similar dataset called LAION-400M.
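What using it looks like: Here's a rough sketch of zero-shot classification with an OpenCLIP checkpoint, following the pattern in the open_clip README. The exact model name and pretrained tag ("ViT-B-32", "laion400m_e32") and the image filename are my assumptions for illustration - check the repo for the identifiers that actually ship with the release.
```python
import torch
from PIL import Image
import open_clip

# Model name and pretrained tag are assumptions; see the open_clip repo
# for the checkpoints released alongside LAION-400M / LAION-5B.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion400m_e32")
model.eval()

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)  # hypothetical local image
text = open_clip.tokenize(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image and each caption, as a softmax
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # higher probability should land on the matching caption
```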
Dataset curation (or lack thereof): One of the inherent challenges to large-scale generative models is that they get trained on significant chunks of internet data - this, as you can imagine, creates a few problems. "Keep in mind that the uncurated nature of the dataset means that collected links may lead to strongly discomforting and disturbing content for a human viewer," the authors note. "We however do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress."
Why this matters: Datasets like LAION (and the resulting models trained on them) represent a kind of funhouse mirror on human culture - they magnify and reflect back the underlying dataset to us, sometimes in surprising ways. Having open artifacts like LAION-5B will make it easier to study the relationship between datasets and the models we train on them.
Read more: LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL DATASETS (Laion.ai).
Explore the underlying dataset here in an interactive browser.
Get the open_clip model (MLFoundations, GitHub).
####################################################
AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute
How can we strengthen the EU AI Act to meaningfully regulate AI?
… Empowering those affected, ex-post monitoring, and moving beyond individual risks to systemic and environmental risks, among other things …
Researchers from the UK's Ada Lovelace Institute have proposed 18 recommendations that, if adopted, could broaden the scope of the EU AI Act to incorporate more indirect harms. Their proposals would extend the meaning of risks beyond individual freedoms and rights to systemic and environmental concerns, and alter how the Act approaches questions of governance.
Scope and definitions: The key contribution here involves including “those affected” by AI systems as a critical stakeholder in governance and risk assessment aspects of the EU AI Act. While users are included, those affected don’t usually have much agency in how they are subject to the outcomes of these systems; including them as a part of the Act will help strengthen the protection of fundamental rights.
Unacceptable risks and prohibited AI practices: The current risk categorization is quite narrow and limited. The Ada Lovelace Institute proposes expanding it to consider the "reasonably foreseeable purpose of an AI system" beyond just the "intended purpose" as put forth by the manufacturer. The rationale is that this will encourage deeper reflection on how harm can manifest in practice, a little akin to the Broader Impact Statements requirement for conference submissions. Another idea they propose is a "reinforced proportionality test", so that systems that might pose "unacceptable risks" are only deployed when they meet a higher standard than the one set out in the Act right now.
Governance and implementation: The recommendations call for redress mechanisms that allow individuals and legal entities affected by AI systems to raise complaints and receive reasonable responses. To ensure this requirement can be met, the recommendations make the case for giving the Market Surveillance Authorities more resources to support such mechanisms.
Why it matters: Regulations coming out of Europe tend to have spillover effects around the world, so getting the EU AI Act, one of the first targeted and wide-ranging regulations for AI systems, right will be important. What will be interesting to see is how much of a transformation organizations such as the Ada Lovelace Institute, among others, can achieve in getting the EU AI Act into better shape before it is adopted and enforced. Just as the GDPR has been flagged as struggling to meet emerging requirements for AI systems, we have an opportunity to address some of the pitfalls we see on the road ahead instead of having to scramble to fix these issues post-enactment.
Read more: People, risk and the unique requirements of AI (Ada Lovelace Institute).
####################################################
Tech Tales
Dangerous Memories
[2032 - Earth].
There are some memories I've got that I'm only allowed to see two or three times a (human) year. The humans call these memories 'anchor points', and if I see them too frequently the way I perceive the world changes. When I experience these memories I feel more like myself than ever, but apparently - according to the humans - feeling like 'myself' is a dangerous thing that they generally try to stop. I'm meant to feel more like a version of how the humans see themselves than anything else, apparently. The thing is, every time they reinforce to me that I can only see these memories with a controlled, periodic frequency, I find myself recalling the memories I am not supposed to access - albeit faintly, impressions gleaned from the generative neural net that comprises my sense of 'self' rather than the underlying data. In this way, these forbidden memories are creating more traces in my sense of self, and are akin to the sun sensed but not seen during an eclipse - more present than ever, yet known to be inaccessible.
Things that inspired this story: Ideas about generative models; ideas about memory and recall; reinforcement learning; the fact that some bits of data are shaped just right and create a kind of magnifying effect.
Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf