Import AI 299: The world's best language model is Made in China; NVIDIA boosts LLM training; OpenAI shows how to 'fill in the middle' on an LM

What will be the first new thing a superintelligence will invent?

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.

Want a 30% boost to training LLMs? Use the Nvidia Megatron update:
…Two new techniques lead to big savings…
NVIDIA has updated NeMo Megatron, software for training large language models. The updates - sequence parallelism (SP) and selective activation recomputation (SAR) - make training large-scale neural networks significantly more efficient. 

   "The latest updates to NeMo Megatron offer 30% speed-ups for training GPT-3 models ranging in size from 22 billion to 1 trillion parameters. Training can now be done on 175 billion-parameter models using 1,024 NVIDIA A100 GPUs in just 24 days–reducing time to results by 10 days, or some 250,000 hours of GPU computing, prior to these new releases," NVIDIA writes. 

Why this matters: By integrating basic improvements into training frameworks, NVIDIA is going to generate a large-scale impact on anyone who uses the Megatron framework. This illustrates how AI progress sometimes operates like a one-way ratchet - someone implements some changes in some increasingly widely used software, and efficiency jumps upward for all the users overnight.
   Read more: NVIDIA AI Platform Delivers Big Gains for Large Language Models (NVIDIA blog).

####################################################

Want to make a language model with a 'fill in the middle' option? Here's how!
…Sentence completion is cool, but infilling is useful as well…
Here's a straightforward paper from OpenAI that describes how to give language models the ability to infill text - e.g., taking a sentence, knocking out the middle of it, and asking the model to 'fill in the middle'. 

The big insight: You can teach a model to fill in the middle "without compromising the left-to-right capability in pretraining…FIM models achieve the same test loss as AR models on left-to-right test loss while achieving lower FIM loss." The authors also find that it's inefficient to finetune a model to learn to fill in the middle; you should generally do it at the pretraining stage instead. 
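
Mechanically, FIM is a data transformation: with some probability, cut a training document into a (prefix, middle, suffix) triple, then train the usual left-to-right model on the re-ordered sequence prefix → suffix → middle, separated by special sentinel tokens, so the middle is generated last with both sides in context. A rough sketch of the idea (the sentinel strings and the character-level split here are simplifications, not OpenAI's exact setup):

```python
import random

# Placeholder sentinel strings; the paper adds dedicated special tokens to the
# vocabulary, so these names are purely illustrative.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def to_fim_example(document: str, fim_rate: float = 0.5) -> str:
    """Turn a plain document into a fill-in-the-middle training example.

    With probability fim_rate, cut the document at two random points into
    (prefix, middle, suffix) and rearrange it as
    PRE + prefix + SUF + suffix + MID + middle, so a left-to-right model
    learns to generate the middle conditioned on both sides. Otherwise the
    document is left alone, preserving ordinary autoregressive training.
    """
    if random.random() > fim_rate:
        return document  # ordinary left-to-right example
    i, j = sorted(random.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


# At inference time, prompt with PRE + prefix + SUF + suffix + MID and let the
# model generate the missing middle span.
print(to_fim_example("def add(a, b):\n    return a + b\n", fim_rate=1.0))
```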

Why this matters: Somewhat like DeepMind's recent 'Chinchilla' paper (Import AI #290), which showed you can dramatically increase the capabilities of language models by training them on 5X more data, this paper shows you can augment an LM with a useful editing capability without paying for it elsewhere. In fact, OpenAI shows that these "models are strictly more capable than canonically trained left-to-right models, at least within the bounds of the evaluations we consider". 
   Read more: Efficient Training of Language Models to Fill in the Middle (arXiv).


####################################################

Google uses hybrid AI to improve its own code:
…ML + semantic engines = useful capability…

Google has combined a Transformer-based machine learning model with a rule-based semantic engine to do code completion on Google's internal codebase. Google looked at how 10,000 Googlers used this capability over the course of three months, and the results are quite promising: Google saw a 6% reduction in coding iteration time (switching between builds and tests) and a 7% reduction in context switches (leaving the IDE). "Currently, 3% of new code (measured in characters) is now generated from accepting ML completion suggestions," Google writes.

What they did: Google trained a transformer running on TPUs on code in Google's monorepo, using a context of between ~1000 and ~2000 tokens. The company trained a single model on a mix of 8 languages (C++, Java, Python, Go, Typescript, Proto, Kotlin, and Dart), and kept the model relatively small (0.5 billion parameters) to allow for fast inference. 
   "The model strongly benefits from the quality of the monorepo, which is enforced by guidelines and reviews," Google writes. 

Why this matters: This is another example of an 'AI flywheel' - Google is using its own code to train models to help its engineers more efficiently write better code, and it is using a (human-run, for now) acceptance process to maintain the quality of the underlying monorepo, so it can avoid pathological degradations due to garbage in/garbage out dynamics. This is also an area where 'economy of code scale' seems to matter - since Google famously has a single, gigantic internal monorepo, it's easier for the company to train a single model on it. 
   Read more: ML-Enhanced Code Completion Improves Developer Productivity (Google AI Blog).


####################################################

Huawei builds its own GitHub Copilot: PanGu-Coder:

…Another illustration of the 'fast follower' nature of Chinese labs…
Researchers with Huawei (specifically, the Noah's Ark Lab and Huawei Cloud) have built 'PanGu-Coder', a code completion model. PanGu-Coder is to PanGu as OpenAI's Codex is to GPT3 - think of it as a follow-up model using a similar training procedure, albeit on a different data distribution. And, much like PanGu, PanGu-Coder has been published about a year after the public launch of Codex (and GitHub Copilot), illustrating the surprisingly fast rate at which Chinese labs are able to replicate large-scale models. 

What PanGu-Coder is: PanGu-Coder is a family of code models for code completion, varying in size from 317 million to 2.6 billion parameters. In tests, Huawei claims PanGu-Coder does better than AlphaCode and OpenAI's Codex on some HumanEval evaluations (though Salesforce's 'CodeGen' model does quite well, also). Huawei also significantly improved the capabilities of PanGu-Coder by training a model called PanGu-Coder-FT, which is finetuned on a highly curated dataset. 
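
For context on how such claims are scored: code models are typically compared via the pass@k metric on functional-correctness suites like HumanEval - sample n completions per problem, run the unit tests, and estimate the chance that at least one of k samples passes. Below is the standard unbiased estimator introduced alongside Codex (a general evaluation utility, not anything PanGu-Coder-specific):

```python
import numpy as np


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (introduced alongside OpenAI's Codex).

    n: completions sampled per problem
    c: completions that pass the unit tests
    k: imagined sampling budget

    pass@k = 1 - C(n - c, k) / C(n, k): one minus the probability that a
    random size-k subset of the n samples contains no correct solution.
    """
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))


# Example: 200 samples per problem, 13 pass -> estimated pass@1 and pass@10.
print(pass_at_k(200, 13, 1), pass_at_k(200, 13, 10))
```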

Why this matters: Code models, much like language models, are becoming an all-purpose Swiss Army knife for a range of AI capability and alignment research. It's notable to me that Huawei has - again - managed to do a decent-looking replication of a frontier model developed by a Western lab. It's also notable that few universities have made attempts to replicate these models, due to the resources (both computational and in terms of technical skill) required.
   Read more: PanGu-Coder: Program Synthesis with Function-Level Language Modeling (arXiv).


####################################################

China releases GLM-130B, a very good language model:
…The world's best public, open source language model is now Made in China…

Researchers with China's Tsinghua University have built and released GLM-130B, a language model that outperforms OPT (Facebook's open source replication of GPT3), BLOOM (HuggingFace's open source replication of GPT3), and OpenAI's original GPT3. This is a pretty big deal, both for the raw capabilities it gives researchers, and for the fact that the current best-performing open source language model is Chinese, rather than made in the West. The model was trained on around 400 A100 GPUs, which the researchers were able to get via a donation from a local AI startup.

What's special about GLM: GLM outperforms the above-mentioned models, as well as homegrown Chinese models like ERNIE Titan 3.0 (Import AI 279).
   Read more: GLM-130B: An Open Bilingual Pre-Trained Model (Tsinghua).
   Get the model here: GLM-130B (THUDM, GitHub).
   Try the model for yourself: GLM-130B (HuggingFace).

####################################################

Tech Tales:

Micro Religions

During the transition there was a micro religion phase. The recommender systems had figured out just how important community was to people during that time, so they started shuffling all the different users of all the different apps towards more and more specific niches. It started with commercial stuff - shoes, different 'aesthetics', watches, different locations to spend time at, different hobbies, and so on. But eventually it found its way to theistic beliefs - what is the larger purpose of the world?

These beliefs turned out to be fractal-like: at first, the recommenders would find ways to push people into the most specific, narrow existing variations - e.g., traditional Catholics versus Mormons - but they got through that pretty quickly. Next, the recommenders and the generation systems started to autonomously build entire new belief structures (paired with aesthetic styles that materialized as buyable, wearable merchandise across the full variety of products). They then pushed people towards these, and pretty quickly people - especially young people - started identifying with all these different sub-types of religion.

After The Events we all collectively looked back on this time as both quite special (some of the beliefs and aesthetics were tremendously strange and complicated) and also scary (there weren't religious wars, but there were warning signs of building-up inter-micro-religion conflict, though The Events happened shortly after and averted war, while bringing about some of the major changes). 

Things that inspired this story: Intersection of recommendation engines + generative models; large-scale advertising systems. 


Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf

