Import AI 299: The world's best language model is Made in China; NVIDIA boosts LLM training; OpenAI shows how to 'fill in the middle' on a LM

What will be the first new thing a superintelligence will invent?

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.

Want a 30% boost to training LLMs? Use the Nvidia Megatron update:
…Two new techniques lead to big savings…
NVIDIA has updated NeMo Megatron, its software for training large language models. The updates - sequence parallelism (SP) and selective activation recomputation (SAR) - make training large-scale neural networks significantly more efficient. 

   "The latest updates to NeMo Megatron offer 30% speed-ups for training GPT-3 models ranging in size from 22 billion to 1 trillion parameters. Training can now be done on 175 billion-parameter models using 1,024 NVIDIA A100 GPUs in just 24 days–reducing time to results by 10 days, or some 250,000 hours of GPU computing, prior to these new releases," NVIDIA writes. 

Why this matters: By integrating basic improvements into its training frameworks, NVIDIA will have a large-scale impact on anyone who uses Megatron. This illustrates how AI progress sometimes operates like a one-way ratchet - someone implements an improvement in increasingly widely used software, and efficiency jumps upward for all of its users overnight.
   Read more: NVIDIA AI Platform Delivers Big Gains for Large Language Models (NVIDIA blog).

####################################################

Want to make a language model with a 'fill in the middle' option? Here's how!
…Sentence completion is cool, but infilling is useful as well…
Here's a straightforward paper from OpenAI that describes how to give language models the ability to infill text - e.g., taking a sentence, knocking out the middle of it, and asking the model to 'fill in the middle'. 

The big insight: The main insight here is that you can learn to fill in the middle "without compromising the left-to-right capability in pretraining…FIM models achieve the same test loss as AR models on left-to-right test loss while achieving lower FIM loss." They also find that it's inefficient to teach a model to infill via finetuning - you should generally do it at the pretraining stage instead. 
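
   For a concrete sense of the trick, here's a minimal sketch of the data transformation FIM relies on: cut a document into (prefix, middle, suffix), then serialize it as prefix + suffix + middle with sentinel tokens, so a plain left-to-right model learns to produce the middle conditioned on both sides. The sentinel names below are placeholders rather than OpenAI's actual tokens, and the real pipeline operates on tokenized data rather than raw strings.

# Minimal sketch of the FIM data transformation (illustrative, not OpenAI's code).
import random

# Placeholder sentinel tokens - the real ones are added to the tokenizer vocabulary.
PRE, SUF, MID = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

def to_fim(document: str, fim_rate: float = 0.5) -> str:
    """With probability fim_rate, rearrange a document into
    prefix + suffix + middle so an autoregressive model learns infilling;
    otherwise leave it as ordinary left-to-right text."""
    if random.random() > fim_rate:
        return document
    # Pick two cut points to define prefix / middle / suffix.
    i, j = sorted(random.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # PSM ordering: the model sees prefix and suffix, then generates the middle.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

print(to_fim("def add(a, b):\n    return a + b\n", fim_rate=1.0))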

Why this matters: Somewhat like DeepMind's recent 'Chinchilla' paper (Import AI #290), which showed you can dramatically increase the capabilities of language models by training them on ~5X more data, this paper shows you can augment an LM with a useful edit capability without paying a cost anywhere else. In fact, OpenAI shows that these "models are strictly more capable than canonically trained left-to-right models, at least within the bounds of the evaluations we consider". 
   Read more: Efficient Training of Language Models to Fill in the Middle (arXiv).


####################################################

Google uses hybrid AI to improve its own code:
…ML + semantic engines = useful capability…

Google has combined machine learning and a rule-based semantic engine to train a Transformer-based system to do code completion on Google's internal codebase. Google looked at how 10,000 Googlers used this capability over the course of three months and the results are quite promising: Google saw a 6% reduction in coding iteration time (switching between builds and tests) and a 7% reduction in context switches (leaving the IDE). "Currently, 3% of new code (measured in characters) is now generated from accepting ML completion suggestions," Google writes.

What they did: Google trained a transformer on TPUs using code from Google's monorepo, with a context of between ~1,000 and ~2,000 tokens. The company trained a single model on a mix of eight languages (C++, Java, Python, Go, TypeScript, Proto, Kotlin, and Dart), and kept the model relatively small (0.5 billion parameters) to allow for fast inference. 
   "The model strongly benefits from the quality of the monorepo, which is enforced by guidelines and reviews," Google writes. 

Why this matters: This is another example of an 'AI flywheel' - Google is using its own code to train models to help its engineers more efficiently write better code, and it is using a (human-run, for now) acceptance process to maintain the quality of the underlying monorepo, so it can avoid pathological degradations due to garbage in/garbage out dynamics. This is also an area where 'economy of code scale' seems to matter - since Google famously has a single, gigantic internal monorepo, it's easier for the company to train a single model on it. 
   Read more: ML-Enhanced Code Completion Improves Developer Productivity (Google AI Blog).


####################################################

Huawei builds its own GitHub Copilot, PanGu-Coder:
…Another illustration of the 'fast follower' nature of Chinese labs…
Researchers with Huawei (specifically, the Noah's Ark Lab and Huawei Cloud) have built 'PanGu-Coder', a code completion model. PanGu-Coder is to PanGu as OpenAI's Codex is to GPT3 - think of it as a follow-up model using a similar training procedure, albeit on a different data distribution. And, much like PanGu, PanGu-Coder has been published about a year after the public launch of Codex (and GitHub Copilot), illustrating the surprisingly fast rate at which Chinese labs are able to replicate large-scale models. 

What PanGu-Coder is: PanGu-Coder is a family of code completion models, varying in size from 317 million to 2.6 billion parameters. In tests, Huawei claims PanGu-Coder does better than AlphaCode and OpenAI's Codex on a few human evaluations (though Salesforce's 'CodeGen' model also does quite well). Huawei also significantly improved PanGu-Coder's capabilities by finetuning it on a highly curated dataset, yielding a model called PanGu-Coder-FT. 
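
   The 'function-level' part of the paper's title refers to training on pairs of docstrings and function bodies rather than on raw source files. Here's an illustrative sketch of mining such pairs from Python code; the separator token and formatting are assumptions for illustration, not Huawei's actual scheme.

# Illustrative sketch of building function-level (docstring, code) training
# pairs, loosely in the spirit of PanGu-Coder's function-level training stage.
# The separator token and formatting are assumptions, not Huawei's scheme.
# Requires Python 3.9+ for ast.unparse.
import ast, textwrap

SEP = "<|code|>"  # hypothetical separator between docstring and code

def extract_pairs(source: str):
    """Yield docstring-then-code training examples from a Python file."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            doc = ast.get_docstring(node)
            if doc:
                yield f"{doc}\n{SEP}\n{ast.unparse(node)}"

example = textwrap.dedent('''
    def square(x):
        """Return the square of x."""
        return x * x
''')
for pair in extract_pairs(example):
    print(pair)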

Why this matters: Code models, much like language models, are becoming an all-purpose Swiss Army knife for a range of AI capability and alignment research. It's notable to me that Huawei has - again - managed to do a decent-looking replication of a frontier model developed by a Western lab. It's also notable that few universities have attempted to replicate these models, given the resources (both compute and technical skill) required.
   Read more: PanGu-Coder: Program Synthesis with Function-Level Language Modeling (arXiv).


####################################################

China releases GLM-130B, a very good language model:
…The world's best public, open source language model is now Made in China…

Researchers with China's Tsinghua University have built and released GLM-130B, a language model that outperforms OPT (Facebook's open source replication of GPT3), BLOOM (the HuggingFace-led open source replication of GPT3), and OpenAI's original GPT3. This is a pretty big deal, both for the raw capabilities it gives researchers, and for the fact that the current best-performing open source language model is Chinese, rather than made in the West. The model was trained on around 400 A100 GPUs, which the researchers were able to get via a donation from a local AI startup.

What's special about GLM: GLM outperforms the above-mentioned models, as well as homegrown Chinese models like ERNIE Titan 3.0 (Import AI 279).
   Read more: GLM-130B: An Open Bilingual Pre-Trained Model (Tsinghua).
   Get the model here: GLM-130B (THUDM, GitHub).
   Try the model for yourself: GLM-130B (HuggingFace).

####################################################

Tech Tales:

Micro Religions

During the transition there was a micro religion phase. The recommender systems had figured out, by that time, just how important community was to people. So the recommenders started shuffling all the different users of all the different apps towards more and more specific niches. It started with commercial stuff - shoes, different 'aesthetics', watches, different locations to spend time at, different hobbies, and so on. But eventually it found its way to theistic beliefs - what is the larger purpose of the world? These beliefs turned out to be fractal-like: the recommenders would find ways to push people into the most specific, narrow existing variations - e.g., traditional Catholics versus Mormons - but they got through that pretty quickly. Next, the recommenders and the generation systems started to autonomously build entire new belief structures (paired with aesthetic styles that materialized as buyable, wearable merchandise across the full variety of products). They then pushed people towards these, and pretty quickly people - especially young people - started identifying with all these different sub-types of religion. After The Events we all collectively looked back on this time as both quite special (some of the beliefs and aesthetics were tremendously strange and complicated) and also scary (there weren't religious wars, but there were warning signs of building inter-micro-religion conflict, though The Events happened shortly after and averted war, while bringing about some of the major changes). 

Things that inspired this story: Intersection of recommendation engines + generative models; large-scale advertising systems. 


Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf

