Import AI 299: The world's best language model is Made in China; NVIDIA boosts LLM training; OpenAI shows how to 'fill in the middle' on a LM

What will be the first new thing a superintelligence will invent?

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.

Want a 30% speedup when training LLMs? Use the NVIDIA Megatron update:
…Two new techniques lead to big savings…
NVIDIA has updated NeMo Megatron, its software for training large language models. The updates - sequence parallelism (SP) and selective activation recomputation (SAR) - make training large-scale neural networks significantly more efficient. 

   "The latest updates to NeMo Megatron offer 30% speed-ups for training GPT-3 models ranging in size from 22 billion to 1 trillion parameters. Training can now be done on 175 billion-parameter models using 1,024 NVIDIA A100 GPUs in just 24 days–reducing time to results by 10 days, or some 250,000 hours of GPU computing, prior to these new releases," NVIDIA writes. 

Why this matters: By integrating basic improvements into its training framework, NVIDIA will have a large-scale impact on anyone who uses Megatron. This illustrates how AI progress sometimes operates like a one-way ratchet - someone implements a few changes in an increasingly widely used piece of software, and efficiency jumps upward overnight for all of its users.
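For a concrete (if heavily simplified) picture of what selective activation recomputation means: it belongs to the same family of tricks as activation checkpointing - don't store the memory-hungry activations from the forward pass, recompute them during the backward pass, and be selective about which ones you treat this way. The PyTorch sketch below is a generic illustration of that idea under my own assumptions (module sizes, and the choice to recompute only attention, are illustrative); it is not NVIDIA's implementation.

```python
# Generic sketch of selective activation recomputation: recompute only the
# memory-hungry attention activations in the backward pass (via checkpointing),
# while the cheaper MLP activations are stored as usual. Illustrative only -
# not NVIDIA's NeMo Megatron implementation.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    def __init__(self, d_model=1024, n_heads=16):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                 nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def _attn(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

    def forward(self, x):
        # Attention activations are recomputed during backward, not stored.
        x = x + checkpoint(self._attn, self.ln1(x), use_reentrant=False)
        # MLP activations are kept in memory as normal.
        x = x + self.mlp(self.ln2(x))
        return x

x = torch.randn(2, 128, 1024)
Block()(x).sum().backward()
```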
   Read more: NVIDIA AI Platform Delivers Big Gains for Large Language Models (NVIDIA blog).

####################################################

Want to make a language model with a 'fill in the middle' option? Here's how!
…Sentence completion is cool, but infilling is useful as well…
Here's a straightforward paper from OpenAI that describes how to teach language models to infill text - e.g., taking a sentence, knocking out the middle of it, and asking the model to 'fill in the middle'. 

The big insight: You can teach a model to fill in the middle "without compromising the left-to-right capability in pretraining…FIM models achieve the same test loss as AR models on left-to-right test loss while achieving lower FIM loss." The authors also find that it's inefficient to teach a model to fill in the middle via finetuning; you should generally do it at the pretraining stage instead. 
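To make that concrete, the core data transformation is simple enough to sketch: cut each training document into a (prefix, middle, suffix) triple, then rearrange it with sentinel markers so an ordinary left-to-right model learns to emit the missing middle at the end. The sentinel strings and the 50% FIM rate below are illustrative placeholders, not the paper's exact tokenization.

```python
# Rough sketch of a FIM-style data transformation: reorder a document as
# prefix + suffix + middle, delimited by sentinels, so a left-to-right model
# can be trained to generate the middle. Sentinels and rates are placeholders.
import random

PRE, SUF, MID = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

def to_fim(doc: str, fim_rate: float = 0.5) -> str:
    """With probability fim_rate, rewrite a document in prefix-suffix-middle order."""
    if len(doc) < 3 or random.random() > fim_rate:
        return doc  # leave the rest as ordinary left-to-right data
    i, j = sorted(random.sample(range(1, len(doc)), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

print(to_fim("def add(a, b):\n    return a + b\n", fim_rate=1.0))
```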

Why this matters: Somewhat like DeepMind's recent 'Chinchilla' paper (Import AI #290), which showed you can dramatically improve language models for a given compute budget by training smaller models on far more data (roughly 5X), this paper shows you can augment an LM with a useful infilling capability without giving anything up elsewhere. In fact, OpenAI shows that these "models are strictly more capable than canonically trained left-to-right models, at least within the bounds of the evaluations we consider". 
   Read more: Efficient Training of Language Models to Fill in the Middle (arXiv).


####################################################

Google uses hybrid AI to improve its own code:
…ML + semantic engines = useful capability…
Google has combined machine learning with a rule-based semantic engine to train a Transformer-based system to do code completion on Google's internal codebase. Google looked at how 10,000 Googlers used this capability over the course of three months, and the results are quite promising: Google saw a 6% reduction in coding iteration time (switching between builds and tests) and a 7% reduction in context switches (leaving the IDE). "Currently, 3% of new code (measured in characters) is now generated from accepting ML completion suggestions," Google writes.
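Google's blog post doesn't ship code, but the hybrid setup is easy to caricature: an ML model proposes ranked completions, and a cheap rule-based semantic check filters out suggestions that can't possibly be valid (e.g. calls to identifiers that don't exist in the current scope) before anything is shown in the IDE. The sketch below is a toy illustration of that division of labor; the stub model and the scope check are hypothetical, not Google's internal APIs.

```python
# Toy illustration of hybrid completion: an ML model proposes ranked
# suggestions, and a rule-based "semantic" check filters out ones whose first
# identifier isn't in scope. Stub model + check are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Suggestion:
    text: str
    score: float  # e.g. model log-probability

class StubModel:
    """Stands in for the real (0.5B-parameter) completion model."""
    def suggest(self, prefix, num_suggestions=5):
        return [Suggestion("load_config(path)", -0.3),
                Suggestion("load_cofnig(path)", -0.9)][:num_suggestions]

def semantic_check(text, symbols_in_scope):
    first_identifier = text.strip().split("(")[0].split(".")[0]
    return first_identifier in symbols_in_scope

def complete(prefix, model, symbols_in_scope, k=5):
    candidates = model.suggest(prefix, num_suggestions=k)
    kept = [s for s in candidates if semantic_check(s.text, symbols_in_scope)]
    return sorted(kept, key=lambda s: -s.score)

print(complete("config = ", StubModel(), {"load_config", "save_config"}))
```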

What they did: Google trained a transformer running on TPUs on code in Google's monorepo, using a context of between ~1000 and ~2000 tokens. The company trained a single model on a mix of eight languages (C++, Java, Python, Go, TypeScript, Proto, Kotlin, and Dart), and kept the model relatively small (0.5 billion parameters) to allow for fast inference. 
   "The model strongly benefits from the quality of the monorepo, which is enforced by guidelines and reviews," Google writes. 

Why this matters: This is another example of an 'AI flywheel' - Google is using its own code to train models to help its engineers more efficiently write better code, and it is using a (human-run, for now) acceptance process to maintain the quality of the underlying monorepo, so it can avoid pathological degradations due to garbage in/garbage out dynamics. This is also an area where 'economy of code scale' seems to matter - since Google famously has a single, gigantic internal monorepo, it's easier for the company to train a single model on it. 
   Read more: ML-Enhanced Code Completion Improves Developer Productivity (Google AI Blog).


####################################################

Huawei builds PanGu-Coder, its own GitHub Copilot:
…Another illustration of the 'fast follower' nature of Chinese labs…
Researchers with Huawei (specifically, its Noah's Ark Lab and Huawei Cloud) have built 'PanGu-Coder', a code completion model. PanGu-Coder is to PanGu as OpenAI's Codex is to GPT-3 - think of it as a follow-up model using a similar training procedure, albeit on a different data distribution. And, much like PanGu, PanGu-Coder has been published about a year after the public launch of Codex (and GitHub Copilot), illustrating the surprisingly fast rate at which Chinese labs are able to replicate large-scale models. 

What PanGu-Coder is: PanGu-Coder is a family of code completion models, ranging in size from 317 million to 2.6 billion parameters. In tests, Huawei claims PanGu-Coder does better than AlphaCode and OpenAI's Codex on a few code-generation evaluations (though Salesforce's 'CodeGen' model also does quite well). Huawei also significantly improved the capabilities of PanGu-Coder by training a model called PanGu-Coder-FT, which is finetuned on a highly curated dataset. 
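For context on how comparisons like this are typically scored: code models are usually evaluated by sampling n candidate programs per problem, running them against unit tests, and reporting pass@k using the unbiased estimator from OpenAI's Codex paper. A minimal version of that estimator is below, offered as a reference point for reading these numbers rather than as PanGu-Coder's exact evaluation harness.

```python
# Standard unbiased pass@k estimator (from the Codex paper): given n sampled
# programs per problem, of which c pass the tests, estimate the probability
# that at least one of k samples would pass.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 200 samples per problem, 17 pass the unit tests, report pass@10
print(round(pass_at_k(n=200, c=17, k=10), 3))
```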

Why this matters: Code models, much like language models, are becoming an all-purpose Swiss Army knife for a range of AI capability and alignment research. It's notable to me that Huawei has - again - managed to do a decent-looking replication of a frontier model developed by a Western lab. It's also notable that few universities have attempted to replicate these models, given the resources (both computational and in terms of technical skill) required.
   Read more: PanGu-Coder: Program Synthesis with Function-Level Language Modeling (arXiv).


####################################################

China releases GLM-130B, a very good language model:
…The world's best public, open source language model is now Made in China…
Researchers with China's Tsinghua University have built and released GLM-130B, a language model that outperforms OPT (Facebook's open source replication of GPT-3), BLOOM (the open source model from the HuggingFace-led BigScience project), and OpenAI's original GPT-3. This is a pretty big deal, both for the raw capabilities it gives researchers, and for the fact that the current best-performing open source language model is Chinese, rather than made in the West. The model was trained on around 400 A100 GPUs, which the team obtained via a donation from a local AI startup.

What's special about GLM: GLM outperforms the above-mentioned models, as well as homegrown Chinese models like ERNIE Titan 3.0 (Import AI 279).
   Read more: GLM-130B: An Open Bilingual Pre-Trained Model (Tsinghua).
   Get the model here: GLM-130B (THUDM, GitHub).
   Try the model for yourself: GLM-130B (HuggingFace).

####################################################

Tech Tales:

Micro Religions

During the transition there was a micro religion phase. The recommender systems had figured out, during that time, just how important community was to people. So the recommenders started shuffling all the different users of all the different apps towards more and more specific niches. It started with commercial stuff - shoes, different 'aesthetics', watches, different locations to spend time at, different hobbies, and so on. But eventually it found its way to theistic beliefs - what is the larger purpose of the world? These beliefs turned out to be fractal-like: the recommenders would find ways to push people into the most specific, narrow existing variations - e.g., traditional Catholics versus Mormons - but they got through that pretty quickly. Next, the recommenders and the generation systems started to autonomously build entire new belief structures (paired with aesthetic styles that materialized as buyable, wearable merchandise across the full variety of products). They then pushed people towards these, and pretty quickly people - especially young people - started identifying as all these different sub-types of religion. After The Events we all collectively looked back on this time as both quite special (some of the beliefs and aesthetics were tremendously strange and complicated) and also scary (there weren't religious wars, but there were warning signs of building-up inter-micro-religion conflict, though The Events happened shortly after and averted war, while bringing about some of the major changes). 

Things that inspired this story: Intersection of recommendation engines + generative models; large-scale advertising systems. 


Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf.
