Inside BLOOM: How Thousands of AI Researchers Created an Open Source ChatGPT Alternative
An open-source LLM shows that tech incumbents are not the only organizations able to create massive models.

When we think about large language model (LLM) alternatives to ChatGPT, we tend to think of projects from large AI labs or ultra-well-financed startups. But what happens when a large number of AI researchers decide to collaborate to make LLMs available to mainstream researchers? The result is BLOOM, an open-source, 176-billion-parameter LLM that is able to master tasks in 46 natural languages and 13 programming languages.

The development of BLOOM was coordinated by BigScience, a vibrant open research collaboration with a mission to publicly release an LLM. The project was brought to life after being awarded a computing grant by GENCI on its Jean Zay supercomputer at IDRIS/CNRS. It was founded by Hugging Face and the French NLP community and soon attracted a diverse international collaboration with the goal of supporting linguistic, geographical, and scientific diversity. Over 1,200 participants from 38 countries, including experts in machine learning, computer science, linguistics, statistics, socio-cultural anthropology, philosophy, law, and other fields, registered with BigScience and were given access to its communication channels.

Reflecting the research questions being tackled, the BigScience effort was structured into 30 working groups, each comprising several participants with various levels of involvement and chairs tasked with self-organizing around specific aspects of the overall project. Participants were encouraged to join multiple working groups to share experiences and information, leading to a dynamic and collaborative environment. The majority of the working groups focused on tasks directly linked to the development of BLOOM.

BLOOM

The BLOOM architecture is based on a causal decoder-only transformer. This type of architecture is fairly standard for LLMs above 100B parameters, as it has shown the best performance at that scale. Beyond this choice of architecture, BLOOM introduced a couple of key changes to standard causal-decoder models (both are illustrated in the sketch after this section).

I. ALiBi Positional Embeddings: Instead of adding positional information at the embedding layer, ALiBi directly attenuates the attention scores based on the distance between keys and queries. The initial motivation for ALiBi was its ability to extrapolate to longer sequences, but the team found that it also led to smoother training and improved downstream performance, outperforming both learned and rotary embeddings.

II. Embedding LayerNorm: In preliminary experiments on a 104B-parameter model, the team tried an additional layer normalization immediately after the embedding layer and found that it significantly improved training stability. Although this appeared to come at some cost to zero-shot generalization, BigScience decided to train BLOOM with the additional layer normalization after the first embedding layer to avoid training instabilities. It is worth noting that the preliminary experiments were conducted in float16, while the final training was in bfloat16. Since then, float16 has been identified as the cause of many observed instabilities in training LLMs, and it is possible that bfloat16 alleviates the need for the embedding LayerNorm.
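To make these two tweaks concrete, here is a minimal PyTorch sketch of a single decoder attention block that combines an embedding LayerNorm with ALiBi-biased causal attention. This is an illustration only, not BLOOM's actual implementation; the names (TinyALiBiDecoderBlock, alibi_bias) and the toy dimensions are made up for the example.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8), as in the
    # ALiBi paper (assumes n_heads is a power of two for simplicity).
    start = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # bias[h, i, j] = slope_h * (j - i): zero on the diagonal and
    # increasingly negative for keys further in the past.
    pos = torch.arange(seq_len)
    dist = pos[None, :] - pos[:, None]                      # (T, T)
    return alibi_slopes(n_heads)[:, None, None] * dist      # (H, T, T)

class TinyALiBiDecoderBlock(nn.Module):
    """One attention block with BLOOM's two tweaks (toy sizes)."""

    def __init__(self, vocab_size: int, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.embed = nn.Embedding(vocab_size, d_model)
        self.embed_ln = nn.LayerNorm(d_model)  # extra LayerNorm right after the embedding
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        B, T = tokens.shape
        x = self.embed_ln(self.embed(tokens))   # note: no positional embedding is added
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (B, T, self.n_heads, self.d_head)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))  # (B, H, T, d)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)     # (B, H, T, T)
        scores = scores + alibi_bias(self.n_heads, T).to(scores)      # ALiBi biases the scores
        causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
        scores = scores.masked_fill(~causal, float("-inf"))
        y = F.softmax(scores, dim=-1) @ v                             # (B, H, T, d)
        return self.proj(y.transpose(1, 2).reshape(B, T, -1))

block = TinyALiBiDecoderBlock(vocab_size=1000, d_model=64, n_heads=8)
out = block(torch.randint(0, 1000, (2, 16)))    # -> shape (2, 16, 64)
```

Notice that no positional embedding is ever added to the hidden states: all position information enters through the per-head linear bias on the attention scores, which is what lets ALiBi generalize to sequence lengths beyond those seen in training.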
BLOOM was trained on the ROOTS corpus, which comprises 498 Hugging Face datasets covering 46 natural languages and 13 programming languages. The training process includes both data sourcing and data processing stages.

From the infrastructure standpoint, BLOOM was brought to life through the power of Megatron-DeepSpeed, a cutting-edge framework for large-scale distributed training. This framework is a fusion of two parts: Megatron-LM, which provides the Transformer implementation, tensor parallelism, and data loading primitives, and DeepSpeed, which brings the ZeRO optimizer, model pipelining, and general distributed training components to the table. Megatron-DeepSpeed enables efficient training with 3D parallelism, a combination of three complementary approaches to distributed deep learning (a toy sketch of how they combine appears at the end of this post):

· Data parallelism (DP): This approach replicates the model multiple times and places each replica on a different device, where it is fed a slice of the data. The processing is done in parallel, and all model replicas are synchronized at the end of each training step.

· Tensor parallelism (TP): This approach partitions individual layers of the model across multiple devices. Instead of having a whole activation or gradient tensor reside on a single GPU, shards of the tensor are placed on separate GPUs. This is sometimes called horizontal, or intra-layer, model parallelism.

· Pipeline parallelism (PP): This approach splits the model's layers across multiple GPUs, so that only a fraction of the layers is placed on each GPU. This technique is sometimes called vertical parallelism.

Finally, the Zero Redundancy Optimizer (ZeRO) allows different processes to hold only a fraction of the data (parameters, gradients, and optimizer states) required for a training step. The team used ZeRO stage 1, meaning that only the optimizer states are sharded this way. With the combination of these four components, BLOOM was able to scale to hundreds of GPUs with extremely high GPU utilization, reaching 156 TFLOPs per GPU in its fastest configuration on A100 GPUs and hitting the team's throughput objective with flying colors!

A Special LLM

BLOOM is a very special model in the LLM space. It shows that LLMs are not the exclusive domain of large AI labs, and that when a large community of AI researchers comes together, magical things can happen!
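As promised above, here is a back-of-the-envelope Python sketch of how a 3D-parallel layout factorizes a GPU pool and why ZeRO stage 1 helps. The TP=4 / PP=12 / DP=8 / 384-GPU layout and the memory arithmetic are illustrative assumptions for the example, not a statement of BLOOM's exact training configuration.

```python
# Illustrative sketch only: how a 3D-parallel layout factorizes a GPU pool,
# and roughly what ZeRO stage 1 saves. The concrete numbers below are
# assumptions for the example, not BLOOM's documented configuration.

def gpus_required(tp: int, pp: int, dp: int) -> int:
    # Each model replica occupies a TP x PP grid of GPUs;
    # data parallelism runs dp such replicas side by side.
    return tp * pp * dp

def gib_per_gpu(n_params: float, tp: int, pp: int, dp: int,
                zero_stage_1: bool = True) -> float:
    """Very rough per-GPU memory for weights plus Adam optimizer state.

    Weights (2 bytes/param in bfloat16) are split across the TP x PP grid.
    The fp32 optimizer state (master weights plus two Adam moments, ~12
    bytes/param) is additionally sharded across the DP group under ZeRO
    stage 1. Gradients and activations are ignored here.
    """
    shard = n_params / (tp * pp)
    weights = shard * 2
    optimizer = shard * 12 / (dp if zero_stage_1 else 1)
    return (weights + optimizer) / 2**30

tp, pp, dp = 4, 12, 8                       # hypothetical layout
print(gpus_required(tp, pp, dp))            # 384 GPUs in total
print(f"{gib_per_gpu(176e9, tp, pp, dp):.1f} GiB")          # ~12.0 GiB
print(f"{gib_per_gpu(176e9, tp, pp, dp, False):.1f} GiB")   # ~47.8 GiB without ZeRO
```

The point of the exercise: tensor and pipeline parallelism shrink the weight shard held by each device, while ZeRO stage 1 removes the redundant copies of the optimizer state across data-parallel replicas, which together keep a 176B-parameter training run within the memory of individual GPUs.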