Falcon-180B Takes Open Source LLMs Closer to GPT-4
📝 Editorial: Falcon-180B Takes Open Source LLMs Closer to GPT-4

A few months ago, the Technology Innovation Institute (TII) in the United Arab Emirates (UAE) took the world of foundation models by storm with the release of the Falcon LLM. At the time, Falcon topped the open-source LLM rankings, with versions of 1B, 7B, and 40B parameters. The model showed that large open-source LLMs rivaling commercial alternatives such as GPT-4, PaLM 2, and Anthropic's Claude were a real possibility.

Building on that initial success, TII last week open-sourced a new version with an astonishing 180B parameters. Falcon 180B was trained on 3.5 trillion tokens using up to 4,096 GPUs and roughly 7M GPU hours. That makes it 2.5 times the size of Llama 2, trained with about 4 times the compute. Alongside the base model, TII released a chat version fine-tuned on instruction and conversational datasets, and together they represent a completely different level of scale.

Falcon 180B easily topped the Open LLM Leaderboard, outperforming other open models on tasks such as reasoning, coding proficiency, and knowledge tests. It also outperforms GPT-3.5 on several benchmarks, clearly showing how quickly open source has closed the gap with closed models.

Falcon 180B represents yet another important milestone for open-source foundation models. The movement that started with Stable Diffusion and has continued with Llama, Falcon, and dozens of other models has sparked a tremendous level of innovation. At this pace, it is not inconceivable that open-source models will outperform GPT-4 in the next few months. The momentum is real and shows no signs of slowing down.

🔎 ML Research

TSMixer
Google Research published a paper detailing TSMixer, a model for long-term time series forecasting.
TSMixer is a multivariate model built from stacked multi-layer perceptrons that mix information across the time and feature dimensions to address the requirements of long-term forecasting → Read more.

AI Compilers
Microsoft Research published four papers introducing different AI compilers: Rammer for parallelism, Roller for computational efficiency, Welder for memory usage, and Grinder for hardware acceleration → Read more.

Qwen-VL
Alibaba Cloud published a paper introducing Qwen-VL, a set of vision-language models that master a range of tasks across both domains. Specifically, the paper discusses Qwen-VL and Qwen-VL-Chat and their performance on tasks such as zero-shot captioning, visual and document visual question answering, and grounding → Read more.

Frontiers of Multimodal Learning
Microsoft Research published a summary of recent papers detailing its responsible approach to multimodal learning. The research covers aspects such as scaling, risks, and scoring methods relevant to multimodal learning → Read more.

RLAIF
Google Research published a paper discussing an AI-based alternative to reinforcement learning from human feedback (RLHF). Called reinforcement learning from AI feedback (RLAIF), the method uses LLMs instead of humans to label model outputs → Read more.

🤖 Cool AI Tech Releases

Falcon 180B
The new version of the Falcon LLM has been released, easily topping the Open LLM Leaderboard → Read more.

IBM Granite
IBM announced Granite, a new series of foundation models for its watsonx platform → Read more.

🛠 Real World ML

ML at Pinterest
The Pinterest engineering team discusses MLEnv, its standardized engine for ML workloads → Read more.

Walmart's ML Platform
Walmart Global Tech provides details about Element ML, its internal ML platform → Read more.
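The editorial's claim that Falcon 180B is about 2.5 times the size of Llama 2 and used about 4 times the compute can be sanity-checked with back-of-the-envelope arithmetic. This sketch relies on the widely cited rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per token (C ≈ 6·N·D). That approximation, and the assumption that Llama 2 70B was trained on about 2 trillion tokens, do not come from this newsletter.

```python
# Rough training-compute estimate using the common C ≈ 6 * N * D rule of thumb,
# where N = parameter count and D = training tokens. This is an approximation
# (it ignores attention FLOPs and hardware efficiency), not a reported figure.


def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6.0 * params * tokens


# Falcon 180B: 180B parameters, 3.5T tokens (figures from the editorial).
falcon_180b = training_flops(180e9, 3.5e12)

# Assumption: Llama 2 70B, trained on roughly 2T tokens.
llama2_70b = training_flops(70e9, 2.0e12)

print(f"Falcon 180B:   {falcon_180b:.2e} FLOPs")
print(f"Llama 2 70B:   {llama2_70b:.2e} FLOPs")
print(f"Size ratio:    {180 / 70:.1f}x")
print(f"Compute ratio: {falcon_180b / llama2_70b:.1f}x")
```

Under these assumptions the estimate comes to roughly 3.8e24 FLOPs for Falcon 180B versus 8.4e23 for Llama 2 70B, a ratio of about 4.5x, which is in line with the editorial's roughly 4x figure. Treat it as an order-of-magnitude check rather than an exact accounting.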
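To make the RLAIF item more concrete, here is a minimal sketch of where AI feedback fits into a preference-learning pipeline. It assumes the usual RLHF setup, in which preference labels train a reward model with a pairwise (Bradley-Terry style) loss. The `toy_labeler` function is a hypothetical stand-in for the LLM labeler. It simply prefers the shorter response so the example runs without a model, and it is not how Google's labeler works.

```python
import math
from typing import Callable, List, Tuple

# (prompt, response_a, response_b) -> 0 if response_a is preferred, else 1.
Labeler = Callable[[str, str, str], int]


def toy_labeler(prompt: str, response_a: str, response_b: str) -> int:
    """Hypothetical stand-in for an LLM labeler: prefers the shorter response."""
    return 0 if len(response_a) <= len(response_b) else 1


def build_preferences(
    samples: List[Tuple[str, str, str]], labeler: Labeler
) -> List[Tuple[str, str, str]]:
    """Convert (prompt, a, b) triples into (prompt, chosen, rejected) using AI labels."""
    preferences = []
    for prompt, response_a, response_b in samples:
        if labeler(prompt, response_a, response_b) == 0:
            preferences.append((prompt, response_a, response_b))
        else:
            preferences.append((prompt, response_b, response_a))
    return preferences


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Reward-model loss -log(sigmoid(r_chosen - r_rejected)) = log(1 + exp(-margin))."""
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))


samples = [
    (
        "Summarize: the cat sat on the mat.",
        "There once was a cat, and that cat, being a cat, sat upon a mat.",
        "A cat sat on a mat.",
    )
]
preferences = build_preferences(samples, toy_labeler)
print(preferences[0][1])  # the AI-preferred ("chosen") response
print(pairwise_loss(2.0, 0.0))  # small loss: chosen already scores higher
```

In this simplified view, replacing `toy_labeler` with a call to a real LLM is the main change RLAIF introduces at this stage. The reward model trained on these labels and the reinforcement learning step that follows can stay as in RLHF. The paper's actual prompting and label processing are more involved than this sketch.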