The Sequence Chat: Emad Mostaque -Stability AI, Schelling AI- About Open and Decentralized AI
The co-founder and former CEO of Stability AI discusses his new vision for decentralized AI and his new project.

Bio: Emad Mostaque is widely recognized as one of the leaders in the open-source generative AI movement. He is the former CEO of Stability AI, the company behind Stable Diffusion and numerous open-source generative AI models across different modalities. Stability AI attracted a community of hundreds of thousands of AI researchers and continues to push the boundaries of innovation in the field. Emad holds a BA and an MA in mathematics and computer science from the University of Oxford, and went on to a successful career as a hedge fund manager. After leaving Stability AI, he decided to focus on the potential of decentralization. His new project, Schelling AI, combines generative AI and Web3 to enable transparency and trust in the world of foundation models.

🛠 ML Work
It was clear when setting up Stability AI that, absent a catalyst, the foundation model space would be dominated by a few very large players. In scaling the company we adopted a similar corporate and structural model to these large players, and we can see the arms race that ensued, with giant rounds, pressure on hiring, involvement with global regulators and more. The landscape has changed over the last few years and it's clear that distributed and decentralised AI has its place, for privacy, security, governance and ensuring the benefits of this technology are broadly shared.

Distributed AI is important because by centralising AI and moving towards AGI, we create black boxes that are understood only in terms of inputs and outputs, with no understanding of the inner workings or the reasoning behind the models. Open-source models, by contrast, mean the code is transparent and publicly available. A framework where everyone checks on everyone promotes accountability, trust and agency: everyone has a voice in AI's creation and evolution.

A large area of importance for decentralisation is in facilitating and distributing the benefits of research. Transparency in research builds trust and ensures security in the open. Imagine the healthcare industries of every nation, each with open models and datasets. Giving everyone access to all medical literature would enable the development of unique models that are representative of each community and nationality. For example, a model specifically tailored to Bulgarian cancer research would have a far greater impact on the Bulgarian healthcare system than a generalised American cancer model. Open source means specialisation. The same could be said for every industry, not just healthcare, from finance to education.
While web3/crypto perhaps deservedly have a bad reputation, many of the advances made in these areas will be directly applicable to our augmented intelligence future. If we look at Bitcoin, it was the original incentive mechanism for large amounts of highly specialised compute - the total energy usage of the Bitcoin network is around 160 TWh versus around 350 TWh for all the world's data centres, to give you an idea of scale. This is an example of an incentive and coordination system that did a job and could potentially be used to provide the compute needed for open, global models for all, owned and governed as widely as possible. Bitcoin is also sometimes described as a Schelling point, a focal point in game-theoretic terms that enables coordination without communication. I think our future AI systems, comprising billions of robots and trillions of agents, will need coordination systems that build on this, from payments (agents are unlikely to have bank accounts) to verification of inputs and outputs and more. These features echo the capabilities being built into second- and third-generation distributed ledgers, but I don't think any of these systems are up to the task of coordinating and supporting AI in health, education, government or any other important and regulated sector. We have a real opportunity to design and build an open, distributed and resilient AI system for all, incorporating learnings from across the board. If we can do this in a way that is verifiable and trustworthy, then not only will we have solved many of the issues that plague web3 - where the key is trust, not decentralisation for decentralisation's sake - but, more importantly, we may solve many of the issues that plague our existing systems, first by integrating with them and then by reimagining them.
The first wave of AI models was based around scale, with relatively poor data being fed into giant supercomputers that papered over its low quality and achieved remarkable results. We are now seeing the importance of the data put into models, with high-quality models beating larger ones on a fraction of the data, and splits in performance driven by data. Decentralised training of full models will always lag behind centralised clusters due to communication overhead. However, if base models are trained on these massive clusters, as LLaMA was, they can then be taken, customised and improved on a fraction of the compute. We have seen this with the explosion of fine-tunes, and their combination and recombination by the community to outperform the base models. Decentralisation is also highly suitable for data augmentation (particularly asynchronous), model tuning optimisation and many other areas. However, the mission isn't really to decentralise, but to distribute this technology to drive genuinely useful use cases. I think what it will eventually look like is a few large players providing base models as infrastructure, and swarms of people and then agents optimising the models and underlying data, rather than training from scratch in swarms.
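To make the "customise a base model on a fraction of the compute" point concrete, here is a minimal sketch of parameter-efficient fine-tuning with LoRA using the Hugging Face transformers, peft and datasets libraries. The model name, dataset file and hyperparameters are illustrative placeholders, not anything specific to Stability AI or Schelling AI.

```python
# Minimal LoRA fine-tuning sketch: adapt an open base model to a domain corpus
# on modest hardware by training small low-rank adapters instead of full weights.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

base = "meta-llama/Llama-2-7b-hf"  # placeholder: any open base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA adds trainable low-rank matrices to the attention projections;
# the frozen base weights stay untouched, so the compute bill is a fraction
# of pretraining.
lora = LoraConfig(r=8, lora_alpha=16,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Placeholder domain dataset: one JSON object per line with a "text" field.
data = load_dataset("json", data_files="domain_corpus.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=4,
                           num_train_epochs=1, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("out/domain-adapter")  # adapter weights are only a few MB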
I think this is similar to highly specialised experts - the models the large organisations are trying to build - and the team of talented juniors you bring in, which are like these smaller models being run locally and on the edge. There is a concept of satisficing, where you reach a level that is good enough, and I think small language models have achieved that for many use cases, outperforming giant models from a generation or two ago while running on a smartphone or laptop. We have seen from Gemma 27b and other models that you can also use large models to instruct and improve smaller models, something Meta did for the smaller LLaMA models too. I think the final landscape will incorporate all of these variations, and there won't be just one type of model out there.
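As an illustration of using a large model to instruct and improve a smaller one, here is a minimal distillation sketch, assuming the teacher and student share a tokenizer and vocabulary. Model names, the temperature and the loss weighting are placeholders, not a description of how Gemma or LLaMA were actually trained.

```python
# Minimal teacher-student distillation sketch (assumes shared tokenizer/vocab).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "big-open-teacher"    # placeholder
student_name = "small-open-student"  # placeholder
tok = AutoTokenizer.from_pretrained(student_name)
tok.pad_token = tok.pad_token or tok.eos_token
teacher = AutoModelForCausalLM.from_pretrained(teacher_name).eval()
student = AutoModelForCausalLM.from_pretrained(student_name)
opt = torch.optim.AdamW(student.parameters(), lr=1e-5)
T = 2.0  # temperature softens the teacher's next-token distribution

def distill_step(texts):
    batch = tok(texts, return_tensors="pt", padding=True,
                truncation=True, max_length=512)
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits       # teacher predictions
    out = student(**batch, labels=batch["input_ids"])  # ordinary LM loss on the data
    # KL divergence pulls the student's softened distribution towards the teacher's.
    kl = F.kl_div(
        F.log_softmax(out.logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    loss = 0.5 * out.loss + 0.5 * kl
    loss.backward()
    opt.step()
    opt.zero_grad()
    return loss.item()
```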
Closed models will always outperform open models, as you can simply take an open model and add private data to it (!). Open has an advantage in spread, optimisation and similar areas over directed centralised models. I think ultimately they are complementary, equivalent to hiring your own graduates versus bringing in consultants. It is likely that open will end up being the most used if it can keep up in performance terms. Even if it lags somewhat, models are rapidly becoming "good enough" to build around, and the next leg of growth is likely to be in products and services that provide and implement this technology as a result.
I think we perhaps catalysed large amounts of funding to go into open-source AI at Stability AI (!). While exponential compute is likely not needed for the next generation of models, lessons from crypto capital formation, from a funding and distribution perspective, are instructive. As noted in question 2 above, Bitcoin has been a spectacular success in attracting and rewarding specialised compute and energy provision. It has many other issues, but it has become institutional and provides some insight into how incentive systems may be employed to provide the compute and funding we need to create genuine public AI infrastructure. Our healthcare, education, government and similar AI systems should not run on black boxes and should not be controlled by an unelected few. Creating a mechanism to provide the compute and capital needed to build and maintain this infrastructure - which government initiatives are clearly unlikely to be able to keep up with - is imperative. It is difficult to see how to do this without building a new type of organisation based on prior lessons.
Digging into this area, I have moved away from decentralised AI towards thinking that distributed AI is where it is at, particularly for the implementation of AI technology in important areas of society like health and education. I am somewhat disillusioned by web3/crypto projects forgetting that the core mission is to build systems that can coordinate in a trust-minimised fashion for real-world use cases, versus decentralisation for its own sake and reliance on speculation. If we look to the future as outlined in our "how to think about AI" piece, it is clear that generative AI has a role to play in many areas of the public and private sector. While this needs to be built on new coordination and alignment infrastructure, it is unclear whether any of the systems we have today are suitable for it. Where projects today are good is in supply aggregation (e.g. DePIN for distributed compute), research on governance (DAOs having made all the mistakes of democracy and more) and payments, which will be essential as the number of agents and robots increases.
I think you are already seeing models good enough for a range of tasks on the edge, and innovative architectures to enable this. Firming up a baseline of model quality that people can build around, much as they continue to build around the original Stable Diffusion, is very important, as it opens up a range of potential mechanism design. This includes distributed tuning, model/data optimisation and ablation capability. It is somewhat recursive as well: better base models that are predictable and improving can continually support the location and improvement of data, which in turn makes the models better. What data and knowledge should go into a model, in pre-training and post-training, is probably the most important outstanding research question. If we can figure this out, then we can draw not only on the compute and support of the masses, but on their expertise, to increase the quality and diversity of the data that feeds our models and their ability to help us all.
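One simple, widely used approach to the "what data should go in" question is to score candidate documents with a small reference model and keep only those it finds plausible. The sketch below is illustrative: the scoring model and the perplexity threshold are placeholders, not anything prescribed by Schelling AI.

```python
# Minimal data-filtering sketch: rank candidate training documents by the
# perplexity a small reference model assigns them, and drop the noisiest.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

scorer_name = "small-open-model"  # placeholder reference model
tok = AutoTokenizer.from_pretrained(scorer_name)
scorer = AutoModelForCausalLM.from_pretrained(scorer_name).eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt", truncation=True, max_length=1024)
    out = scorer(**ids, labels=ids["input_ids"])
    return math.exp(out.loss.item())

def filter_corpus(docs, max_ppl=50.0):
    # Keep documents the reference model finds fluent/on-domain; the threshold
    # is arbitrary here and would be tuned per corpus in practice.
    return [d for d in docs if perplexity(d) <= max_ppl]
```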
💥 Miscellaneous – a set of rapid-fire questions

I have a particular interest in neurochemistry, stemming from my ASD research and functional medicine, which I think will be completely transformed by AI.
I think AGI from scaling LLMs is unlikely. What we are seeing now is similar to cooking a poor-quality steak (massive datasets) for longer. It gets tender and nice, and the system exhibits increasing capability, but not necessarily generalised knowledge or capability as an individual model. When put together in a broader system, this does, of course, become more difficult to predict. It could be that humans plus sufficiently advanced generative AI systems are the real ASI, especially once BCI kicks off.
An open, distributed AI system that offers universal basic intelligence to everyone, is communally owned and governed, and is constantly improving with the objective function of human flourishing.
I have a particular soft spot for Claude Shannon, whose wonderful work laid the foundation for the massive advances we have seen. Herb Simon is another favourite, bridging multiple disciplines, and has been an inspiration for the design of Schelling AI in particular.