🎙 Brian Venturo/CoreWeave about GPU-first ML infrastructures
How cryptocurrency mining led the team to challenge “big 3” cloud providers

It’s inspiring to learn from practitioners. Getting to know the experience gained by researchers, engineers, and entrepreneurs doing real ML work is an excellent source of insight and inspiration. Share this interview if you find it enriching. No subscription is needed.

👤 Quick bio / Brian Venturo
Brian Venturo (BV): I’m Brian Venturo, Co-Founder and CTO of CoreWeave. Prior to CoreWeave, I spent over a decade building and running hedge funds focused on energy markets. In 2016, Mike Intrator (CEO), Brannin McBee (CSO), and I bought our first GPU and began experimenting with cryptocurrency mining. Over the next few years, as a hobby became our sole business focus, we built a large-scale infrastructure spanning seven facilities and inched closer towards our goal of building a cloud infrastructure that provided the world’s creators and innovators access to scalable infrastructure with approachable prices – something that the industry was largely missing. Machine learning and batch processing were the first high-performance computing use cases we served, and something that we wanted to support at scale as we continued building CoreWeave Cloud. I’m happy to announce that we just raised $50 million (some news for your Sunday Scope!) to accelerate the growth of the business.

🛠 ML Work
BV: When we began building CoreWeave Cloud, we set out to empower engineers and creators to access compute on demand at a massive scale for GPU-accelerated use cases. We were all too familiar with the inflexibility and high cost of compute on legacy cloud providers and believed that we could help our clients create world-changing technology more effectively by removing barriers to scale. Machine learning and batch processing are classic examples of this – we're consistently blown away by what our clients can do when they’re able to train, iterate, fine-tune, serve models, and analyze data faster. The challenges that our clients face with the “big 3” cloud providers can be summarized across three themes:
BV: This is top of mind as we finished building our state-of-the-art NVIDIA A100 distributed training cluster this year. Our partners at Eleuther AI are currently using it to train GPT-NeoX-20B, which we expect to be the largest open-source language model when it’s completed later this year. Training – at any scale – is complex from a technical perspective, and for that reason, we feel that it’s really important to provide clients with options. A few examples include:
Possibly even more so than training, hardware selection has a huge impact on inference workloads, as performance-adjusted cost benchmarking becomes critically important for our clients serving models at scale. We recently released benchmarks across five GPU types for our managed inference service for Eleuther AI’s GPT-J-6B model.
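To make the idea of performance-adjusted cost concrete, here is a minimal Python sketch. It assumes the metric is simply hourly price divided by sustained throughput, expressed as dollars per one million requests. The GPU labels, prices, and throughput figures are made up for illustration; they are not CoreWeave's published benchmarks, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuOption:
    """One candidate GPU type. All values used below are hypothetical."""
    name: str
    price_per_hour: float       # USD per GPU-hour (illustrative)
    requests_per_second: float  # sustained inference throughput (illustrative)


def cost_per_million_requests(gpu: GpuOption) -> float:
    """Performance-adjusted cost: USD needed to serve one million requests."""
    requests_per_hour = gpu.requests_per_second * 3600
    return gpu.price_per_hour / requests_per_hour * 1_000_000


def rank_by_adjusted_cost(options):
    """Return the options sorted from cheapest to most expensive per request."""
    return sorted(options, key=cost_per_million_requests)


# Made-up numbers chosen only to show how the ranking can differ from hourly price.
OPTIONS = [
    GpuOption("gpu-large", price_per_hour=2.00, requests_per_second=10.0),
    GpuOption("gpu-medium", price_per_hour=1.00, requests_per_second=4.0),
    GpuOption("gpu-small", price_per_hour=0.50, requests_per_second=1.5),
]

if __name__ == "__main__":
    for gpu in rank_by_adjusted_cost(OPTIONS):
        print(f"{gpu.name}: ${cost_per_million_requests(gpu):.2f} per 1M requests")
```

With these invented numbers, the GPU with the highest hourly price turns out to be the cheapest per request, which is why comparing hourly prices alone can mislead when serving at scale.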
BV: My personal view is that the training market will become more fragmented from the model serving side over the next few years. I think we’re going to see a few large groups, whether they are private institutions or crowdsourced groups, training mega-scale models with a goal of either selling them to a large public cloud under a monopolistic arrangement or open-sourcing them for the world at large to use. I have concerns about the large cloud providers attempting to corner certain portions of the market with proprietary hardware for specific use cases and models that they own. For open-source models, I expect there to be a lot of smaller groups that need limited amounts of compute to fine-tune the models, but the largest demand is going to be for flexible compute to serve these models at scale. If I were to make a bet, it’s that flexible compute will continue to dominate the landscape given that it’s easier to source, use broadly, and build engineering teams to support it.
BV: We have countless conversations with clients who are looking to optimize for cost but haven’t optimized their models to fit in more economical GPUs. Sometimes, the team behind a project may be so overwhelmed that they can’t dedicate the time, and those conversations are where we collect data to inform our product roadmap and learn how we can be more helpful to clients in the future. It’s impossible for us to optimize every model serving pipeline, but I think there is an opportunity for us to create tools for clients to get a better “bang for their buck” at scale. We also see a lot of movement in this area from the framework developers. For example, TorchScript brought PyTorch up to the efficient execution of TensorFlow saved models. Models that can be converted to NVIDIA TensorRT often gain substantial improvements in inference times. Clients who are able to invest the time – like AI Dungeon and Novel AI – often see massive improvements in performance-adjusted cost.
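One common reason a model does not fit on a more economical GPU is the memory taken by its weights. The sketch below is a back-of-the-envelope estimate, not a description of how any client actually optimized a model. It assumes weight memory equals parameter count times bytes per parameter and ignores activations, caches, and framework overhead beyond a flat headroom fraction. The 6-billion parameter count is a round figure suggested by the GPT-J-6B name, the 16 GB card is an illustrative size, and the function names are hypothetical.

```python
# Bytes needed to store one parameter at each numeric precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits(num_params: int, precision: str, gpu_memory_gb: float,
         headroom: float = 0.2) -> bool:
    """True if the weights fit while leaving a `headroom` fraction of GPU
    memory free for activations and overhead (the 20% default is an assumption)."""
    return weight_memory_gb(num_params, precision) <= gpu_memory_gb * (1 - headroom)


if __name__ == "__main__":
    params = 6_000_000_000  # round figure for a 6B-parameter model
    for precision in BYTES_PER_PARAM:
        size = weight_memory_gb(params, precision)
        print(f"{precision}: {size:.0f} GB of weights, "
              f"fits on a 16 GB card: {fits(params, precision, 16.0)}")
```

Under these assumptions, halving the precision halves the weight memory, which can be the difference between needing a large-memory GPU and fitting on a cheaper one. Whether reduced precision is acceptable depends on the model and has to be validated case by case.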
BV: Regarding crossing the chasm you described, some teams are already there and looking for a software provider that delivers an out-of-the-box solution, taking care of hardware and infrastructure under the hood. There are a ton of interesting companies providing solutions for MLOps, a space that is absolutely exploding, and one you covered thoughtfully in TheSequence yesterday. I don’t think there’s a “one size fits all” solution here, nor is a potential solution to the problem – to the extent a problem exists – that specific. For larger, complex models, you are always going to want to do some hardware-specific tuning.

💥 Miscellaneous – a set of rapid-fire questions
Easy. Achilles and the Tortoise. Makes my mind shudder.
I am a believer in learning that the water is cold after jumping in. Learning through practice is all I’ve ever known.
Maybe. I do think that the basic imitation game in the Turing test can be overcome by an NLP model at some point in the not-too-distant future. NLP models can already readily have a legible conversation with a human. They are still, however, supercomputers generating answers based on what they have learned from humans. I do believe we need a deeper, non-language-based test to truly determine if an AI can actually think and draw conclusions on its own. Think something like the story in the movie Ex Machina.
I hope not for Bitcoin’s sake.