Introduction

No conversation about Foundation Models (FMs) and Large Language Models (LLMs) is possible without touching on compute. Developing and deploying FMs is a compute-intensive endeavor, often requiring thousands of petaflops of processing power. Compute isn't just a resource for these models; it's a catalyst for their capabilities and evolution. It is, after all, what made this AI advancement possible.

In today's Token, we'll peel back the layers on the AI chips that power these models and demystify the complex landscape of semiconductors. Merging the technical with the practical, we will trace the trajectory from general-purpose semiconductors to the specialized chips of today. This is a narrative of evolution, driven by the needs of sophisticated AI, and a guide to the silicon innovations that meet those needs head-on.

In today's Token:

- Compute fundamentals and the evolution of AI chips
- Market dynamics (including the latest AI chips from the main vendors and their smaller competitors, as well as cloud services)
- How to make compute choices
- What's on the horizon (including forecasting market evolutions)
- Conclusion
Compute Fundamentals & The Evolution of AI Chips

In the computing world, semiconductors are the linchpin, with each chip acting as a miniature hub of electrical circuits. Traditional semiconductors have powered everything from wristwatches to spacecraft, but as AI's complexity escalates, so does the need for a different kind of semiconductor: the AI chip.

The march of AI chips began with the ubiquitous Central Processing Unit (CPU), a jack-of-all-trades processor that powered early AI tasks with admirable resilience. However, the CPU's sequential processing soon became a bottleneck for the parallelism AI algorithms craved.

The chip you hear about most in AI is the GPU, but calling it an "AI chip" is not exactly accurate. The GPU – Graphics Processing Unit – was created not for AI but to render graphics in video games. Nvidia – the dominant provider of GPUs – made a strategic advancement by introducing Tensor Cores, elements finely tuned to accelerate AI workloads, allowing models to learn from vast datasets far more efficiently than standard CPUs could.

So the GPU became an AI chip through a recalibration of its capabilities, re-architected to handle the parallel processing that ML algorithms require. This marked the first significant leap toward specialized AI compute.
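To make the parallelism point concrete, here is a minimal sketch (assuming PyTorch is installed and a CUDA-capable Nvidia GPU is present; the matrix size and timings are illustrative) that runs the same matrix multiplication, the core operation of neural network training, on a CPU and a GPU:

```python
# A minimal sketch of why GPUs suit ML workloads: the same matrix
# multiplication, timed on CPU and GPU. Assumes PyTorch is installed
# and a CUDA-capable GPU is available; numbers are illustrative.
import time
import torch

N = 4096
a = torch.randn(N, N)
b = torch.randn(N, N)

# CPU: the multiply-accumulate operations run on a handful of cores.
start = time.perf_counter()
c_cpu = a @ b
cpu_s = time.perf_counter() - start

if torch.cuda.is_available():
    # GPU: the same operation is spread across thousands of cores
    # (and, on recent Nvidia hardware, Tensor Cores).
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()          # make the timing honest
    start = time.perf_counter()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()          # wait for the kernel to finish
    gpu_s = time.perf_counter() - start
    print(f"CPU: {cpu_s:.3f}s  GPU: {gpu_s:.3f}s  speedup: ~{cpu_s / gpu_s:.0f}x")
else:
    print(f"CPU: {cpu_s:.3f}s (no CUDA device found)")
```

On typical hardware the GPU finishes the multiply orders of magnitude faster, and that is precisely the property that Tensor Cores, and the later generation of purpose-built accelerators, push even further.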
Yet, the insatiable compute appetite of emerging AI models necessitated more than a retrofit; it required a ground-up rethink. Thus were born TPUs (Tensor Processing Units), Google's answer to the need for lightning-fast matrix computations, and IPUs (Intelligence Processing Units) by Graphcore, whose architecture is designed to mimic the massively parallel processing of the human brain. There are also chips engineered specifically for neural network inference, such as those found in Tesla's Full Self-Driving (FSD) suite, but their utility is more specialized and narrow in scope. Promising new work is happening on neuromorphic chips and Quantum Processing Units (QPUs) as well.

Market Dynamics

The GPU market is currently experiencing a significant shortage, which is affecting the AI industry profoundly. NVIDIA's GPUs, essential for ML at every stage and especially for pretraining models, are hard to come by, leading to multiyear leases by major tech firms and shutting out smaller innovators. This scarcity is prompting shifts in the market: competition among the main vendors (Nvidia, AMD, and Intel), new AI chip startups and the research around them, a thriving industry of cloud service providers, in-house chip development by firms like Apple, and the repurposing of other GPU sources.

Let's take a closer look at some of them:

- The latest AI chips from main vendors and their smaller competitors

Nvidia – the undisputed leader of the GPU industry, with a market capitalization close to one trillion dollars – has announced an accelerated release schedule for its AI chips, shifting to annual updates with the H200 in 2024 and the B100 later the same year, following the current H100. The H200 will continue using the Hopper architecture, while the B100 is rumored to employ a new Blackwell architecture. Nvidia will also update the Grace Hopper Superchip and the L40S universal accelerator and introduce a new NVL chip line for AI workloads on Arm-based systems. The company also plans faster networking products, with 400 Gb/s and 800 Gb/s InfiniBand and Ethernet releases set for 2024 and 2025.

Intel has announced its new Core Ultra processors, codenamed Meteor Lake, launching on December 14. These processors will feature Intel's first integrated neural processing unit (NPU) for efficient AI acceleration, making AI more accessible on PCs. Core Ultra is Intel's inaugural chiplet design, enabled by Foveros packaging technology; it combines an NPU, the power-efficient performance of the Intel 4 process technology, and discrete-level graphics capabilities with onboard Intel® Arc™ graphics. The disaggregated architecture balances AI-driven tasks across the chip: the GPU provides performance for AI in media and 3D applications, the NPU handles low-power AI and AI offload, and the CPU is optimized for responsive, low-latency AI tasks.

AMD has announced the upcoming MI300, an artificial intelligence accelerator chip intended for AI training and its data-intensive workloads. It leverages parallel computing, akin to gaming PC GPUs, to handle multiple workstreams simultaneously, enhancing AI efficiency. Technical specifics are sparse, but its design targets Nvidia's H100 market dominance. The MI300 is central to AMD's strategy in the burgeoning AI accelerator market, projected to reach $150 billion by 2027.

The market is also witnessing the rise of trailblazers like Cerebras, whose wafer-scale engine challenges conventional chip designs and targets supercomputers. There are also innovators with neuron-inspired architectures, which offer a fascinating sidebar in this narrative. Projects like Numenta's NuPIC (Numenta Platform for Intelligent Computing) and NorthPole (from IBM Research) are breaking the mold, drawing inspiration from the neural circuitry of the human brain to develop chips that could one day process information in fundamentally new ways.

NorthPole is a novel neural inference architecture that integrates computing and memory in a single on-chip system, mirroring the organic brain's efficiency but tailored for silicon. By merging compute with on-chip memory and functioning as active memory, NorthPole circumvents the traditional separation of memory and processor.

Numenta's NuPIC uses brain-based algorithms, data structures, and architectures to enable the efficient deployment of LLMs on more accessible CPUs, offering a blend of performance, cost savings, and data privacy. NuPIC ensures on-premise data control for enhanced security and compliance, supports a range of LLMs for customization, and allows everything from rapid prototyping to full-scale deployment with ease.

- Cloud Services

Other important players in the field of AI compute are cloud service providers, which offer compute resources and act as a force multiplier for AI development. The list is not exhaustive but reflects the main players on the market (Turing Post has no affiliation with any of these companies):

The demand is so high that some investors organize their own 'local' AI cloud services, such as the Andromeda Cluster by Nat Friedman, ex-CEO of GitHub, and investor Daniel Gross. Their setup, featuring 2,512 H100 GPUs, can train an AI model with 65 billion parameters in roughly 10 days – but only for the startups they invest in. Even so, the initiative is highly praised, as it helps democratize access to much-needed compute.
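That "roughly 10 days" figure can be sanity-checked with the widely used approximation that training a transformer costs about 6 × parameters × tokens floating-point operations. The sketch below is a back-of-envelope estimate, not the cluster's published numbers; the token count, per-GPU throughput, and utilization figure are all assumptions:

```python
# Back-of-envelope check of the Andromeda Cluster claim, using the
# common ~6 * params * tokens FLOPs estimate for transformer training.
# Every input below is an assumption for illustration, not a published spec.
params = 65e9                 # 65B-parameter model
tokens = 20 * params          # Chinchilla-style 20 tokens per parameter
train_flops = 6 * params * tokens

gpus = 2512                   # H100s in the Andromeda Cluster
peak_flops_per_gpu = 1e15     # ~1 PFLOP/s BF16 dense per H100, rough figure
utilization = 0.3             # assumed model FLOPs utilization (MFU)

cluster_flops = gpus * peak_flops_per_gpu * utilization
days = train_flops / cluster_flops / 86_400
print(f"~{days:.1f} days")    # lands in the ballpark of 'roughly 10 days'
```

With these assumptions the estimate comes out to roughly eight days, in the same ballpark as the quoted figure; real runs vary with token budget, interconnect, and achieved utilization.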
Evaluating Compute Choices

In the realm of computational architecture for AI applications, discernment is key. Each option carries a suite of strengths balanced by inherent limitations. To navigate this complex decision matrix, we must scrutinize the primary contenders in the field.

The following explanation is hidden for free subscribers and is available to Premium users only → please Upgrade to have full access to this and other articles.
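As a taste of the kind of analysis involved, one common first step is to normalize the options to a single figure of merit, such as dollars per unit of effective compute. The options and numbers in this sketch are hypothetical placeholders, not real prices or benchmarks:

```python
# Toy illustration of one way to frame a compute choice: normalize each
# option to dollars per effective PFLOP/s-hour. Every number here is a
# hypothetical placeholder, not a real price or measured benchmark.
options = {
    # name: (USD per hour, peak PFLOP/s, assumed utilization)
    "cloud_gpu_8x": (30.0, 8.0, 0.35),
    "older_gpu_8x": (12.0, 2.5, 0.40),
    "cpu_server":   ( 4.0, 0.1, 0.60),
}

for name, (usd_hr, peak_pflops, util) in options.items():
    effective = peak_pflops * util     # usable, not peak, compute
    cost = usd_hr / effective          # $ per effective PFLOP/s-hour
    print(f"{name:14s} ${cost:7.2f} per effective PFLOP/s-hour")
```

The point of such an exercise is that headline specs rarely decide the question; utilization, availability, and workload fit move the effective cost far more than peak numbers do.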
On the Horizon – that's an interesting one!

The following explanation is hidden for free subscribers and is available to Premium users only → please Upgrade to have full access to this and other articles.
Please give us feedback.

Thank you for reading, please feel free to share with your friends and colleagues. In the next couple of weeks, we are announcing our referral program 🤍