
Some Non-Obvious Points About OpenAI o1

Plus major funding rounds by World Labs and Glean, Mistral's new release, and more.

Sep 15

OpenAI o1 Hub | OpenAI — Image Credit: OpenAI

Next Week in The Sequence:

Edge 431: Our series about state space models (SSMs) continues with an overview of multimodal SSMs. We discuss the Cobra multimodal SSM and NVIDIA’s TensorRT-LLM framework.
Edge 432: Dives into NVIDIA’s Minitron models distilled from Llama 3.1.

📝 Editorial: Some Non-Obvious Points About OpenAI o1

The release of OpenAI’s new model dominated headlines this week. The o1 models are specialized in reasoning and planning, areas that have long been of interest to OpenAI. Much of the debate in online circles has focused on the model’s specific capabilities, such as whether the terms "reasoning" and "thinking" are appropriate, so there is plenty of content discussing that. Instead of contributing to the debate, I wanted to highlight a few key points that I found particularly interesting while reading the o1 technical report.

It seems that the o1 models were trained and fine-tuned using different methodologies compared to their predecessors. Specifically, OpenAI used reinforcement learning optimized for chain of thought (CoT) scenarios, which is somewhat unique.
Initial results indicate that this reinforcement learning for CoT technique can scale significantly, potentially leading to new breakthroughs in reasoning and planning.
Only CoT summaries, rather than complete CoT traces, are available via the API, making it difficult to determine how the model arrives at specific outputs.
Somewhat paradoxically, CoT-focused models might lower the entry point for interpretability since we are starting with a baseline of reasoning traces.
One of the most interesting aspects of o1 is the shift of compute from training time to inference time. Inference-time compute, rather than training-time compute, is increasingly becoming the key requirement for complex reasoning tasks. The reasoning core doesn’t necessarily need to be a large model, which could translate into shorter training times. We will need to see how this strategy evolves over time.
This point makes me think we might be witnessing the start of a new set of scaling laws focused on inference.
The red-teaming efforts for o1, with companies such as Apollo Research and Haize Labs, are quite impressive and worth diving into in the technical report.
Unsurprisingly, o1 is much harder to jailbreak than previous models, and it spends much more time on inference. That said, there have already been several successful jailbreak attempts.

OpenAI o1 clearly shows that reasoning is one of the next frontiers of foundation model research and, more importantly, that improvements in foundation model architectures are not stalling—they may just take some time to materialize.

🔎 ML Research

LLMs for Novel Research Ideas

AI researchers from Stanford University published a study about the research ideation capabilities of LLMs. The experiment compares human- and LLM-generated ideas across different fields. The results might surprise you —> Read more.

Agent Workflow Memory

Researchers from MIT and Carnegie Mellon University published a paper introducing Agent Workflow Memory (AWM), a method for reusable task workflows in agents. AWM gives agents reusable task workflows that can be used to guide future actions —> Read more.

Modular LLMs

Researchers from Princeton University, Carnegie Mellon University, Tsinghua University, UCLA and several other AI labs published a paper proposing a modular design for LLMs. Specifically, the paper introduces the term “brick” to define a functional block within an LLM and highlights the efficiencies of following this composable approach to LLM construction —> Read more.

Better Math Agents

Google DeepMind published a paper introducing a preference learning framework to optimize the performance of math AI models. The framework uses techniques such as multi-turn and tool-integrated reasoning to improve the efficiency of single-turn math models —> Read more.

WINDOWSAGENTARENA

Researchers from Microsoft, Columbia University and Carnegie Mellon University published a paper detailing WINDOWSAGENTARENA, an environment for evaluating agents on tasks in the Windows OS. The environment includes over 150 diverse tasks that require capabilities such as screen understanding, tool usage and planning —> Read more.

LLaMA-Omni

Researchers from several elite Chinese AI labs published a paper proposing LLaMA-Omni, an architecture for integrating speech interactions with open source LLMs. LLaMA-Omni integrates a pretrained speech encoder, a speech adapter and a streaming speech decoder with an LLM such as LLaMA in order to process text and speech data simultaneously —> Read more.

🤖 AI Tech Releases

OpenAI o1

OpenAI released a new family of models specialized in reasoning —> Read more.

AgentForce

Salesforce unveiled AgentForce, its platform for autonomous AI agents —> Read more.

DataGemma

Google open sourced DataGemma, a series of small models grounded in factual data —> Read more.

Pixtral 12B

Mistral released Pixtral 12B, its first multimodal model for images and text —> Read more.

🛠 Real World AI

AI for Coding at Salesforce

Salesforce discusses CodeGenie, an internal tool used to boost developer productivity using generative AI —> Read more.

Data Center Cooling at Meta

Meta discusses the reinforcement learning techniques used for cooling optimization in their data centers —> Read more.

📡 AI Radar

AI pioneer Fei-Fei Li’s company World Labs raised another $230 million.
AI-search platform Glean raised $260 million in a Series E.
OpenAI is rumoured to be raising a new round at a $150 billion valuation.
Google co-founder Sergey Brin gave a rare interview about his recent work on AI.
Arcee AI released its SuperNova 70B model.
AI agent platform Landbase came out of stealth with $12.5 million in funding.
InMobi secured $100 million for AI acquisitions ahead of its IPO.
AI bookkeeping startup Finally raised $200 million.
Stability AI and Lenovo partnered for text-to-image capabilities.
AI translation platform Smartcat raised $43 million.
ServiceNow unveiled a series of AI agents for customer service, procurement, HR and others.
OffDeal announced a $4.7 million round to improve M&A for small businesses.
AI-powered compliance platform Datricks raised $15 million in a new round.


