

Anthropic, WOW

New models, an agent that can interact with your computer and a new code generation tool.

Oct 27


Next Week in The Sequence:

Edge 443: We close our series about state space models and announce a new and exciting series.
The Sequence Chat: We offer a perspective on transformer models as computers.
Edge 444: We dive into Meta AI’s amazing Movie Gen model.


A small self-serving note before we start 😉:
For the past year, I’ve been working on several ideas in AI evaluation and benchmarking—an area that, as many of you know, presents a massive challenge in today’s AI landscape. After experimenting with various approaches, I decided to incubate LayerLens, a new AI company focused on streamlining the evaluation and benchmarking of foundation models. This marks my third venture-backed AI project in the last 18 months. We've assembled a phenomenal team, with experience at companies like Google, Microsoft, and Cisco, as well as top universities. We’ve also raised a sizable pre-seed round. More details about that in the next few weeks.
We are currently hiring across the board, particularly for roles in AI research and engineering with a focus on benchmarking and evaluation. If you’re interested in this space and looking for a new challenge, feel free to reach out to me at jr@layerlens.ai. I look forward to hearing from some of you!
Now, onto today’s editorial:

📝 Editorial: Anthropic, WOW

What a week for Anthropic. The AI powerhouse announced a wave of exciting new releases, signaling a significant leap forward in AI capabilities. The highlight is undoubtedly the introduction of "computer use," a feature that allows their AI model, Claude, to interact with computers much like a human user would. Claude can now interpret on-screen information, move the cursor, click, and type, opening up a vast array of potential applications previously inaccessible to AI systems. This feature is currently in public beta, available through the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.

This advancement in computer use builds upon Anthropic's previous work in tool use and multimodality, enabling Claude to seamlessly interpret screen images and execute tasks using available software tools. The training process involved teaching Claude to accurately count pixels to control cursor movement, a crucial skill for precise mouse commands. Remarkably, Claude demonstrated rapid generalization from training on basic software like calculators and text editors, showcasing its ability to translate user prompts into a series of logical steps and actions on the computer.
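The loop described above (the model proposes an action, a harness applies it to the screen, and the cycle repeats) can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation or API: `Action`, `FakeScreen`, `scripted_model`, and `run_agent` are hypothetical names, and the "model" is a fixed script standing in for a real model call that would receive a screenshot on every turn.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """One step the model asks the harness to perform (hypothetical schema)."""
    kind: str       # "move", "click", "type", or "done"
    x: int = 0      # target pixel column, used by "move"
    y: int = 0      # target pixel row, used by "move"
    text: str = ""  # characters to enter, used by "type"


class FakeScreen:
    """Stand-in for a real desktop: tracks the cursor, clicks, and typed text."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.cursor = (0, 0)
        self.clicks = []
        self.typed = ""

    def apply(self, action):
        if action.kind == "move":
            # Clamp so a bad pixel coordinate can never leave the screen.
            x = min(max(action.x, 0), self.width - 1)
            y = min(max(action.y, 0), self.height - 1)
            self.cursor = (x, y)
        elif action.kind == "click":
            self.clicks.append(self.cursor)
        elif action.kind == "type":
            self.typed += action.text


def scripted_model(step):
    """Placeholder for the model call: a fixed plan, one action per step.

    A real system would send the current screenshot and the user prompt
    to the model here and parse the action out of its response.
    """
    plan = [
        Action("move", x=200, y=150),
        Action("click"),
        Action("type", text="2+2"),
        Action("done"),
    ]
    return plan[step]


def run_agent(screen, max_steps=10):
    """Request an action, apply it, repeat until "done". Returns steps applied."""
    for step in range(max_steps):
        action = scripted_model(step)
        if action.kind == "done":
            return step
        screen.apply(action)
    return max_steps


screen = FakeScreen(width=1024, height=768)
steps = run_agent(screen)
print(steps, screen.cursor, screen.clicks, screen.typed)
```

The `move` action carries explicit pixel coordinates, which is why accurate pixel counting matters: a small error in `x` or `y` lands the click on the wrong control.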

In addition to computer use, Anthropic has also released upgraded versions of its existing models. Claude 3.5 Sonnet, the model capable of computer use, has received substantial enhancements, boasting significant performance gains in coding and tool use tasks. Notably, it has achieved industry-leading results on coding benchmarks, surpassing even specialized systems designed for such tasks.

Furthermore, Anthropic is introducing Claude 3.5 Haiku, a new model designed for speed and affordability. It delivers performance comparable to Claude 3 Opus, Anthropic's previous largest model, at a significantly lower cost and at a speed similar to the previous generation of Haiku. Claude 3.5 Haiku excels in coding tasks and boasts low latency, making it well-suited for user-facing applications and situations requiring rapid processing of large data volumes.

Complementing these model upgrades, Anthropic has also introduced a new "analysis tool" in Claude.ai. This tool empowers Claude to write and execute JavaScript code, enabling it to perform data analysis, generate insights, and even create visualizations. Think of it as a built-in code sandbox that allows Claude to perform complex calculations and manipulate data, leading to more precise and reproducible answers.

These new capabilities signal Anthropic’s ambition to enter the agents space at scale. All in all, a remarkable week of releases for Anthropic.

🔎 ML Research

PANGEA

Researchers from Carnegie Mellon University published a paper introducing PANGEA, a multilingual-multimodal LLM supporting 39 languages. The research also includes PANGEABENCH, a benchmark encompassing 14 datasets in 47 languages —> Read more.

Meta Research Artifacts

Meta AI published the research and open source artifacts behind several models including Segment Anything 2.1. The release also includes Spirit LM, a model for speech and text integration —> Read more.

Controllable Safety Alignment

Microsoft Research and Johns Hopkins University published a paper proposing Controllable Safety Alignment (CoSA), a framework designed to adapt LLMs to different safety constraints without retraining. CoSA allows models to follow safety instructions in natural language —> Read more.

CoT and Vision-Language Models

Researchers from Apple and Carnegie Mellon University published a paper showcasing the impact of chain-of-thought (CoT) reasoning in vision-language models (VLMs). The paper uses a technique that distills CoT traces from LLMs and uses those to fine-tune VLMs —> Read more.

BLIP-3-Video

Salesforce Research published a paper introducing xGen-MM-Vid (BLIP-3-Video), a multimodal LLM for video. xGen-MM-Vid uses techniques such as temporal encoders and visual tokenizers to capture temporal information over multiple frames —> Read more.

Sabotage Evaluations

Anthropic published a research paper introducing sabotage evaluations for frontier models. These evaluations quantify the ability of a foundation model to subvert human oversight in specific contexts —> Read more.

🤖 AI Tech Releases

Claude

Anthropic released an upgraded version of Claude 3.5 Sonnet and a new model, Claude 3.5 Haiku —> Read more.

Claude Computer Use

The latest version of Claude can take actions in computer environments —> Read more.

Quantized Llama

Meta released two quantized versions of Llama 3.2 with 1B and 3B parameters respectively —> Read more.

Stable Diffusion 3.5

Stability AI open sourced a new version of its marquee text-to-image model —> Read more.

AutoTrain

Hugging Face open sourced AutoTrain, a framework for training LLMs with a few clicks —> Read more.

IBM Granite

IBM released Granite, a family of models optimized for enterprise workloads —> Read more.

🛠 Real World AI

Recommendations at Amazon

Amazon explores the ML techniques used to remove bias in recommendations —> Read more.

📡 AI Radar

There are rumors that OpenAI will release its next big model before the end of the year.
Microsoft released a new wave of AI agents for its Dynamics 365 CRM platform.
Runway showcased a preview of Act-One, a new tool for generating expressive characters.
Humanoid robotics startup Agility is closing a $150 million round.
Apple released an API for its upcoming Apple Intelligence features.
Ideogram introduced Canvas, a new interface for inpainting and outpainting capabilities.
AI notepad app Granola raised a $20 million Series A.
Agentic banking platform interface.ai raised $30 million in new funding.
Cohere released multimodal embeddings.
Asana announced its AI Studio for automating repetitive tasks.
Midjourney announced a new image editor.
Neysa, a cloud AI platform, raised $30 million in new funding.
Pharos raised $5 million to use AI in medical reporting.


