Good morning. I sat down with self-driving expert Dr. Missy Cummings to chat all about the reality behind self-driving cars.
It’s a fascinating episode, if I do say so myself. Check it out!
— Ian Krietzberg, Editor-in-Chief, The Deep View
In today’s newsletter:
🩻 AI for Good: Broken bones
🚘 Waymo goes international
💻 Report: AI Search gains aren’t enough to displace Google
👁️🗨️ Interview: A new approach to AI evaluation
AI for Good: Broken bones
Source: Unsplash
Britain’s National Institute for Health and Care Excellence (NICE) in October approved four AI tools to aid clinicians in the detection of broken bones on X-rays.
The details: The recommendation allows TechCare Alert, BoneView, RBfracture or Rayvolve to be used in urgent care settings in the U.K. while evidence about their real-world performance is gathered.
Missed fractures reportedly occur in 3% to 10% of cases; the evidence so far, according to NICE, suggests that the approved platforms may improve fracture detection compared with a clinician reviewing an X-ray alone. The idea is that the systems flag potential anomalies, which are then reviewed by professionals; because this doesn’t in any way replace the clinicians, NICE said it’s a relatively low-risk application of AI.
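None of the four vendors describes its pipeline here, but the pattern NICE approved (the model flags, the clinician decides) is standard human-in-the-loop triage. Below is a minimal, hypothetical Python sketch of that pattern; the threshold, scores and names are illustrative assumptions, not any vendor's actual system.

```python
from dataclasses import dataclass

# Hypothetical suspicion threshold; a real product would tune this clinically.
FLAG_THRESHOLD = 0.30

@dataclass
class XRayCase:
    case_id: str
    fracture_score: float        # model's suspicion score in [0, 1] (illustrative)
    flagged: bool = False
    clinician_verdict: str = ""  # always filled in by the human reviewer

def triage(cases: list[XRayCase]) -> list[XRayCase]:
    """Flag suspicious studies and push them up the reading queue.
    Nothing is auto-diagnosed: flagging only changes review priority."""
    for case in cases:
        case.flagged = case.fracture_score >= FLAG_THRESHOLD
    return sorted(cases, key=lambda c: c.fracture_score, reverse=True)

queue = triage([
    XRayCase("A-001", fracture_score=0.82),
    XRayCase("A-002", fracture_score=0.07),
])
for case in queue:
    print(case.case_id, "FLAGGED for priority review" if case.flagged else "routine read")
```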
Why it matters: “Using AI technology to help highly skilled professionals in urgent care centers to identify which of their patients has a fracture could potentially speed up diagnosis and reduce follow-up appointments needed because of a fracture missed during an initial assessment,” Mark Chapman, director of HealthTech at NICE, said.
Waymo goes international
Source: Waymo
In the latest example of Waymo’s seemingly ceaseless expansion, the self-driving firm on Monday said it would soon begin testing its autonomous vehicles in Tokyo, its first international market.
The details: The first Waymos will arrive in Tokyo early next year; the first stage of deployment will involve manually mapping key areas of the city, in partnership with drivers from local taxi company Nihon Kotsu.
The data gathered from this manual mapping will be used to train the AI systems that operate the vehicles. It’s not yet clear when Waymo will fully open for service in Tokyo, or how much of the city will be accessible to the self-driving vehicles. The company told CNBC that this initial testing phase is expected to take several quarters.
The landscape: Japan, according to the World Economic Forum, is actively exploring safer driving solutions for its aging population, and has been testing self-driving ventures as part of that effort.
Several local companies — Tier IV, ZMP and Monet Technologies — are building and actively testing self-driving cars.
In the U.S., however, Waymo doesn’t have much competition, especially after Cruise’s recent shutdown. As other firms have lagged behind or fallen off, Waymo has spent 2024 steadily expanding its areas of operation, recently announcing that it will soon begin testing in Miami, a significant step given the city’s rainy weather.
The truth behind self-driving cars
I sat down with Dr. Missy Cummings, the director of George Mason University’s Autonomy and Robotics Center, to talk about self-driving cars. She breaks down how they work, what their limitations are and what a more realistic, grounded future of self-driving cars might look like. You can watch (or listen to) the episode here.
|
China poised to investigate more US tech deals after Nvidia probe (The Information).
US finalizes $406 million chips subsidy for Taiwan's GlobalWafers (Reuters).
AI startup Databricks hits $62 billion valuation in $10 billion funding round (WSJ).
Dexcom’s over-the-counter glucose monitor now offers users an AI summary of how sleep, meals and more impact sugar levels (CNBC).
Canada is entering into uncharted political territory (Semafor).
If you want to get in front of an audience of 200,000+ developers, business leaders and tech enthusiasts, get in touch with us here.
Report: AI Search gains aren’t enough to displace Google
Source: Unsplash
AI search platforms — led by Perplexity and OpenAI — have become more popular of late. A new report from SEO firm BrightEdge found that the AI search entrants are “gaining ground,” but that displacing Google is likely not in the cards.
The details: The report found that in November, OpenAI’s search engine saw 44% month-over-month growth in referrals, while Perplexity grew 71%.
BrightEdge found that OpenAI search — which launched as SearchGPT in August — now drives six times more referral clicks than Perplexity. “This rapid ascent puts ChatGPT on a trajectory to potentially capture a 1% market share in 2025,” according to BrightEdge, something that could translate to more than $1.2 billion in revenue.
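BrightEdge doesn't show its revenue math, but the cited figures let you roughly reconstruct it. The back-of-envelope sketch below infers the implied market size from the 1% share and $1.2 billion pairing, and naively compounds November's growth rates; both calculations are inferences for illustration, not BrightEdge's stated methodology.

```python
# Back-of-envelope check on the BrightEdge figures cited above.
# These are inferences for illustration, not BrightEdge's published method.

# A 1% market share worth $1.2B+ implies a total search-revenue pool of ~$120B.
implied_pool = 1.2e9 / 0.01
print(f"Implied search-revenue pool: ${implied_pool / 1e9:.0f}B")

# If November's month-over-month referral growth held for a full year (a big "if"):
for name, monthly_growth in (("ChatGPT search", 0.44), ("Perplexity", 0.71)):
    print(f"{name}: roughly {(1 + monthly_growth) ** 12:,.0f}x referrals after 12 months")
```

The compounding lines mainly show how fragile single-month growth rates are as trajectories; sustained 44% monthly growth would mean roughly 79x the referrals in a year, which is the kind of extrapolation Google's 92.4% share (below) puts in perspective.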
But, Google: At the same time, Google has been expanding its AI Overviews, which, according to the company, now reach more than a billion users each day. Google’s AI Overviews, according to BrightEdge, have become far more stable than they were at launch, and Google’s search market share remains at 92.4%, “meaning that new entrants will need to coexist with and differentiate themselves from Google, rather than aiming to overtake the search giant.”
“This is a moment of inevitability in search; we’ve long anticipated the rise of AI, and now it’s reshaping the search landscape before our eyes,” Jim Yu, CEO and co-founder of BrightEdge, said in a statement. “The data clearly shows that the stakes have never been higher. Newer entrants like ChatGPT search and Perplexity are gaining ground, while Google’s AI Overviews are getting smarter.”
It remains unclear just how much the addition of AI increases the energy consumption and carbon emissions of search.
Interview: A new approach to AI evaluation
Source: Unsplash
The past couple of years have seen a massive expansion of AI accessibility, Douwe Kiela, CEO and co-founder of enterprise AI startup Contextual AI, told me. Large Language Models (LLMs) and generative AI have never been more accessible than they are today; thanks to APIs, people no longer need to train models — or even understand a lick of code — in order to deploy them.
The problem, Kiela said, is that even as models have become more accessible, methods of evaluating those models have not. Evaluation still requires deep, granular expertise in data science and machine learning, according to Kiela, who said it remains a relatively involved “manual process” that many people just don’t know how to do.
“This would be fine if AI wasn't really used anywhere,” he said, chuckling. “But AI is used everywhere now. So it's becoming a huge problem, especially if you're in a regulated industry, or something like that, you have to actually think very deeply about what you're doing there … the tools don't really exist for people to do that properly.” The ideal scenario, according to Kiela — who was one of the co-authors of the original RAG research paper — would be to make LLM evaluation accessible to developers “in the same way that you make language model APIs accessible.”
Contextual AI on Tuesday introduced LMUnit, a system designed to do exactly that.
The details: According to Contextual, LMUnit enables developers to define and evaluate natural language unit tests to get a detailed, “fine-grained” understanding of model performance, something that allows for “precise diagnosis” of potential problems.
Current methods for LLM evaluation — which involve scoring the content and quality of responses — rely on human annotation, automatic metrics or language model judging, in which a separate model evaluates a model’s performance. The problem with these methods, according to Contextual, is that they’re either too expensive, require too much expertise or are simply not fine-grained enough to be of value. Similar to unit testing in traditional software, LMUnit powers unit tests that evaluate “discrete qualities of individual model outputs — from basic accuracy and formatting to complex reasoning and domain-specific requirements. This enables developers to evaluate LLM responses granularly to learn specific signals for improvement.”
The tests can be constructed — in natural language — manually or synthetically, and each response is scored with a “pass” or “fail.”
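Contextual's actual interface isn't documented in this piece, so the sketch below is a hypothetical illustration of the pattern described above: unit tests written in plain English, each scored pass or fail against a single model response. The UnitTest class and the keyword-based judge are stand-ins for illustration; a real LMUnit-style system would use a trained evaluator model as the judge.

```python
from dataclasses import dataclass

@dataclass
class UnitTest:
    criterion: str  # a natural-language test, evaluated against one response

def judge(response: str, test: UnitTest) -> bool:
    """Stand-in judge. A real system would ask an evaluator model whether
    the response satisfies the criterion; trivial keyword checks are used
    here only so the sketch runs standalone."""
    c = test.criterion.lower()
    if "cite" in c:
        return "[source:" in response
    if "hedge" in c:
        return any(w in response.lower() for w in ("may", "might", "likely"))
    return bool(response.strip())  # fallback: response must be non-empty

tests = [
    UnitTest("The response cites at least one source."),
    UnitTest("The response hedges uncertain claims."),
    UnitTest("The response is not empty."),
]

response = "Fracture rates may be underreported [source: NICE]."
for t in tests:
    print("PASS" if judge(response, t) else "FAIL", "-", t.criterion)
```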
“It just needs to be its own category of models,” Kiela said. “Just like we have an embedding model, which is different from a language model, because we need to take those embeddings and put them in our vector database … these are just separate models with different kinds of things they do. And so evaluation very clearly needs to be its own category.”
“It's not about models. It's about systems, and the entire system is what solves your problem,” he added, saying that the language model component often makes up only around “20% of that system.”
LMUnit is now available both to the public, through Contextual’s API, and to Contextual’s customers. Contextual closed an $80 million funding round in August.
This comes amid both a broad push toward AI “agents” — which Kiela referred to as systems, but with more hype — and a steady increase in enterprise AI adoption. As corporations commit to spending more and more money on AI products, many have become heavily focused on ensuring that they are deriving clear returns from their costly investments; unsolved reliability issues at the model level have paved the way for broader systems that work around those problems, which could enable broader deployment.

Which image is real?
🤔 Your thought process:
Selected Image 2 (Left):
Guilty. This week’s theme was: ‘places in Greece I’d rather be right now.’
💭 A poll before you go
Thanks for reading today’s edition of The Deep View!
We’ll see you in the next one.
Here’s your view on smart glasses:
Only 12% of you currently use smart glasses.
47% of you don’t use them today, but expect you will soon; 30%, meanwhile, don’t ever plan on putting on a pair of smart glasses.
I’ll mess around with them, but I don’t love the idea of putting technology on my face. I’ll keep it in my hands, thanks.
Do you use Google's AI Overviews or other AI search platforms?
|
|
|