Platformer - How Google taught AI to doubt itself
Here’s this week’s free column: an interview with a Google product leader about how the company is teaching chatbots to doubt their own output when they’re wrong.
Today let’s talk about an advance in Bard, Google’s answer to ChatGPT, and how it addresses one of the most pressing problems with today’s chatbots: their tendency to make things up.
From the day that the chatbots arrived last year, their makers warned us not to trust them. The text generated by tools like ChatGPT does not draw on a database of established facts. Instead, chatbots are predictive: they make probabilistic guesses about which words seem right based on the massive corpus of text that their underlying large language models were trained on.
As a result, chatbots are often “confidently wrong,” to use the industry’s term. And this can fool even highly educated people, as we saw this year with the case of the lawyer who submitted citations generated by ChatGPT, not realizing that every single case had been fabricated out of whole cloth.
This state of affairs explains why I find chatbots mostly useless as research assistants. They’ll tell you anything you want, often within seconds, but in most cases without citing their work. As a result, you wind up spending a lot of time researching their answers to see whether they’re true, often defeating the purpose of using them at all.
When it launched earlier this year, Google’s Bard came with a “Google It” button that submitted your query to the company’s search engine. This made it slightly faster to get a second opinion about the chatbot’s output, but still placed the burden for determining what is true and false squarely on you.
Starting today, though, Bard will do a bit more work on your behalf. After the chatbot answers one of your queries, hitting the Google button will “double check” its response. Here’s how the company explained it in a blog post:
Double-checking a response will turn many of its sentences green or brown. Green-highlighted sentences are linked to cited web pages; hover over one and Bard will show you the source of the information. Brown-highlighted sentences indicate that Bard doesn’t know where the information came from, highlighting a likely mistake.
When I double-checked Bard’s answer to my question about the history of the band Radiohead, for example, it gave me lots of green-highlighted sentences that squared with my own knowledge. But it also turned this sentence brown: “They have won numerous awards, including six Grammy Awards and nine Brit Awards.” Hovering over the words showed that Google’s search had turned up contradictory information; indeed, Radiohead has (criminally) never won a single Brit Award, much less nine of them.
“I’m going to tell you about a tragedy that happened in my life,” Jack Krawczyk, a senior director of product at Google, told me in an interview last week. Krawczyk had cooked swordfish at home, and the resulting smell seemed to permeate the entire house. He used Bard to look up ways to get rid of it, and then double-checked the results to separate fact from fiction. It turns out that cleaning the kitchen thoroughly would not fix the problem, contrary to what the chatbot had originally stated. But placing bowls of baking soda around the house might help.
If you’re wondering why Google doesn’t double-check answers like this before showing them to you, so did I. Krawczyk told me that, given the wide variety of ways people use Bard, double-checking is frequently unnecessary. (You wouldn’t typically ask it to double-check a poem you wrote, or an email it drafted, and so on.) And while double-checking represents a clear step forward, it does still often require you to pull up all those citations and make sure Bard is interpreting those search results correctly.
At least when it comes to research, human beings are still holding the AI’s hand as much as it is holding ours.
Still, it’s a welcome development. “We may have created the first language model that admits it has made a mistake,” Krawczyk told me. And given the stakes as these models improve, ensuring that AI models accurately confess to their mistakes ought to be a high priority for the industry.
Bard got another big update Tuesday: it can now connect to your Gmail, Docs, Drive, and a handful of other Google products, including YouTube and Maps. Extensions, as they’re called, let you search, summarize, and ask questions about documents you have stored in your Google account in real time. For now, the feature is limited to personal accounts, which dramatically limits its utility, at least for me. It is sometimes interesting as an alternative way to browse the web; it did a good job, for example, when I asked it to show me some good videos about getting started in interior design. (The fact that you can play those videos inline in the Bard answer window is a nice touch.)
But extensions get a lot of stuff wrong, too, and there’s no button to press here to improve the results. When I asked Bard to find my oldest email with a friend who I’ve been exchanging messages with in Gmail for 20 years now, Bard showed me a message from 2021. When I asked it which messages in my inbox might need a prompt response, Bard suggested a piece of spam with the subject line “Hassle-free printing is possible with HP Instant Ink.”
It does better in scenarios where Google can make money. Ask it to plan an itinerary for a trip to Japan including flight and hotel information, and it will pull up a good selection of choices from which Google can take a cut of the purchase.
Eventually, I imagine that third-party extensions will come to Bard, just as they previously have to ChatGPT. (They’re called plug-ins over there.)
The promise of being able to get things done on the web through a conversational interface is huge, even if the experience today is only so-so.
The question over the long term is how well AI will ultimately be able to check its own work. Today, the task of steering chatbots toward the right answer still weighs heavily on the person typing the prompt. In this moment, tools that push AIs to cite their work are greatly needed. Eventually, though, here’s hoping that more of that work falls on the tools themselves, without us always having to ask for it.
Talk to us
Send us tips, comments, questions, and AI extensions: casey@platformer.news and zoe@platformer.news.