Platformer - How Google taught AI to doubt itself
Here’s this week’s free column — an interview with a Google product leader about how the company is teaching chatbots to doubt their own output when they’re wrong. Do you value independent reporting on our AI future? If so, maybe kick us ten bucks? It would mean a lot to us. Plus, we’ll email you first with all our scoops — like with our recent look at why the former Twitter may be bracing for an exodus of staffers. ➡️
Today let’s talk about an advance in Bard, Google’s answer to ChatGPT, and how it addresses one of the most pressing problems with today’s chatbots: their tendency to make things up. From the day that the chatbots arrived last year, their makers warned us not to trust them. The text generated by tools like ChatGPT does not draw on a database of established facts. Instead, chatbots are predictive — making probabilistic guesses about which words seem right based on the massive corpus of text that their underlying large language models were trained on. As a result, chatbots are often “confidently wrong,” to use the industry’s term. And this can fool even highly educated people, as we saw this year with the case of the lawyer who submitted citations generated by ChatGPT — not realizing that every single case had been fabricated out of whole cloth. This state of affairs explains why I find chatbots mostly useless as research assistants. They’ll tell you anything you want, often within seconds, but in most cases without citing their work. As a result, you wind up spending a lot of time researching their answers to see whether they’re true — often defeating the purpose of using them at all. When it launched earlier this year, Google’s Bard came with a “Google It” button that submitted your query to the company’s search engine. This made it slightly faster to get a second opinion about the chatbot’s output, but still placed the burden for determining what is true and false squarely on you. Starting today, though, Bard will do a bit more work on your behalf. After the chatbot answers one of your queries, hitting the Google button will “double check” your response. Here’s how the company explained it in a blog post:
Double-checking a query will turn many of the sentences within the response green or brown. Green-highlighted responses are linked to cited web pages; hover over one and Bard will show you the source of the information. Brown-highlighted responses indicate that Bard doesn’t know where the information came from, highlighting a likely mistake. When I double-checked Bard’s answer to my question about the history of the band Radiohead, for example, it gave me lots of green-highlighted sentences that squared with my own knowledge. But it also turned this sentence brown: “They have won numerous awards, including six Grammy Awards and nine Brit Awards.” Hovering over the words showed that Google’s search had shown contradictory information; indeed, Radiohead has (criminally) never won a single Brit Award, much less nine of them. “I’m going to tell you about a tragedy that happened in my life,” Jack Krawczyk, a senior director of product at Google, told me in an interview last week. Krawczyk had cooked swordfish at home, and the resulting smell seemed to permeate the entire house. He used Bard to look up ways to get rid of it, and then double-checked the results to separate fact from fiction. It turns out the cleaning the kitchen thoroughly would not fix the problem, as the chatbot had originally stated. But placing bowls of baking soda around the house might help. If you’re wondering why Google doesn’t double-check answers like this before showing them to you, so did I. Krawczyk told me that, given the wide variety of ways people use Bard, double-checking is frequently unnecessary. (You wouldn’t typically ask it to double-check a poem you wrote, or an email it drafted, and so on.) And while double-checking represents a clear step forward, it does still often require you to pull up all those citations and make sure Bard is interpreting those search results correctly. At least when it comes to research, human beings are still holding the AI’s hand as much as it is holding ours. Still, it’s a welcome development. “We may have created the first language model that admits it has made a mistake,” Krawczyk told me. And given the stakes as these models improve, ensuring that AI models accurately confess to their mistakes ought to be a high priority for the industry. Bard got another big update Tuesday: it can now connect to your Gmail, Docs, Drive, and a handful of other Google products, including YouTube and Maps. Extensions, as they’re called, let you search, summarize, and ask questions about documents you have stored in your Google account in real time. For now, it’s limited to personal accounts, which dramatically limits its utility, at least for me. It is sometimes interesting as an alternative way to browse the web — it did a good job, for example, when I asked it to show me some good videos about getting started in interior design. (The fact that you can play those videos inline in the Bard answer window is a nice touch.) But extensions get a lot of stuff wrong, too, and there’s no button to press here to improve the results. When I asked Bard to find my oldest email with a friend who I’ve been exchanging messages with in Gmail for 20 years now, Bard showed me a message from 2021. When I asked it which messages in my inbox might need a prompt response, Bard suggested a piece of spam with the subject line “Hassle-free printing is possible with HP Instant Ink.” It does better in scenarios where Google can make money. Ask it to plan an itinerary for a trip to Japan including flight and hotel information, and it will pull up a good selection of choices from which Google can take a cut of the purchase. Eventually, I imagine that extensions will come to Bard, just as they previously have to ChatGPT. (They’re called plug-ins over there.) The promise of being able to get things done on the web through a conversational interface is huge, even if the experience today is only so-so. The question over the long term is how well AI will ultimately be able to check its own work. Today, the task of steering chatbots toward the right answer still weighs heavily on the person typing the prompt. In this moment, tools that push AIs to cite their work are greatly needed. Eventually, though, here’s hoping that more of that work falls on the tools themselves — and without us always having to ask for it. Governing
Industry
Those good postsFor more good posts every day, follow Casey’s Instagram stories. (Link) (Link) (Link) Talk to usSend us tips, comments, questions, and AI extensions: casey@platformer.news and zoe@platformer.news. By design, the vast majority of Platformer readers never pay anything for the journalism it provides. But you made it all the way to the end of this week’s edition — maybe not for the first time. Want to support more journalism like what you read today? If so, click here: |
Older messages
What I learned in year three of Platformer
Tuesday, September 19, 2023
Has the Substack revolution come and gone? PLUS: What's changing in year four
The FTC takes aim at X
Sunday, September 17, 2023
Company documents reveal how a stalled data deletion project and inadequate data protections could put X in the agency's crosshairs
Nine wild details from the new Elon Musk biography
Wednesday, September 13, 2023
Walter Isaacson brings new tales of Jack Dorsey, that Sergey Brin selfie, and more
Google goes to court
Friday, September 8, 2023
On the eve of a major antitrust trial — and its 25th birthday — the company is bracing for a fight
The unbearable slowness of Meta's Oversight Board
Wednesday, August 30, 2023
A 234-day wait to get a ruling in a case about incitement to violence suggests that something important is broken
You Might Also Like
💋 Start your 2025 success story now with Gretta’s blueprint
Thursday, December 26, 2024
Don't wait for the new year—transform your life NOW with our biggest deal ever. fdrlogo Hey Friend , Thinking of owning an online business instead of working a boring 9-5? Now's your chance
Boring Strategy, Remote Nomad Jobs, GenFuse AI, Mochi Video AI, Notepad Online, and more
Wednesday, December 25, 2024
a powerful tool that transforms your ideas into a video BetaList BetaList Weekly Mochi Video AI a powerful tool that transforms your ideas into a video Remote Nomad Jobs 100% remote jobs for digital
💥 Make 2025 The Best Year of Your Life - CreatorBoom
Wednesday, December 25, 2024
Six Figure Local Newsletter, How Eddie Shleyner Built Very Good Copy, 10 Newsletter Success Stories Generating $1.1M in MRR, 4 Boring Websites That Make over $35k Per Month, 6 Things to Do if Your
🚀 This holiday, learn from the best & transform 2025
Wednesday, December 25, 2024
These experts have built $100M+ businesses—now they're here to help you do the same. fdrlogo Hey Friend , What do 30000+ Foundr students know that you don't? They know the difference between
🗞 What's New: AI video editing is coming to Instagram
Tuesday, December 24, 2024
Also: Mobile app earnings jumped 15.7% in 2024 ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
[CEI] Chrome Extension Ideas #171
Tuesday, December 24, 2024
ideas for Amazon, Podcast, Twitter, and AI ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
Top angel investors in the U.S.
Tuesday, December 24, 2024
Inspiration for who to raise from when you're raising your early rounds ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
🎁 🎄 HO HO HO! Here's the ultimate gift for your business journey
Tuesday, December 24, 2024
Unwrap your holiday gifts and start building your dream in 2025! fdrlogo Hey Friend , HO HO HO! Your holiday gifts have arrived! This isn't your typical holiday surprise—these gifts are proven
Biggest rounds of 2024
Tuesday, December 24, 2024
+ Sriram Krishnan joining Trump's government View in browser Sponsor Card - Up Round-35 Good morning there, Welcome to the last Sifted Daily newsletter of 2024, in which we look back on the biggest
The Corner Office & Low Exp 👩💼
Monday, December 23, 2024
And some holiday news͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏