Platformer - How Google taught AI to doubt itself
Here’s this week’s free column: an interview with a Google product leader about how the company is teaching chatbots to doubt their own output when they’re wrong.
Today let’s talk about an advance in Bard, Google’s answer to ChatGPT, and how it addresses one of the most pressing problems with today’s chatbots: their tendency to make things up.

From the day that the chatbots arrived last year, their makers warned us not to trust them. The text generated by tools like ChatGPT does not draw on a database of established facts. Instead, chatbots are predictive: they make probabilistic guesses about which words seem right based on the massive corpus of text that their underlying large language models were trained on.

As a result, chatbots are often “confidently wrong,” to use the industry’s term. And this can fool even highly educated people, as we saw this year with the case of the lawyer who submitted citations generated by ChatGPT, not realizing that every single case had been fabricated out of whole cloth.

This state of affairs explains why I find chatbots mostly useless as research assistants. They’ll tell you anything you want, often within seconds, but in most cases without citing their work. As a result, you wind up spending a lot of time researching their answers to see whether they’re true, often defeating the purpose of using them at all.

When it launched earlier this year, Google’s Bard came with a “Google It” button that submitted your query to the company’s search engine. This made it slightly faster to get a second opinion about the chatbot’s output, but still placed the burden for determining what is true and false squarely on you.

Starting today, though, Bard will do a bit more work on your behalf. After the chatbot answers one of your queries, hitting the Google button will “double check” the response. Here’s how the company explained it in a blog post:
Double-checking a query will turn many of the sentences within the response green or brown. Green-highlighted responses are linked to cited web pages; hover over one and Bard will show you the source of the information. Brown-highlighted responses indicate that Bard doesn’t know where the information came from, highlighting a likely mistake.

When I double-checked Bard’s answer to my question about the history of the band Radiohead, for example, it gave me lots of green-highlighted sentences that squared with my own knowledge. But it also turned this sentence brown: “They have won numerous awards, including six Grammy Awards and nine Brit Awards.” Hovering over the words showed that Google’s search had turned up contradictory information; indeed, Radiohead has (criminally) never won a single Brit Award, much less nine of them.

“I’m going to tell you about a tragedy that happened in my life,” Jack Krawczyk, a senior director of product at Google, told me in an interview last week. Krawczyk had cooked swordfish at home, and the resulting smell seemed to permeate the entire house. He used Bard to look up ways to get rid of it, and then double-checked the results to separate fact from fiction. It turns out that cleaning the kitchen thoroughly would not fix the problem, contrary to what the chatbot had originally stated. But placing bowls of baking soda around the house might help.

If you’re wondering why Google doesn’t double-check answers like this before showing them to you, so did I. Krawczyk told me that, given the wide variety of ways people use Bard, double-checking is frequently unnecessary. (You wouldn’t typically ask it to double-check a poem you wrote, or an email it drafted, and so on.) And while double-checking represents a clear step forward, it does still often require you to pull up all those citations and make sure Bard is interpreting those search results correctly.
At least when it comes to research, human beings are still holding the AI’s hand as much as it is holding ours.

Still, it’s a welcome development. “We may have created the first language model that admits it has made a mistake,” Krawczyk told me. And given the stakes as these models improve, ensuring that AI models accurately confess to their mistakes ought to be a high priority for the industry.

Bard got another big update Tuesday: it can now connect to your Gmail, Docs, Drive, and a handful of other Google products, including YouTube and Maps. Extensions, as they’re called, let you search, summarize, and ask questions about documents you have stored in your Google account in real time.

For now, the feature is limited to personal accounts, which dramatically limits its utility, at least for me. It is sometimes interesting as an alternative way to browse the web: it did a good job, for example, when I asked it to show me some good videos about getting started in interior design. (The fact that you can play those videos inline in the Bard answer window is a nice touch.)

But extensions get a lot of stuff wrong, too, and there’s no button to press here to improve the results. When I asked Bard to find my oldest email with a friend I’ve been exchanging messages with in Gmail for 20 years now, Bard showed me a message from 2021. When I asked it which messages in my inbox might need a prompt response, Bard suggested a piece of spam with the subject line “Hassle-free printing is possible with HP Instant Ink.”

It does better in scenarios where Google can make money. Ask it to plan an itinerary for a trip to Japan including flight and hotel information, and it will pull up a good selection of choices from which Google can take a cut of the purchase.

Eventually, I imagine that third-party extensions will come to Bard, just as they previously have to ChatGPT. (They’re called plug-ins over there.)
The promise of being able to get things done on the web through a conversational interface is huge, even if the experience today is only so-so.

The question over the long term is how well AI will ultimately be able to check its own work. Today, the task of steering chatbots toward the right answer still weighs heavily on the person typing the prompt. In this moment, tools that push AIs to cite their work are greatly needed. Eventually, though, here’s hoping that more of that work falls on the tools themselves, and without us always having to ask for it.
Talk to us
Send us tips, comments, questions, and AI extensions: casey@platformer.news and zoe@platformer.news.