📝 Guest Post: Stop Hallucinations From Hurting Your LLM-Powered Apps*
Large language model (LLM) hallucinations pose a big threat to the successful adoption of the new wave of LLM apps. In this post, the Galileo team dives into how one can prevent hallucinations from creeping in, along with metrics developed by the researchers at Galileo to quantify potential LLM hallucinations. They also introduce free access to the Galileo LLM Studio, powered by research-backed mechanisms to combat LLM hallucinations.

Leveraging large language models to build useful and interactive products has never been easier. This has sparked a new wave of 'LLM developers' who quickly combine LLMs (open source or closed source), frameworks such as LangChain, LlamaIndex, and LangFlow, and perhaps a vector database to spin up an LLM-powered product. A large number of startups, mid-market businesses, and large enterprises have been racing in this direction. Given this, it becomes increasingly important to be mindful of a huge bottleneck in LLM adoption: hallucinations, i.e., models being overconfidently incorrect in their predictions. We have already seen this crop up in Google's Bard launch, Microsoft's launch of the Tay chatbot a few years ago, and numerous similar examples recently. This poses not just a reputational risk to the business, but also a societal risk, with disinformation and biased viewpoints proliferating.

What are LLM hallucinations?
LLMs hallucinate when their predictions are based on insufficient or inaccurate training data. For instance, GPT-4 was trained on data with a knowledge cutoff of September 2021. When prompted with questions about events after that date, it has a high likelihood of producing a hallucinated response. A research paper from the Center for Artificial Intelligence Research (CAiRE) defines a hallucination from an LLM as "when the generated content is nonsensical or unfaithful to the provided source content". A popular example is GPT-4 being asked "Who owns Silicon Valley Bank?". The response: "Silicon Valley Bank (SVB) is not owned by a single individual or entity. It is a subsidiary of SVB Financial Group, a publicly traded company (NASDAQ: SIVB). The ownership of SVB Financial Group is distributed among individual and institutional shareholders who own its stocks. As a publicly traded company, its ownership can change frequently as shares are bought and sold on the stock market." In this case, GPT-4 has no idea about the recent SVB collapse. To mitigate disinformation from this 'hallucinated' response, OpenAI recently added the 'As of my knowledge cutoff in September 2021,' prefix ahead of such responses.

Why LLM hallucinations occur
LLMs are, at the end of the day, large neural networks that predict the next token in a sequence, which could be the next character, sub-word, or word. In mathematical terms, given a sequence of tokens T_1, T_2, …, T_N, the LLM learns the probability distribution of the next token T_{N+1} conditioned on the previous tokens: P(T_{N+1} | T_1, T_2, …, T_N). Two factors in particular have an outsized influence on whether an LLM hallucinates.
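To make the next-token view concrete, the short sketch below is a minimal illustration, assuming the Hugging Face transformers library with GPT-2 as a small, freely available stand-in model (not the exact models discussed above). It shows how an LLM turns a prompt into a probability distribution over the next token:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# GPT-2 is used purely as a stand-in model for illustration.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Silicon Valley Bank is owned by"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, sequence_length, vocab_size)

# P(T_{N+1} | T_1, ..., T_N): distribution over the next token given the prompt
next_token_probs = torch.softmax(logits[0, -1, :], dim=-1)

top_probs, top_ids = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(token_id)!r:>15}  p = {prob.item():.3f}")
```

Whatever token the sampler picks from this distribution is rendered with the same fluency as any other, which is why confident-sounding but wrong continuations slip through.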
Quantifying LLM Hallucinations
The best ways to reduce LLM hallucinations are by:
To take this a step further, researchers at Galileo have developed promising metrics for quantifying hallucination.
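Galileo's exact metric formulations are not reproduced in this post, but a common research-backed family of signals works from the same next-token probabilities shown earlier: if the model assigns low probability (or a high-entropy distribution) to the tokens it just generated, the output is a stronger hallucination candidate. The sketch below is a minimal, hypothetical illustration of one such signal, the mean negative log-probability over the answer tokens, again assuming the Hugging Face transformers library and GPT-2 as a stand-in model:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def answer_uncertainty(prompt: str, answer: str) -> float:
    """Mean negative log-probability the model assigns to the answer tokens.

    Higher values mean the model was less sure of the words it produced,
    which is one rough, illustrative proxy for hallucination risk.
    """
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]

    with torch.no_grad():
        logits = model(full_ids).logits  # (1, seq_len, vocab_size)

    # Log-probability of each token given everything before it (shift by one).
    log_probs = torch.log_softmax(logits[0, :-1, :], dim=-1)
    targets = full_ids[0, 1:]
    token_log_probs = log_probs[torch.arange(targets.shape[0]), targets]

    # Score only the answer span, not the prompt.
    return -token_log_probs[prompt_len - 1:].mean().item()

print(answer_uncertainty("Who owns Silicon Valley Bank? ", "SVB Financial Group."))
```

In practice, a threshold on a score like this (or a calibrated variant of it) can be used to flag responses for retrieval augmentation or human review.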
Introducing the Galileo LLM Studio
Building high-performing LLM-powered apps requires careful debugging of prompts and training data. The Galileo LLM Studio provides powerful tools to do just that, powered by research-backed mechanisms to combat LLM hallucinations, and it is 100% free for the community to use.
Conclusion
If you are interested in trying the Galileo LLM Studio, join the waitlist along with thousands of developers building exciting LLM-powered apps. The problem of model hallucinations poses a serious threat to adopting LLMs at scale in everyday applications. By focusing on ways to quantify the problem and baking in safeguards, we can build safer, more useful products for the world and truly unleash the power of LLMs.

References & Acknowledgments
The calibration and building blocks of Galileo's LLM hallucination metric are the outcome of numerous techniques and experiments, with references to (but not limited to) the following papers and artifacts:
*This post was written by the Galileo team. We thank Galileo for their support of TheSequence.