

📝 Guest Post: Local Agentic RAG with LangGraph and Llama 3*

Jul 22


In this guest post, Stephen Batifol from Zilliz discusses how to build agents capable of tool-calling using LangGraph with Llama 3 and Milvus. Let’s dive in.

LLM agents use planning, memory, and tools to accomplish tasks. Below, we build a tool-calling agent with LangGraph, Llama 3, and Milvus.

Agents can give Llama 3 important new capabilities. In particular, we will show how to let Llama 3 perform a web search and call custom user-defined functions.

Tool-calling agents in LangGraph use two nodes. An LLM node decides which tool to invoke based on the user input and outputs the tool name and arguments. These are passed to a tool node, which calls the tool with the specified arguments and returns the result to the LLM.
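The two-node pattern can be sketched in plain Python. This is a minimal illustration, not LangGraph's API: `llm_node` is a hypothetical stand-in that fakes the model's decision with a keyword rule (a real agent would have Llama 3 emit the JSON), and the tools `web_search` and `add` are hypothetical placeholders.

```python
import json
from typing import Any, Callable


# Hypothetical tools; a real agent would register web search, retrieval, etc.
def web_search(query: str) -> str:
    return f"results for {query!r}"


def add(a: int, b: int) -> int:
    return a + b


TOOLS: dict[str, Callable[..., Any]] = {"web_search": web_search, "add": add}


def llm_node(user_input: str) -> str:
    """Stand-in for the LLM node: returns a tool call (name + args) as JSON text.

    A keyword rule replaces the model here; Llama 3 would generate this JSON.
    """
    numbers = [int(tok) for tok in user_input.split() if tok.isdigit()]
    if len(numbers) >= 2:
        call = {"name": "add", "args": {"a": numbers[0], "b": numbers[1]}}
    else:
        call = {"name": "web_search", "args": {"query": user_input}}
    return json.dumps(call)


def tool_node(tool_call: str) -> Any:
    """Tool node: parse the tool name and arguments, then call the tool."""
    call = json.loads(tool_call)
    return TOOLS[call["name"]](**call["args"])


print(tool_node(llm_node("add 2 3")))  # 5
```

In a full agent, the tool's result would be handed back to the LLM node rather than printed, so the model can decide what to do next.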

Milvus Lite lets you use Milvus locally without Docker or Kubernetes. It will store the vectors we generate from the websites we navigate to.
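Conceptually, a vector store keeps embedding vectors alongside their source text and returns the nearest neighbours of a query vector. The toy class below illustrates that idea with brute-force cosine similarity. It is a hypothetical stand-in for explanation only, not the Milvus Lite API, which adds indexing and persistence on top; the two-dimensional vectors are made up, while real embeddings have hundreds of dimensions.

```python
import math


class InMemoryVectorStore:
    """Toy vector store: brute-force cosine-similarity search over stored rows."""

    def __init__(self) -> None:
        self._rows: list[tuple[list[float], str]] = []

    def insert(self, vector: list[float], text: str) -> None:
        self._rows.append((vector, text))

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query: list[float], k: int = 1) -> list[str]:
        """Return the texts of the k stored rows most similar to `query`."""
        ranked = sorted(self._rows, key=lambda row: self._cosine(query, row[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = InMemoryVectorStore()
store.insert([1.0, 0.0], "page about LLM agents")
store.insert([0.0, 1.0], "page about the NFL draft")
print(store.search([0.9, 0.1]))  # ['page about LLM agents']
```

Brute-force search is linear in the number of stored rows, which is exactly the cost a dedicated vector database avoids with its indexes.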

Introduction to Agentic RAG

Language models can't take actions themselves; they just output text. Agents are systems that use LLMs as reasoning engines to decide which actions to take and which inputs to pass to them. After the actions run, their results can be fed back into the LLM to determine whether more actions are needed or whether it can finish.
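The act, observe, decide cycle described above reduces to a short loop. In this minimal sketch, `decide` stands in for the LLM (returning the next action, or `None` when it judges it can finish) and `act` executes the action; both are hypothetical callables, and the `max_steps` cap is an added safeguard against an agent that never stops.

```python
from typing import Callable, Optional


def run_agent(
    question: str,
    decide: Callable[[list[str]], Optional[str]],
    act: Callable[[str], str],
    max_steps: int = 5,
) -> list[str]:
    """Run decide -> act until `decide` returns None or `max_steps` is reached.

    Every observation is appended to `history`, which is fed back to `decide`.
    """
    history = [question]
    for _ in range(max_steps):
        action = decide(history)
        if action is None:  # the "LLM" judges that no more actions are needed
            break
        history.append(act(action))
    return history


# Stub "LLM": search twice, then finish.
def stub_decide(history: list[str]) -> Optional[str]:
    return "search" if len(history) < 3 else None


def stub_act(action: str) -> str:
    return f"observation from {action}"


print(run_agent("Who won?", stub_decide, stub_act))
# ['Who won?', 'observation from search', 'observation from search']
```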

They can perform actions such as searching the web, browsing your emails, or extending RAG with self-reflection or self-grading of retrieved documents, among many others.

Setting things up

LangGraph – An extension of LangChain aimed at building robust, stateful, multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
Ollama & Llama 3 – With Ollama you can run open-source large language models, such as Llama 3, locally. This lets you work with these models on your own terms, without constant internet connectivity or reliance on external servers.
Milvus Lite – A local version of Milvus that can run on your laptop, in a Jupyter Notebook, or in Google Colab. We use this vector database to store and retrieve your data efficiently.

Using LangGraph and Milvus

We use LangGraph to build a custom local Llama 3-powered RAG agent that combines several approaches, each implemented as a control flow in LangGraph:

Routing (Adaptive RAG) - Allows the agent to intelligently route user queries to the most suitable retrieval method based on the question itself.
The LLM node analyzes the query, and based on keywords or question structure, it can route it to specific retrieval nodes.
- Example 1: Questions requiring factual answers might be routed to a document retrieval node searching a pre-indexed knowledge base (powered by Milvus).
- Example 2: Open-ended, creative prompts might be directed to the LLM for generation tasks.
Fallback (Corrective RAG) - Ensures the agent has a backup plan if its initial retrieval methods fail to provide relevant results.
If the initial retrieval nodes (e.g., document retrieval from the knowledge base) don't return satisfactory results (judged by relevance scores or confidence thresholds), the agent falls back to a web search node.
- The web search node can utilize external search APIs.
Self-correction (Self-RAG) - Enables the agent to identify and fix its own errors or misleading outputs.
The LLM node generates an answer, and then it's routed to another node for evaluation. This evaluation node can use various techniques:
- Reflection: The agent can check its answer against the original query to see if it addresses all aspects.
- Confidence Score Analysis: The LLM can assign a confidence score to its answer. If the score is below a certain threshold, the answer is routed back to the LLM for revision.
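The three control flows can be sketched as plain Python functions before they are wired into a graph. This is an illustrative sketch under stated assumptions, not the post's LangGraph code: a keyword rule replaces the LLM router, the topic list is hypothetical, and `retrieve`, `web_search`, `is_relevant`, `generate`, and `is_grounded` are placeholder callables you would back with Milvus, a search API, and Llama 3 prompts.

```python
from typing import Callable

Docs = list[str]

# Hypothetical topics assumed to be indexed in the knowledge base.
INDEXED_TOPICS = ("agent", "prompt")


def route_question(question: str) -> str:
    """Routing (adaptive RAG): keyword rule standing in for the LLM router."""
    q = question.lower()
    return "vectorstore" if any(topic in q for topic in INDEXED_TOPICS) else "websearch"


def answer(
    question: str,
    retrieve: Callable[[str], Docs],
    web_search: Callable[[str], Docs],
    is_relevant: Callable[[str, str], bool],
    generate: Callable[[str, Docs], str],
    is_grounded: Callable[[str, Docs], bool],
    max_retries: int = 2,
) -> str:
    if route_question(question) == "websearch":
        docs = web_search(question)
    else:
        # Fallback (corrective RAG): keep relevant documents only;
        # if none survive, fall back to web search.
        docs = [d for d in retrieve(question) if is_relevant(question, d)]
        if not docs:
            docs = web_search(question)

    # Self-correction (self-RAG): regenerate while the answer is not grounded.
    # If retries run out, the last attempt is returned unchecked.
    generation = generate(question, docs)
    for _ in range(max_retries):
        if is_grounded(generation, docs):
            break
        generation = generate(question, docs)
    return generation


print(answer(
    "What is an LLM agent?",
    retrieve=lambda q: ["notes on agents", "cooking tips"],
    web_search=lambda q: ["web result"],
    is_relevant=lambda q, d: "agents" in d,
    generate=lambda q, docs: " + ".join(docs),
    is_grounded=lambda g, docs: True,
))  # notes on agents
```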

General ideas for Agents

Reflection – The self-correction mechanism is a form of reflection: the LangGraph agent evaluates its own retrievals and generations by looping them back for grading. This rudimentary reflection lets it catch and revise weak outputs before returning an answer.
Planning – The control flow laid out in the graph is a form of planning. Rather than just reacting to the query, the agent follows a step-by-step process to retrieve or generate the best answer.
Tool use – The LangGraph agent’s control flow incorporates specific nodes for various tools. These can include retrieval nodes for the knowledge base (e.g., Milvus), demonstrating its ability to tap into a vast pool of information, and web search nodes for external information.

Examples of Agents

To showcase the capabilities of our LLM agents, let's look at two key components: the Hallucination Grader and the Answer Grader. The full code is linked at the end of this post; these snippets show how the graders work within the LangChain framework.

Hallucination Grader

The Hallucination Grader tackles a common challenge with LLMs: hallucinations, where the model generates answers that sound plausible but lack factual grounding. This agent acts as a fact-checker, assessing whether the LLM's answer is supported by the set of documents retrieved from Milvus.

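```
### Hallucination Grader

from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate

local_llm = "llama3"  # Ollama model tag; adjust to the model you pulled

# LLM: JSON mode and temperature 0 for deterministic, parseable grades
llm = ChatOllama(model=local_llm, format="json", temperature=0)

# Prompt
prompt = PromptTemplate(
    template="""You are a grader assessing whether an answer is grounded in / supported by a set of facts.
Give a binary score 'yes' or 'no' to indicate whether the answer is grounded in / supported by a set of facts.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.
Here are the facts:
{documents}
Here is the answer:
{generation}
""",
    input_variables=["generation", "documents"],
)

hallucination_grader = prompt | llm | JsonOutputParser()

# `docs` (retrieved from Milvus) and `generation` (the LLM's answer) come from earlier steps
hallucination_grader.invoke({"documents": docs, "generation": generation})
```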
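With `format="json"`, the model is asked for a reply like `{"score": "yes"}`, which `JsonOutputParser` turns into a Python dict. Model output can still vary in case or whitespace, so a defensive check on the grade may help. The helper below is a hypothetical stdlib sketch, not part of the original code; treating anything other than a clean "yes" as "no" is our assumption, chosen so that a malformed grade errs toward re-trying.

```python
import json


def parse_binary_score(raw: str) -> bool:
    """Return True only for a reply like '{"score": "yes"}' (case/space-insensitive).

    Malformed JSON, a non-object payload, or any other score counts as "no".
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return str(payload.get("score", "")).strip().lower() == "yes"


print(parse_binary_score('{"score": " Yes "}'))  # True
print(parse_binary_score("no preamble, sorry"))   # False
```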

Answer Grader

Next comes the Answer Grader, which checks another crucial aspect: whether the LLM's answer directly addresses the user's original question. It uses the same LLM with a different prompt, designed to evaluate the answer's relevance to the question. The function below chains both graders to decide which node runs next.

```
# `hallucination_grader` is defined above; `answer_grader` is built the same way
# with a question-vs-answer prompt (see the full code in the repository).

def grade_generation_v_documents_and_question(state):
    """
    Determines whether the generation is grounded in the documents and answers the question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Decision for next node to call
    """
    print("---CHECK HALLUCINATIONS---")
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    score = hallucination_grader.invoke({"documents": documents, "generation": generation})
    grade = score["score"]

    # Check hallucination
    if grade == "yes":
        print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")

        # Check question-answering
        print("---GRADE GENERATION vs QUESTION---")
        score = answer_grader.invoke({"question": question, "generation": generation})
        grade = score["score"]
        if grade == "yes":
            print("---DECISION: GENERATION ADDRESSES QUESTION---")
            return "useful"
        else:
            print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
            return "not useful"
    else:
        print("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
        return "not supported"
```

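In the code above, the LLM acts as a binary classifier: each grader returns a JSON 'score' of "yes" or "no", and the function turns the two grades into one of three decisions ("useful", "not useful", or "not supported") that determine which node runs next.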
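Because the grading decision depends only on two yes/no grades, it can be separated from the LLM calls and unit-tested on its own. The sketch below does that. The `NEXT_NODE` table, which maps each outcome to the node that runs next, is a hypothetical wiring for illustration (regenerate when the answer is unsupported, search the web when it is grounded but off-topic), not necessarily the one in the full code.

```python
def decide(grounded: bool, answers_question: bool) -> str:
    """Mirror the branch structure of grade_generation_v_documents_and_question."""
    if not grounded:
        return "not supported"  # the answer grader is never consulted
    return "useful" if answers_question else "not useful"


# Hypothetical mapping from outcome to the next node in the graph.
NEXT_NODE = {
    "useful": "END",              # grounded and on-topic: return the answer
    "not useful": "websearch",    # grounded but off-topic: fetch better context
    "not supported": "generate",  # not grounded: try generating again
}

for grounded in (True, False):
    for answers in (True, False):
        outcome = decide(grounded, answers)
        print(grounded, answers, outcome, NEXT_NODE[outcome])
```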

Compiling the LangGraph graph

Compiling turns the nodes and edges we defined into a runnable application, which lets the agent use the different tools in your RAG system.

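```
# Compile (`workflow` is the graph of nodes and edges built earlier; see the full code)
app = workflow.compile()

# Test
from pprint import pprint

inputs = {"question": "Who are the Bears expected to draft first in the NFL draft?"}
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Finished running: {key}:")

# After the loop, `value` holds the output of the last node that ran, including the answer
pprint(value["generation"])
```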
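As the loop above suggests, each item that `app.stream` yields is a dict keyed by node name. To make the compile-then-stream idea concrete, here is a toy runner in plain Python. It is a hypothetical illustration, not how LangGraph is implemented: nodes return partial state updates that are merged into the state, each edge function returns the next node name or "END", and every step yields `{node_name: state}` carrying the full merged state.

```python
from typing import Callable, Iterator

State = dict
Node = Callable[[State], State]
Edge = Callable[[State], str]


def compile_graph(nodes: dict[str, Node], edges: dict[str, Edge], entry: str):
    """Return a `stream` function that walks the graph from `entry` until "END"."""

    def stream(inputs: State) -> Iterator[dict[str, State]]:
        state, current = dict(inputs), entry
        while current != "END":
            state = {**state, **nodes[current](state)}  # merge the node's update
            yield {current: state}
            current = edges[current](state)  # edge function picks the next node

    return stream


# Hypothetical two-node graph: retrieve, then generate.
nodes = {
    "retrieve": lambda s: {"documents": ["doc about the draft"]},
    "generate": lambda s: {"generation": f"answer from {len(s['documents'])} document(s)"},
}
edges = {"retrieve": lambda s: "generate", "generate": lambda s: "END"}

stream = compile_graph(nodes, edges, entry="retrieve")
for output in stream({"question": "Who will the Bears draft?"}):
    for key, value in output.items():
        print(f"Finished running: {key}")
print(value["generation"])  # answer from 1 document(s)
```

A conditional edge is just an edge function that inspects the state, which is where a grader-based decision like the one earlier would plug in.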

Conclusion

In this blog post, we showed how to build a RAG system using agents with LangChain/LangGraph, Llama 3, and Milvus. These agents give LLMs planning, memory, and tool-use capabilities, which can lead to more robust and informative responses.

Feel free to check out the code available in the Milvus Bootcamp repository.

If you enjoyed this blog post, consider giving us a star on GitHub, and share your experiences with the community by joining our Discord.

This post is inspired by Meta's GitHub repository of recipes for using Llama 3.

*This post was written by Stephen Batifol and originally published on Zilliz.com here. We thank Zilliz for their insights and ongoing support of TheSequence.


