Local Agentic RAG with LangGraph and Llama 3.2
Updated September 25, 2024 with Llama 3.2
LLM agents use planning, memory, and tools to accomplish tasks. Here, we show how to build agents capable of tool-calling using LangGraph with Llama 3.2 and Milvus.
Agents can empower Llama 3.2 with important new capabilities. In particular, we will show how to give Llama 3.2 the ability to perform a web search and call custom user-defined functions.
Tool-calling agents in LangGraph use two nodes: an LLM node decides which tool to invoke based on the user input and outputs the tool name and arguments; a tool node then calls that tool with the specified arguments and returns the result to the LLM.
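Below is a minimal sketch of this two-node loop, assuming the langgraph prebuilt helpers and the langchain-ollama integration; the web_search tool here is a hypothetical stand-in for a real search API.

# Minimal two-node tool-calling loop (sketch; web_search is a stand-in tool)
from langchain_core.tools import tool
from langchain_ollama import ChatOllama
from langgraph.graph import StateGraph, MessagesState, END
from langgraph.prebuilt import ToolNode

@tool
def web_search(query: str) -> str:
    """Search the web for the query."""
    return "...search results..."  # plug in a real search API here

# LLM node: the model decides whether to answer directly or emit a tool call
llm = ChatOllama(model="llama3.2").bind_tools([web_search])

def call_llm(state: MessagesState):
    return {"messages": [llm.invoke(state["messages"])]}

def should_continue(state: MessagesState):
    # Route to the tool node if the LLM requested a tool, otherwise finish
    return "tools" if state["messages"][-1].tool_calls else END

graph = StateGraph(MessagesState)
graph.add_node("llm", call_llm)
graph.add_node("tools", ToolNode([web_search]))  # tool node: executes the call
graph.set_entry_point("llm")
graph.add_conditional_edges("llm", should_continue)
graph.add_edge("tools", "llm")  # tool results are fed back to the LLM
tool_agent = graph.compile()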
Milvus Lite allows you to use Milvus locally without Docker or Kubernetes. It will store the vectors we generate from the different websites we navigate to.
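As a quick sketch of what that looks like with the langchain-milvus integration (the embedding model and document contents are illustrative assumptions; a file-based URI ending in .db is what triggers Milvus Lite):

# Milvus Lite as a local vector store (sketch; embedding model is an assumption)
from langchain_core.documents import Document
from langchain_milvus import Milvus
from langchain_ollama import OllamaEmbeddings

docs = [Document(page_content="Prompt engineering steers LLM behavior without updating weights.")]

vectorstore = Milvus.from_documents(
    documents=docs,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    connection_args={"uri": "./milvus_rag.db"},  # local file => Milvus Lite, no Docker
    collection_name="rag_docs",
)
retriever = vectorstore.as_retriever()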
Introduction to Agentic RAG
Language models can't take actions themselves—they just output text. Agents are systems that use LLMs as reasoning engines to determine which actions to take and the inputs to pass them. After executing actions, the results can be transmitted back into the LLM to determine whether more actions are needed or if it is okay to finish.
They can be used to perform actions such as searching the web, browsing your emails, extending RAG with self-reflection or self-grading on retrieved documents, and much more.
Setting things up
LangGraph— An extension of LangChain aimed at building robust and stateful multi-actor applications with LLMs by modeling steps as nodes and edges in a graph.
Ollama & Llama 3.2— With Ollama you can run open-source large language models locally, such as Llama 3.2. This allows you to work with these models on your own terms, without the need for constant internet connectivity or reliance on external servers.
Milvus Lite— A local version of Milvus that can run on your laptop, in a Jupyter Notebook, or on Google Colab. We use this vector database to store and retrieve your data efficiently.
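To follow along, you would typically install the relevant Python packages, e.g. pip install langgraph langchain-community langchain-milvus langchain-ollama (exact package names may vary with your LangChain version), and pull the model locally with ollama pull llama3.2.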
Using LangGraph and Milvus
We use LangGraph to build a custom, local, Llama 3.2-powered RAG agent that combines several approaches, each implemented as a control flow in LangGraph:
Routing (Adaptive RAG) - Allows the agent to intelligently route user queries to the most suitable retrieval method based on the question itself. The LLM node analyzes the query and, based on keywords or question structure, routes it to specific retrieval nodes (see the sketch after this list).
Example 1: Questions requiring factual answers might be routed to a document retrieval node searching a pre-indexed knowledge base (powered by Milvus).
Example 2: Open-ended, creative prompts might be directed to the LLM for generation tasks.
Fallback (Corrective RAG) - Ensures the agent has a backup plan if its initial retrieval methods fail to provide relevant results. If the initial retrieval nodes (e.g., document retrieval from the knowledge base) don't return satisfactory answers (based on relevance scores or confidence thresholds), the agent falls back to a web search node (also shown in the sketch after this list).
- The web search node can utilize external search APIs.
Self-correction (Self-RAG) - Enables the agent to identify and fix its own errors or misleading outputs. The LLM node generates an answer, and then it's routed to another node for evaluation. This evaluation node can use various techniques:
Reflection: The agent can check its answer against the original query to see if it addresses all aspects.
Confidence Score Analysis: The LLM can assign a confidence score to its answer. If the score is below a certain threshold, the answer is routed back to the LLM for revision.
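To make the routing and fallback ideas concrete, here is a sketch of what those nodes can look like, following the same ChatOllama-as-JSON-classifier pattern used in the graders below; the prompt wording and node names are illustrative:

### Question Router (Adaptive RAG) -- illustrative sketch
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser

local_llm = "llama3.2"
llm = ChatOllama(model=local_llm, format="json", temperature=0)

router_prompt = PromptTemplate(
    template="""You are an expert at routing a user question to a vectorstore or web search.
    Use the vectorstore for questions about the indexed documents; otherwise, use web search.
    Return a JSON with a single key 'datasource' equal to 'vectorstore' or 'web_search'.
    Question to route: {question}""",
    input_variables=["question"],
)
question_router = router_prompt | llm | JsonOutputParser()

def route_question(state):
    # Conditional entry point: pick the retrieval node for this question
    source = question_router.invoke({"question": state["question"]})
    return "websearch" if source["datasource"] == "web_search" else "retrieve"

def decide_to_generate(state):
    # Fallback (Corrective RAG): if no retrieved document was graded relevant,
    # fall back to the web search node instead of generating
    return "generate" if state["documents"] else "websearch"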
General ideas for Agents
Reflection— The self-correction mechanism is a form of reflection: the LangGraph agent evaluates its own retrievals and generations, looping information back for evaluation, which improves its output quality over time.
Planning— The control flow laid out in the graph is a form of planning: the agent doesn't just react to the query; it lays out a step-by-step process to retrieve or generate the best answer.
Tool use— The LangGraph agent’s control flow incorporates specific nodes for various tools. These can include retrieval nodes for the knowledge base (e.g., Milvus), demonstrating its ability to tap into a vast pool of information, and web search nodes for external information.
Examples of Agents
To showcase the capabilities of our LLM agents, let's look at two key components: the Hallucination Grader and the Answer Grader. While the full code is available at the bottom of this post, these snippets will give you a better understanding of how these agents work within the LangChain framework.
Hallucination Grader
The Hallucination Grader tries to fix a common challenge with LLMs: hallucinations, where the model generates answers that sound plausible but lack factual grounding. This agent acts as a fact-checker, assessing if the LLM's answer aligns with a provided set of documents retrieved from Milvus.
### Hallucination Grader

# Imports (assumed: the LangChain community and core packages)
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser

# LLM used as a binary classifier, with JSON-formatted output
local_llm = "llama3.2"  # name of the model served by Ollama
llm = ChatOllama(model=local_llm, format="json", temperature=0)

# Prompt
prompt = PromptTemplate(
    template="""You are a grader assessing whether an answer is grounded
    in / supported by a set of facts. Give a binary score 'yes' or 'no' to
    indicate whether the answer is grounded in / supported by a set of facts.
    Provide the binary score as a JSON with a single key 'score' and no
    preamble or explanation.
    Here are the facts:
    {documents}
    Here is the answer:
    {generation}
    """,
    input_variables=["generation", "documents"],
)

hallucination_grader = prompt | llm | JsonOutputParser()
# docs: documents retrieved from Milvus; generation: the LLM's answer
hallucination_grader.invoke({"documents": docs, "generation": generation})
Answer Grader
Following the Hallucination Grader, another agent steps in. This agent checks another crucial aspect: ensuring the LLM's answer directly addresses the user's original question. It utilizes the same LLM but with a different prompt, specifically designed to evaluate the answer's relevance to the question.
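Here is what that second grader can look like, a sketch written in the same style as the Hallucination Grader (the exact prompt wording in the full code may differ):

### Answer Grader -- sketch, reusing the llm, PromptTemplate, and JsonOutputParser from above
answer_prompt = PromptTemplate(
    template="""You are a grader assessing whether an answer is useful to resolve a question.
    Give a binary score 'yes' or 'no' to indicate whether the answer is useful to resolve the
    question. Provide the binary score as a JSON with a single key 'score' and no preamble
    or explanation.
    Here is the answer:
    {generation}
    Here is the question: {question}""",
    input_variables=["generation", "question"],
)
answer_grader = answer_prompt | llm | JsonOutputParser()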
def grade_generation_v_documents_and_question(state):
    """
    Determines whether the generation is grounded in the documents and answers the question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Decision for the next node to call
    """
    print("---CHECK HALLUCINATIONS---")
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    # Check hallucination: is the generation grounded in the retrieved documents?
    score = hallucination_grader.invoke({"documents": documents, "generation": generation})
    grade = score["score"]
    if grade == "yes":
        print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
        # Check question-answering: does the generation address the question?
        print("---GRADE GENERATION vs QUESTION---")
        score = answer_grader.invoke({"question": question, "generation": generation})
        grade = score["score"]
        if grade == "yes":
            print("---DECISION: GENERATION ADDRESSES QUESTION---")
            return "useful"
        else:
            print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
            return "not useful"
    else:
        print("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
        return "not supported"
In the code above, you can see that we check the predictions of the LLM, which we use as a binary classifier: its 'yes'/'no' scores determine which node the graph calls next.
Compiling the LangGraph graph
Compiling the graph wires together all the agents we defined and makes it possible for your RAG system to use its different tools.
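For context, the workflow being compiled is a StateGraph whose nodes and conditional edges connect the pieces above. A condensed sketch follows; the node bodies are stubs here (the full implementations are in the code linked at the bottom), and the edge labels mirror the return values of route_question and grade_generation_v_documents_and_question:

### Assembling the graph -- condensed sketch with stub nodes
from typing import List
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, END

class GraphState(TypedDict):
    question: str
    generation: str
    documents: List[str]

def retrieve(state):    # fetch documents from Milvus (stub)
    return {"documents": ["..."]}

def web_search(state):  # call an external search API (stub)
    return {"documents": ["..."]}

def generate(state):    # answer with Llama 3.2 (stub)
    return {"generation": "..."}

workflow = StateGraph(GraphState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("websearch", web_search)
workflow.add_node("generate", generate)

workflow.set_conditional_entry_point(
    route_question, {"websearch": "websearch", "retrieve": "retrieve"}
)
workflow.add_edge("websearch", "generate")
workflow.add_edge("retrieve", "generate")
workflow.add_conditional_edges(
    "generate",
    grade_generation_v_documents_and_question,
    {
        "not supported": "generate",  # hallucination detected: regenerate
        "useful": END,                # grounded and answers the question
        "not useful": "websearch",    # grounded but misses the question
    },
)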
# Compile
app = workflow.compile()

# Test
from pprint import pprint

inputs = {"question": "What is prompt engineering?"}
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Finished running: {key}:")
pprint(value["generation"])
'Finished running: generate:'
('Prompt engineering is the process of communicating with Large Language '
'Models (LLMs) to steer their behavior towards desired outcomes without '
'updating the model weights. It focuses on alignment and model steerability, '
'requiring experimentation and heuristics due to varying effects among '
'models. The goal is to improve controllable text generation by optimizing '
'prompts for specific applications.')
Conclusion
In this blog post, we showed how to build a RAG system using agents with LangChain/LangGraph, Llama 3.2, and Milvus. These agents give LLMs planning, memory, and tool-use capabilities, which can lead to more robust and informative responses.
Feel free to check out the code available on GitHub.
If you enjoyed this blog post, consider giving us a star on GitHub, and share your experiences with the community by joining our Discord.
This post is inspired by the GitHub repository from Meta with recipes for using Llama 3.2.