Kickstart Your Local RAG Setup: A Beginner's Guide to Using Llama 3 with Ollama, Milvus, and LangChain
With the rise of open-source LLMs like Llama, Mistral, Gemma, and more, it has become apparent that LLMs can be genuinely useful even when run locally. This approach is not only practical but can become essential, as costs skyrocket when scaling up with commercial LLMs like GPT-3 or GPT-4.
In this hands-on guide, we will see how to deploy a Retrieval Augmented Generation (RAG) setup using Ollama and Llama 3, powered by Milvus as the vector database.
The different tools:
- Ollama: Brings the power of LLMs to your laptop, simplifying local operation.
- LangChain: The framework we use to build our Q&A chain and interact with our data.
- Milvus: The vector database we use to store and retrieve our data efficiently.
- Llama 3: Meta's latest iteration in its lineup of large language models.
Q&A with RAG
We will build a sophisticated question-answering (Q&A) chatbot using RAG (Retrieval Augmented Generation). This will allow us to answer questions about our own specific data.
What exactly is RAG?
RAG, or Retrieval Augmented Generation, is a technique that enhances LLMs by integrating additional data sources. A typical RAG application involves:
- Indexing - a pipeline for ingesting data from a source and indexing it, which usually consists of Loading, Splitting and Storing the data in Milvus.
- Retrieval and generation - At runtime, the application takes the user's query, retrieves the relevant chunks from the index stored in Milvus, and passes them to the LLM, which generates a response based on this enriched context.
This guide is designed to be practical and hands-on, showing you how local LLMs can be used to set up a RAG application. It's not just for experts; even beginners can dive in and start building their own Q&A chatbot. Let's get started!
Prerequisites
Before starting to set up the different components of our tutorial, make sure your system has the following:
- Docker & Docker-Compose - Ensure Docker and Docker-Compose are installed on your system.
- Milvus Standalone - For our purposes, we'll use Milvus Standalone, which is easy to manage via Docker Compose; check out how to install it in our documentation.
- Ollama - Install Ollama on your system; visit their website for the latest installation guide.
LangChain Setup
Once you've installed all the prerequisites, you're ready to set up your RAG application:
- Start a Milvus Standalone instance with:
docker-compose up -d
- This command starts your Milvus instance in detached mode, running quietly in the background (see the quick checks after this list to verify it is up).
- Fetch an LLM model via:
ollama pull <name_of_model>
- View the list of available models in the Ollama library
- e.g.
ollama pull llama3
- This command downloads the default (usually the latest and smallest) version of the model.
- To chat directly with a model from the command line, use
ollama run <name-of-model>
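Before moving on, it can help to confirm that both services are running. Two quick checks (run the first from the directory containing the Milvus docker-compose.yml):
docker-compose ps
ollama list
The first command lists the Milvus containers and their state; the second lists the models Ollama has pulled locally.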
Install dependencies
To run this application, you need to install the required libraries. You can either use Poetry if you use the code on GitHub directly, or install them with pip if you prefer:
pip install langchain pymilvus ollama pypdf langchainhub langchain-community langchain-experimental
RAG Application
As mentioned earlier, one of the main components of RAG is indexing the data.
- Start by importing the data from your PDF using PyPDFLoader:
from langchain_community.document_loaders import PyPDFLoader

# Load the PDF directly from its URL
loader = PyPDFLoader(
    "https://d18rn0p25nwr6d.cloudfront.net/CIK-0001813756/975b3e9b-268e-4798-a9e4-2a9a7c92dc10.pdf"
)
data = loader.load()
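As a quick sanity check: PyPDFLoader returns one Document per page, so you can confirm the load worked with:
print(len(data))        # number of pages loaded
print(data[0].metadata) # source URL and page number of the first page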
- Splitting the data
Break down the loaded data into manageable chunks using the RecursiveCharacterTextSplitter:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
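With chunk_size=500 and no overlap, a long document will produce many small chunks; you can check how many with:
print(len(all_splits))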
- Getting the Embeddings and storing the data in Milvus
Next, convert the text chunks into vector embeddings using Jina AI's Small English model, and store them in Milvus.
from langchain_community.embeddings.jina import JinaEmbeddings
from langchain.vectorstores.milvus import Milvus

# JINA_AI_API_KEY must be set to your Jina AI API key
embeddings = JinaEmbeddings(
    jina_api_key=JINA_AI_API_KEY, model_name="jina-embeddings-v2-small-en"
)
vectorstore = Milvus.from_documents(documents=all_splits, embedding=embeddings)
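By default, LangChain's Milvus integration connects to an instance at localhost:19530. If your Milvus Standalone runs elsewhere, you can pass the connection details explicitly; a minimal sketch, with host and port values you would adjust to your own setup:
vectorstore = Milvus.from_documents(
    documents=all_splits,
    embedding=embeddings,
    connection_args={"host": "127.0.0.1", "port": "19530"},  # adjust to your instance
)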
- Load your LLM
Ollama makes it easy to load and use an LLM locally. In our example, we will use Llama 3 by Meta; here is how to load it:
from langchain_community.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
llm = Ollama(
    model="llama3",
    callback_manager=CallbackManager(
        [StreamingStdOutCallbackHandler()]  # stream generated tokens to stdout
    ),
    stop=["<|eot_id|>"],  # Llama 3's end-of-turn token
)
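To check that the model responds before wiring up the full chain, you can send it a quick test prompt; the callback handler will stream the answer to stdout:
llm.invoke("Why is the sky blue?")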
- Build your QA chain with LangChain
Finally, construct your QA chain to process and respond to user queries:
from langchain import hub
from langchain.chains import RetrievalQA

query = input("\nQuery: ")
prompt = hub.pull("rlm/rag-prompt")  # a community RAG prompt from the LangChain Hub
qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=vectorstore.as_retriever(), chain_type_kwargs={"prompt": prompt}
)
result = qa_chain({"query": query})
print(result)
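As written, the chain handles a single query and exits. If you would rather keep asking questions, here is a minimal sketch of an interactive loop you could use in place of the single input call above (the streaming callback already prints each answer as it is generated):
while True:
    query = input("\nQuery (or 'quit' to exit): ")
    if query.lower() == "quit":
        break
    qa_chain({"query": query})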
Run your application
If you are following along in a notebook, run the last cell, which prints the result variable. Otherwise, execute your RAG application by running:
python rag_ollama.py
Example of a QA interaction:
Query: What is this document about?
The document appears to be a 104 Cover Page Interactive Data File for an SEC filing. It contains information about the company's financial statements and certifications.
{'query': 'What is this document about?', 'result': "The document appears to be a 104 Cover Page Interactive Data File for an SEC filing. It contains information about the company's financial statements and certifications."}
Note that the answer appears twice: once streamed token by token by the callback handler, and once inside the result dictionary printed at the end.
And there you have it! You've just set up a sophisticated local RAG system using Ollama with Llama 3, LangChain, and Milvus. This setup not only makes it feasible to handle large datasets efficiently but also enables a highly responsive local question-answering system.
Feel free to check out Milvus, the code on GitHub, and share your experiences with the community by joining our Discord.