Chat with Towards Data Science Using LlamaIndex
In this second post of the four-part Chat Towards Data Science blog series, we show why LlamaIndex is the leading open source data retrieval framework.
This article is the second post of the four-part Chat Towards Data Science blog series. See the first part for how to build a chatbot with Zilliz Cloud.
LlamaIndex is the leading open source data retrieval framework, and you’re about to see why. Chatbots with access to your internal knowledge store are a popular use case for many businesses. The more documents you have, the harder they become to manage. When it comes to building these chatbots, you have three major concerns: how to chunk the data, what metadata to store, and how to route your queries.
The Chat TDS project so far
This tutorial builds on our last tutorial on how to build a chatbot for Towards Data Science. In the last post, we made the simplest possible retrieval augmented generation chatbot using just Milvus/Zilliz. I put it up on Zilliz Cloud, the fully managed Milvus, so we could have access to the same vector database across multiple projects. You can use the free tier of Zilliz Cloud for this project, or use your own instance of Milvus, which you can spin up in the notebook using Milvus Lite.
Last time we examined our splits and saw that we had many small chunks of text. When we tried simple retrieval for the question “what is a large language model?”, the chunks we got back were semantically similar, but didn’t answer the question. In this project, we see how we can use the same vector database as the backend, but use a different retrieval process to get much better results. In this case, we use LlamaIndex to orchestrate that effective retrieval.
Why use LlamaIndex for a chatbot?
LlamaIndex is a framework for working with your data on top of a large language model. One of the major abstractions that LlamaIndex offers is the “index” abstraction. An index is a model of how the data is distributed. On top of that, LlamaIndex also offers the ability to turn these indexes into query engines. The query engine leverages a large language model and an embedding model to orchestrate effective queries and retrieve relevant results.
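To make those abstractions concrete, here is a minimal sketch of the index-to-query-engine flow using LlamaIndex's defaults. This is not the setup we use for this project: it reads files from a hypothetical ./data folder and relies on OpenAI for both embeddings and the LLM, so it assumes an OPENAI_API_KEY in your environment.
# Minimal sketch of the "index" and "query engine" abstractions (illustrative only)
from llama_index import SimpleDirectoryReader, VectorStoreIndex
documents = SimpleDirectoryReader("./data").load_data()   # load raw documents
index = VectorStoreIndex.from_documents(documents)        # build an index over the data
query_engine = index.as_query_engine()                    # wrap the index in a query engine
print(query_engine.query("What is a large language model?"))
In this project we do the same thing, except the index is backed by an existing Milvus collection instead of documents loaded at runtime.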
LlamaIndex and Milvus for Chat Towards Data Science
So, LlamaIndex helps us orchestrate our data retrieval; how does Milvus help? Milvus is what we use as our backend for persistent vector storage with LlamaIndex. Using a Milvus or Zilliz instance allows us to port our vectors from one framework to another. In this case, we’re moving from a Python-native app with no orchestration framework to a retrieval application powered by LlamaIndex.
Setting up your notebook to use Zilliz and LlamaIndex
As we covered earlier, for this set of projects (Chat with Towards Data Science), we use Zilliz for easier portability. Connecting to Zilliz Cloud and connecting to Milvus are almost exactly the same. For an example of how to connect to Milvus and use it as your local vector store, see this example on comparing vector embeddings.
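As a rough illustration (not part of the original notebook), the pymilvus connection call is the same in both cases; only the parameters change. Here, zilliz_uri and zilliz_token stand in for the credentials we load below.
from pymilvus import connections
# Zilliz Cloud: connect with the cluster URI and API token
connections.connect(uri=zilliz_uri, token=zilliz_token)
# Local or self-hosted Milvus: connect with host and port instead
# connections.connect(host="localhost", port="19530")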
For this notebook, we need to install three libraries, which we can do with pip install llama-index python-dotenv openai. I like to use python-dotenv for environment variable management.
After we get our imports, we need to load our .env file with load_dotenv(). The three environment variables we need for this project are our OpenAI API key, the URI for our Zilliz Cloud cluster, and the token for our Zilliz Cloud cluster.
! pip install llama-index python-dotenv openai
import os
from dotenv import load_dotenv
import openai
# Load the OpenAI API key and Zilliz Cloud credentials from the .env file
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
zilliz_uri = os.getenv("ZILLIZ_URI")
zilliz_token = os.getenv("ZILLIZ_TOKEN")
Bring your own existing collection to LlamaIndex
For this project, we are bringing an existing collection to LlamaIndex. This poses a unique challenge. LlamaIndex has its own internal structure for creating and accessing a vector database collection. However, this time, we aren’t using LlamaIndex to build our collection directly.
The primary differences between using the native LlamaIndex vector store interface and bringing your own collection are the embeddings and the way the metadata is accessed. I actually wrote a LlamaIndex contribution to enable this project!
LlamaIndex uses OpenAI embeddings by default, but we used a HuggingFace model to generate our embeddings. This means we have to pass the right embedding model. In addition, we use a different field to store our text. We use “paragraph” while LlamaIndex uses “_node_content” by default.
We need four imports from LlamaIndex for this section. First, we need MilvusVectorStore to use Milvus with LlamaIndex. We also need the VectorStoreIndex module to use Milvus as a vector store index, and the ServiceContext module to pass around the services we want to use. The last import we need here is the HuggingFaceEmbedding module so we can use an open source embedding model from Hugging Face.
Let’s start by getting our embedding model. All we have to do is declare a HuggingFaceEmbedding object and pass in our model name, in this case the MiniLM L12 model. Next, we create a service context object so we can pass this embedding model around.
from llama_index.vector_stores import MilvusVectorStore
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.embeddings import HuggingFaceEmbedding

# Use the same Hugging Face model that generated the embeddings in our collection
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L12-v2")
service_context = ServiceContext.from_defaults(embed_model=embed_model)
We also need to connect to our Milvus vector store. For this project, we pass five parameters: the URI for our collection, the token to access our collection, the name of our collection (LlamaIndex defaults to “Llamalection”, but ours is “tds_articles”), the similarity metric we used, and the key that corresponds to which metadata field stores the text.
vdb = MilvusVectorStore(
    uri=zilliz_uri,                  # Zilliz Cloud cluster URI
    token=zilliz_token,              # Zilliz Cloud API token
    collection_name="tds_articles",  # our existing collection
    similarity_metric="L2",          # metric used when the collection was built
    text_key="paragraph"             # metadata field that stores the text
)
Querying your existing Milvus collection using LlamaIndex
Now that we’ve connected our existing Milvus collection and pulled the model we need, let’s check out our query. First, we turn our Milvus collection into a vector store index. This is also where we pass our embedding model using the service context object we made above.
With the vector store index object initialized, we simply call the as_query_engine() function to turn it into a query engine. For this example, we compare a straightforward semantic search against the LlamaIndex query engine by asking the same question as before: “What is a large language model?”
# Wrap the existing Milvus collection as a vector store index, passing our
# embedding model via the service context, then turn it into a query engine
vector_index = VectorStoreIndex.from_vector_store(vector_store=vdb, service_context=service_context)
query_engine = vector_index.as_query_engine()
response = query_engine.query("What is a large language model?")
To make the output look nice, I imported pprint and used it to print the response.
from pprint import pprint
pprint(response)
This is the response that we get using LlamaIndex to do our retrieval. This is a much better result than simple semantic search!
[llamaindex-response.png: the response returned by the LlamaIndex query engine]
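For reference, the plain semantic search baseline from the first post boils down to something like the sketch below. This is a rough reconstruction rather than the exact code from that post: we assume the vector field in the collection is named “embedding”, and we reuse the collection name, text field, metric, and embedding model described above.
from pymilvus import connections, Collection
from sentence_transformers import SentenceTransformer

# Connect to the same Zilliz Cloud collection used above
connections.connect(uri=zilliz_uri, token=zilliz_token)
collection = Collection("tds_articles")
collection.load()

# Embed the question with the same MiniLM model and run a raw vector search
model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
query_vector = model.encode("What is a large language model?").tolist()
results = collection.search(
    data=[query_vector],
    anns_field="embedding",        # assumed name of the vector field
    param={"metric_type": "L2"},
    limit=3,
    output_fields=["paragraph"]
)
for hit in results[0]:
    print(hit.entity.get("paragraph"))
Comparing the raw chunks this returns with the synthesized answer above shows what the query engine’s LLM layer adds on top of vector search.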
Summary of Chat Towards Data Science with LlamaIndex
In this version of our Chat Towards Data Science project, we used LlamaIndex along with our existing Milvus collection to improve upon our first version. The first version used simple semantic similarity through vector search to find answers. Those results were not very good. Using LlamaIndex to implement a query engine gives us much better results.
The biggest challenge for this project was bringing our own existing Milvus collection. Our existing collection did not use the default values for either the embedding vector dimension or the metadata field used to store text. We handled these two differences by passing a specific embedding model through a service context and defining the correct text field when creating our Milvus vector store object.
Once we created our vector store object, we turned it into an index using the Hugging Face embeddings we used before. Then, we turned that index into a query engine. The query engine leverages an LLM to make sense of the question, gather the retrieved results, and construct a better response.
To read the entire Chat Towards Data Science series, check out the other posts in the series.