Persistent Vector Storage for LlamaIndex
Discover the benefits of Persistent Vector Storage and optimize your data storage strategy.
This article was originally published in Towards AI and is reposted here with permission.
Unless you’ve been living under a rock, you know what ChatGPT is. ChatGPT is powered by GPT, a large language model (LLM) created by OpenAI. The popularity of ChatGPT has garnered mass interest in LLMs and how they can be leveraged. LlamaIndex is a powerful new tool for building LLM applications.
The three most significant challenges in building LLM applications are:
- The massive cost
- The lack of up-to-date information
- The need for domain-specific knowledge
Two main approaches have been proposed for dealing with these problems: fine-tuning and caching + injection.
Fine-tuning is a good solution for the last two challenges, both of which come down to the model lacking the right information. When it comes to cost, however, fine-tuning is the opposite of helpful. That’s why we use caching + injection. One framework for this has been dubbed “CVP,” or ChatGPT + Vector Database + Prompt-as-Code. LlamaIndex can abstract much of this framework for you.
In this piece, we’ll cover:
- An introduction to LlamaIndex
- Creating and saving your LlamaIndex vector index
- Using a vector database locally
- Using a cloud vector database
- A summary of how to use a persistent vector store with LlamaIndex
Introduction to LlamaIndex
“[You can think of LlamaIndex as a] black box around your Data and an LLM” - Jerry Liu, Co-Founder of LlamaIndex
LlamaIndex facilitates interactions between you, your data, and an LLM such as ChatGPT. First, it provides a way to “index” your data into multiple “nodes.” Then it uses the indexed data to interact with ChatGPT. There are four main indexing patterns in LlamaIndex: a list index, a vector store index, a tree index, and a keyword index.
Each of the indexes has its advantages and use cases. For example, a list index is suitable for interacting with a large document when you need the whole picture. The vector index is perfect for anything requiring semantic similarity. The tree index works well when you need sparse information. Lastly, a keyword index is good for looking up specific keywords.
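To make the four index types concrete, here is a minimal sketch, assuming the GPT-prefixed index classes that llama_index exposed at the time of writing and a local data folder of documents:

from llama_index import (
    SimpleDirectoryReader,
    GPTListIndex,
    GPTVectorStoreIndex,
    GPTTreeIndex,
    GPTKeywordTableIndex,
)

# load raw files from a local folder into LlamaIndex documents
documents = SimpleDirectoryReader("./data").load_data()

# each class builds a different index structure over the same documents
list_index = GPTListIndex.from_documents(documents)  # whole-picture scans
vector_index = GPTVectorStoreIndex.from_documents(documents)  # semantic similarity
tree_index = GPTTreeIndex.from_documents(documents)  # sparse, hierarchical lookups
keyword_index = GPTKeywordTableIndex.from_documents(documents)  # keyword matching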
The indexes can be stored and loaded for session management. By default, we can store the index context locally; however, for real-world use cases, you want a persistent storage engine for your index. In this tutorial, we look at how to create a persistent vector index, but first, here is what default local storage looks like.
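A minimal sketch, assuming the stock persist and load APIs and continuing from the vector_index built above:

# persist the index context to local disk (defaults to ./storage)
vector_index.storage_context.persist(persist_dir="./storage")

# later, rebuild the storage context and reload the index from disk
from llama_index import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
vector_index = load_index_from_storage(storage_context)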
Creating and saving your LlamaIndex vector index
You'll need some data to follow along with the code in this tutorial. I use the data directly from the examples folder in the LlamaIndex repository. You can clone the repository locally and create a notebook in the paul_graham_essay folder, or download the data from that folder and run the code in your own folder.
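Once the data is in place, you can load it into LlamaIndex documents with SimpleDirectoryReader. A minimal sketch, assuming the essay text sits in a data subfolder next to your notebook:

from llama_index import SimpleDirectoryReader

# load every file in ./data into a list of LlamaIndex documents
documents = SimpleDirectoryReader("./data").load_data()

This documents variable is what we pass to the index constructor below.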
Using a vector database locally
For this example, we use the open-source vector database Milvus. Specifically, we use Milvus Lite so we can run everything directly in a notebook without any extra setup; we only need to pip install it. Before running the code below, run pip install milvus llama-index python-dotenv.
You also need an OpenAI API key to work with GPT. The python-dotenv library is only needed if you store your OpenAI API key in a .env file.
The imports we need from llama_index are GPTVectorStoreIndex, StorageContext, and MilvusVectorStore from the vector_stores module. We only need one import from milvus: the default_server. I also import os and load_dotenv to load my API key.
Once the imports are done and the API key is loaded, we spin up our vector database. We call start() on default_server to launch a Milvus Lite instance locally, then connect to the vector store using MilvusVectorStore, passing in the appropriate host and port.
from llama_index import GPTVectorStoreIndex, StorageContext
from llama_index.vector_stores import MilvusVectorStore
from milvus import default_server
from dotenv import load_dotenv
import os

# load the OpenAI API key from a .env file
load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")

# launch a local Milvus Lite instance and connect to it
default_server.start()
vector_store = MilvusVectorStore(
    host = "127.0.0.1",
    port = default_server.listen_port
)
With our vector database connected, we configure a storage context, which tells LlamaIndex where to store the index. Now that everything is configured, we create an index with GPTVectorStoreIndex, passing both the documents to build the index from and the storage context.
From here, we can query the index as normal. For this example, we query the vector index with “What did the author do growing up?” This question calls for a vector index because answering it relies on the semantic abstractions of “the author” and “growing up.” We should see a response like “Growing up, the author wrote short stories, programmed on an IBM 1401, and nagged his father to buy him a TRS-80 microcomputer. …”
# tell LlamaIndex to use Milvus as the index's storage backend
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# query the index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
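When you're finished with the local instance, you can shut it down:

# stop the local Milvus Lite server when done
default_server.stop()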
Using a cloud vector database
If you have enough data, it makes more sense to use a cloud vector database than local storage for your LlamaIndex vector store index. To use Zilliz, the cloud version of Milvus, you only need to do two things differently. First, get a Zilliz account (which comes with $100 of free credit) and create a collection in your account. Then, make the following code changes.
Instead of writing:
vector_store = MilvusVectorStore(
host = "127.0.0.1",
port = default_server.listen_port
)
Use:
vector_store = MilvusVectorStore(
    host = HOST,
    port = PORT,
    user = USER,
    password = PASSWORD,
    use_secure = True,  # Zilliz Cloud connections go over TLS
    overwrite = True  # drops any existing data in the collection
)
Where HOST, PORT, USER, and PASSWORD correspond to the host, port, username, and password set in your Zilliz account.
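You don't have to hard-code these values; for example, you could read them from the same .env file as your OpenAI API key. A minimal sketch, with hypothetical variable names:

import os
from dotenv import load_dotenv

load_dotenv()

# hypothetical names; match these to however you store your Zilliz credentials
HOST = os.getenv("ZILLIZ_HOST")
PORT = os.getenv("ZILLIZ_PORT")
USER = os.getenv("ZILLIZ_USER")
PASSWORD = os.getenv("ZILLIZ_PASSWORD")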
Summary of how to use a persistent vector store with LlamaIndex
In this tutorial, we briefly looked at LlamaIndex, a framework for interacting with your data and an LLM. Then, we created an example vector store index in LlamaIndex and covered two ways to persist your vector store. First, we looked at creating a persistent vector store using Milvus Lite. Then, we looked at how to use Zilliz, a cloud vector database.