Getting Started with a Milvus Connection

Nov 24, 2023 · 3 min read

Milvus is an open-source vector database for building AI applications on top of embeddings of unstructured data. It includes everything you need to get started, and it runs either on your local machine or hosted in Zilliz Cloud (see the notebook on connecting to the Zilliz Free Tier).

Milvus has four officially supported SDKs: Python, Java, Go, and Node.js. Below, we’ll walk through the steps for Python.

Install and start a Milvus server

# Run in a shell: milvus provides a local server, pymilvus is the Python SDK.
pip install milvus pymilvus

# Run in Python: start the local Milvus server.
from milvus import default_server
default_server.start()

Note: If you are connecting to Zilliz Cloud, you can skip installing the milvus package and start the cluster from the Zilliz console instead. You still need pymilvus.

Get the Milvus client (connection)

from pymilvus import connections

# Connect to the local server started above.
connections.connect(
    host="127.0.0.1",
    port=default_server.listen_port)

Note: To connect to serverless Milvus running on Zilliz Cloud instead of localhost, get the endpoint URI and token from the console and pass them to connect in place of the host and port.

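from pymilvus import connections

# Placeholders: copy the real endpoint URI and token from the Zilliz console.
ENDPOINT = "https://endpoint.api.region.zillizcloud.com:443"
TOKEN = "your_api_token"
connections.connect(
    uri=ENDPOINT,
    token=TOKEN)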
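
The two connection styles above differ only in the keyword arguments passed to connect, so one way to switch between them is to build those arguments from environment variables. The following is a minimal sketch, not part of pymilvus: the helper name and the variable names ZILLIZ_URI, ZILLIZ_TOKEN, MILVUS_HOST, and MILVUS_PORT are hypothetical, and the fallback port 19530 is assumed to be the port your local server listens on (the article reads it from default_server.listen_port instead).

```python
import os


def milvus_connect_kwargs(env=None):
    """Build keyword arguments for pymilvus `connections.connect`.

    Uses the Zilliz Cloud style (uri + token) when both values are set,
    otherwise falls back to the local style (host + port).
    All environment variable names here are hypothetical.
    """
    env = os.environ if env is None else env
    uri = env.get("ZILLIZ_URI")
    token = env.get("ZILLIZ_TOKEN")
    if uri and token:
        return {"uri": uri, "token": token}
    return {
        "host": env.get("MILVUS_HOST", "127.0.0.1"),
        "port": env.get("MILVUS_PORT", "19530"),  # assumed local port
    }


# Explicit dicts stand in for the real environment in this demonstration.
local_kwargs = milvus_connect_kwargs({})
cloud_kwargs = milvus_connect_kwargs({
    "ZILLIZ_URI": "https://example.invalid:443",  # placeholder, not a real endpoint
    "ZILLIZ_TOKEN": "placeholder-token",
})
print(local_kwargs)
print(cloud_kwargs)

# With pymilvus installed and a reachable server, usage would look like:
# from pymilvus import connections
# connections.connect(**milvus_connect_kwargs())
```

Keeping the token in the environment rather than in the notebook also avoids committing it to source control.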

Create a collection

You can think of a collection like a database table. It is where you’ll store your embeddings, documents, and any additional metadata.

Each collection has an associated schema and index.

The index is built using a vector search algorithm and a vector similarity metric. You can use the Milvus defaults, but for optimal performance you should tune the parameters. The right choice of search index and parameters depends on your data. See this best practice guide for choosing a search index. This notebook shows how to modify the parameters.

from pymilvus import (
    FieldSchema, DataType,
    CollectionSchema, Collection)

## 1. Define a minimal, expandable schema.
fields = [
    FieldSchema("pk", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("vector", DataType.FLOAT_VECTOR, dim=768),
]
schema = CollectionSchema(
    fields,
    enable_dynamic_field=True,  # allows extra fields such as text and source
)

## 2. Create a collection.
mc = Collection("my_collection_name", schema)

## 3. Index the collection.
mc.create_index(
    field_name="vector",
    index_params={
        "index_type": "AUTOINDEX",
        "metric_type": "COSINE",
    },
)
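
To make the COSINE metric chosen for the index more concrete, here is a small pure-Python sketch of cosine similarity. It is for intuition only: it is not how Milvus computes the metric internally, and the function name is my own.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Same direction, different length: similarity close to 1.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Perpendicular vectors: similarity 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Opposite direction: similarity close to -1.
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Because the metric depends only on direction and not on vector length, it is a common choice for text embeddings.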

Insert data into Milvus

If you have already generated embeddings from your unstructured data, you can load them into the collection.

Input data can be in the form of a pandas dataframe or a list of dictionaries. It must contain a vector embedding. The remaining fields are optional: original text chunk and metadata fields.

import numpy as np

## 1. Input data can be a pandas dataframe or a list of dicts.
# Random vectors stand in for real embeddings here.
data_rows = []
data_rows.extend([
    {"vector": np.random.randn(768).tolist(),
     "text": "This is a document",
     "source": "source_url_1"},
    {"vector": np.random.randn(768).tolist(),
     "text": "This is another document",
     "source": "source_url_2"},
])

## 2. Insert data into Milvus.
mc.insert(data_rows)
mc.flush()
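
Since the schema fixes the vector field at dim=768, it can be convenient to check rows locally before inserting them. The sketch below is an optional pre-check in plain Python; the helper name and its messages are hypothetical and not part of pymilvus.

```python
from numbers import Real

EXPECTED_DIM = 768  # matches dim=768 in the schema


def find_invalid_rows(rows, dim=EXPECTED_DIM, vector_field="vector"):
    """Return (row_index, reason) pairs for rows whose vector does not fit the schema."""
    problems = []
    for i, row in enumerate(rows):
        vector = row.get(vector_field)
        if vector is None:
            problems.append((i, "missing vector field"))
        elif not isinstance(vector, list):
            problems.append((i, "vector is not a list"))
        elif len(vector) != dim:
            problems.append((i, f"expected {dim} dimensions, got {len(vector)}"))
        elif not all(isinstance(x, Real) and not isinstance(x, bool) for x in vector):
            problems.append((i, "vector contains non-numeric values"))
    return problems


sample_rows = [
    {"vector": [0.0] * 768, "text": "This is a document", "source": "source_url_1"},
    {"vector": [0.0] * 10, "text": "Vector is too short"},
    {"text": "No vector at all"},
]
problems = find_invalid_rows(sample_rows)
for index, reason in problems:
    print(f"row {index}: {reason}")
```

Only the vector is checked because, with the dynamic field enabled, the remaining keys such as text and source are optional.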

Query the collection

Questions must be embedded with the same model that was used to embed the unstructured data loaded into the database. You can then query the collection with the question embeddings, using the Milvus default search parameters. The same best practices for choosing a search index apply here for optimal performance.

Below, Milvus returns the 3 most similar results (limit=3). Notice that the original text chunk and metadata are returned as well, which can help with grounding (basing generated text on factual information to reduce hallucinations).

## 1. Search for answers to your embedded questions.
# `encoder` is a placeholder for your embedding model; it must return
# one 768-dimensional vector per question.
mc.load()
results = mc.search(
    data=encoder(["my_question_1"]),
    anns_field="vector",
    output_fields=["text", "source"],  # optional return fields
    limit=3,
    param={},  # no params if using Milvus defaults
)

## 2. View the answers.
for n, hits in enumerate(results):
    print(f"Results for question {n}:")
    for hit in hits:
        print(hit)
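
To illustrate how the returned text and source fields can support grounding, the sketch below assembles a prompt from search hits. It assumes each hit has already been copied into a plain dict with "text" and "source" keys; with pymilvus that would probably mean reading something like hit.entity.get("text"), but treat that as an assumption to check against your SDK version. The function name and prompt wording are hypothetical.

```python
def build_grounded_prompt(question, hits, max_chunks=3):
    """Combine retrieved chunks and their sources into one prompt string.

    `hits` is a list of dicts with "text" and "source" keys
    (a hypothetical, simplified stand-in for Milvus search hits).
    """
    lines = []
    for i, hit in enumerate(hits[:max_chunks], start=1):
        lines.append(f"[{i}] {hit['text']} (source: {hit['source']})")
    context = "\n".join(lines)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


example_hits = [
    {"text": "This is a document", "source": "source_url_1"},
    {"text": "This is another document", "source": "source_url_2"},
]
prompt = build_grounded_prompt("my_question_1", example_hits)
print(prompt)
```

Numbering the chunks and keeping each source next to its text makes it possible for a generated answer to cite where a statement came from.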

In my next blog post, I’ll cover how to build a chatbot using LangChain and Milvus. Stay tuned.

Christy Bergman
Christy Bergman is a passionate Developer Advocate at Zilliz. She previously worked in distributed computing at Anyscale and as a Specialist AI/ML Solutions Architect at AWS. Christy studied applied math, is a self-taught coder, and has published papers, including one with ACM Recsys. She enjoys hiking and bird watching.
