How to Integrate OpenAI Embedding API with Zilliz Cloud

In 2018, Zilliz created the Milvus vector database to change the way we approach search and storage (we've explored the power of embeddings and vector databases in a previous blog post). The early days of Milvus were focused on the features a vector database should provide. As such, we directed the bulk of our efforts towards perfecting the underlying user experience, reliability, performance, and scalability. With this development model, we saw consistent growth in the Milvus community in terms of users, contributors, and GitHub stars (nearing 15,000).
In recent months, especially since the release of Milvus 2.2, the community has brought up the need to expand the vector database ecosystem with visualizations, tools, connectors, and more. One of the most requested features is tighter integration with embedding models.
Embedding model integrations
To address this request in particular, we'll be providing embedding model integrations: a way to connect your Milvus and/or Zilliz Cloud database to open source or paid embedding models. There are many different types of embedding models to choose from, depending on the type of data you're working with. To tackle the increasing variety of embedding methods and application requirements that our users face today, we'll be providing two parallel sets of integrations.
The first set of integrations is a series of POC-ready examples and runnable scripts that use Milvus and Zilliz Cloud. These examples are meant to provide a fully customizable starting point for software engineers to create applications across a variety of use cases. Most of these examples will be fairly straightforward scripts combining upstream embedding models and the Milvus SDK. You can find these in our documentation, where each example might look something like this (significantly simplified for readability):
from pymilvus import connections, Collection
import openai
...
# Connect to the database, then open the collection and index its vector field.
connections.connect(uri=URI, user=USER, password=PASSWORD, secure=True)
collection = Collection(name=COLLECTION_NAME, schema=schema)
collection.create_index(field_name="embedding", index_params=index_params)
...
# Embed each text with the OpenAI API and insert the resulting vector.
for text in document:
    embedding = openai.Embedding.create(
        input=text,
        engine=OPENAI_ENGINE)["data"][0]["embedding"]
    collection.insert([embedding])
...
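The loop above embeds and inserts one text at a time. As a rough sketch of how the same embed-then-insert flow could be factored into a reusable, batched helper, consider the following. Everything here is illustrative rather than taken from the Milvus documentation: `ingest`, `batched`, `fake_embed`, and the `batch_size` default are hypothetical names and values, and `embed_fn` and `insert_fn` are stand-ins for whatever embedding call and insert call you actually use, so the sketch runs offline without credentials.

```python
from typing import Callable, Iterable, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], size: int) -> Iterable[Sequence[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def ingest(
    texts: Sequence[str],
    embed_fn: Callable[[Sequence[str]], List[Vector]],
    insert_fn: Callable[[List[Vector]], object],
    batch_size: int = 64,  # hypothetical default, tune for your workload
) -> int:
    """Embed `texts` batch by batch and pass each batch of vectors to `insert_fn`.

    Returns the total number of vectors handed to `insert_fn`.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    total = 0
    for batch in batched(texts, batch_size):
        vectors = embed_fn(batch)
        if len(vectors) != len(batch):
            raise ValueError("embed_fn must return one vector per input text")
        insert_fn(vectors)
        total += len(vectors)
    return total


def fake_embed(batch: Sequence[str]) -> List[Vector]:
    """Offline stand-in for an embedding model: [length, number of spaces]."""
    return [[float(len(text)), float(text.count(" "))] for text in batch]


if __name__ == "__main__":
    stored: List[Vector] = []  # stand-in for a database collection
    count = ingest(["a", "bb cc", "ddd"], fake_embed, stored.extend, batch_size=2)
    print(count, stored)
```

In a real script, `embed_fn` would wrap the embedding API call and `insert_fn` would wrap the database insert; the exact shape of the data passed to the insert call depends on your collection schema.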
While small example scripts are good for general-purpose use, we found significant duplication across them; model inference and database querying, for example, are two actions executed in nearly every example. To solve this recurring problem, we launched Towhee, a Zilliz project under the Milvus ecosystem. Towhee integrates hundreds of open-source models, embedding APIs, and in-house models, giving ML practitioners the ability to assemble end-to-end search pipelines backed by Milvus or Zilliz Cloud in just a few lines of code. A sample pipeline to vectorize book titles (using OpenAI's embedding API) and insert them into Milvus could look something like this:
from towhee import pipe, ops

pipeline = (
    pipe.input('id', 'text')
    .map(
        ops.text_embedding.openai(
            engine='embedding-engine',
            api_key='my-api-key'
        )
    )
    .map(
        ops.ann_insert.milvus_client(
            host='my-vector-database.url',
            port='19530',
            collection_name='my-collection'
        )
    )
    .output()
)
You can see more Towhee examples in the Milvus bootcamp, along with a complete guide in the Towhee documentation.
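To make the idea of chaining an embedding stage and an insert stage concrete, here is a small library-free sketch. It is not Towhee's implementation or API: `MiniPipeline`, `toy_embed`, `toy_insert`, and the in-memory `store` are all hypothetical stand-ins invented for illustration, and the "embedding" is a trivial two-number feature vector rather than a real model output.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Stage = Tuple[Tuple[str, ...], str, Callable[..., Any]]


class MiniPipeline:
    """Toy pipeline: each stage reads named columns of a row and writes one column."""

    def __init__(self, stages: Optional[List[Stage]] = None) -> None:
        self._stages: List[Stage] = list(stages or [])

    def map(self, inputs: List[str], output: str, fn: Callable[..., Any]) -> "MiniPipeline":
        """Return a new pipeline with one more stage: row[output] = fn(*row[inputs])."""
        return MiniPipeline(self._stages + [(tuple(inputs), output, fn)])

    def __call__(self, **row: Any) -> Dict[str, Any]:
        """Run every stage in order on a single row and return the enriched row."""
        for inputs, output, fn in self._stages:
            row[output] = fn(*(row[name] for name in inputs))
        return row


def toy_embed(text: str) -> List[float]:
    """Stand-in for an embedding model: [character count, vowel count]."""
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(len(text)), float(vowels)]


store: Dict[int, List[float]] = {}  # stand-in for a vector database collection


def toy_insert(row_id: int, vector: List[float]) -> int:
    """Store the vector under its id and return the number of stored rows."""
    store[row_id] = vector
    return len(store)


pipeline = (
    MiniPipeline()
    .map(["text"], "embedding", toy_embed)
    .map(["id", "embedding"], "row_count", toy_insert)
)

if __name__ == "__main__":
    result = pipeline(id=1, text="Dune")
    print(result["embedding"], result["row_count"])
```

The point of the design is that each stage only declares which columns it reads and writes, so swapping the embedding model or the storage backend means replacing one stage rather than rewriting the script.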
Connect with us
Long story short: we've made a lot of progress in five years, but we still have a long way to go. Zilliz will continue to be a key backer and the primary driving force behind the Milvus project, but we'll also be focused on integrations and partnerships with the broader machine learning ecosystem moving forward.
If you're an open source committer and would like to chat about a potential integration, please reach out to us or send us a message on Twitter. We look forward to having you as a part of the community!