How to Integrate OpenAI Embedding API with Zilliz Cloud
By Frank Liu on Jan 11, 2023
In recent months, especially since the release of Milvus 2.2, the community has brought up the need to expand the vector database ecosystem with visualizations, tools, connectors, and more. One of the most requested features is tighter integration with embedding models.
Embedding model integrations
To address this request in particular, we’ll be providing embedding model integrations: a way to connect your Milvus and/or Zilliz Cloud database to open-source or paid embedding models. There are many different types of embedding models to choose from, depending on the type of data you’re working with. To tackle the increasing variety of embedding methods and application requirements that our users face today, we’ll be providing two parallel sets of integrations.
The first set of integrations is a series of POC-ready examples and runnable scripts utilizing Milvus and Zilliz Cloud. These examples are meant to provide a fully customizable starting point for software engineers to create applications across a variety of use cases. Most of these examples will be fairly straightforward scripts combining upstream embedding models and the Milvus SDK. You can find these in our documentation, where each example might look something like this (significantly simplified for readability):
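from pymilvus import connections, Collection
import openai
...
# Connect to Zilliz Cloud (or Milvus) and prepare the collection.
connections.connect(uri=URI, user=USER, password=PASSWORD, secure=True)
collection = Collection(name=COLLECTION_NAME, schema=schema)
collection.create_index(field_name="embedding", index_params=index_params)
...
for text in document:
    # The API returns a list of results under "data"; take the first entry.
    embedding = openai.Embedding.create(
        input=text, engine=OPENAI_ENGINE)["data"][0]["embedding"]
    # insert() expects one list per field; this assumes "embedding" is the
    # only field that is not auto-generated.
    collection.insert([[embedding]])
...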
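The loop above embeds and inserts one text at a time. A script that handles more data would usually group texts into batches first. The following is a minimal sketch of that batching step, not part of the Milvus or OpenAI SDKs: `chunked`, `build_insert_batches`, and `fake_embed_batch` are hypothetical names, the embedding call is replaced by a stand-in so the example runs offline, and the two-column layout assumes a collection whose schema holds a text field followed by a vector field.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_insert_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[Vector]],
    batch_size: int = 64,
) -> Iterator[List[list]]:
    """Yield column-oriented batches: [texts_column, embeddings_column]."""
    for batch in chunked(texts, batch_size):
        vectors = embed_batch(batch)
        if len(vectors) != len(batch):
            raise ValueError("embedding function returned a wrong number of vectors")
        yield [list(batch), vectors]


def fake_embed_batch(batch: Sequence[str]) -> List[Vector]:
    """Stand-in for a real embedding call; returns tiny deterministic vectors."""
    return [[float(len(text)), float(text.count(" "))] for text in batch]


if __name__ == "__main__":
    titles = ["Dune", "Brave New World", "Neuromancer"]
    for columns in build_insert_batches(titles, fake_embed_batch, batch_size=2):
        # With a real collection (assumed schema: text field, then vector
        # field), this is where collection.insert(columns) would be called.
        print(columns)
```

In a real script, `fake_embed_batch` would be swapped for a function that calls the embedding service, and each yielded batch would be passed to the collection's insert call.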
While small example scripts are good for general-purpose use, we found significant duplication across them; model inference and database querying, for example, are two actions executed in nearly all examples. To solve this recurring problem, we launched Towhee, a Zilliz project under the Milvus ecosystem. Towhee integrates hundreds of open-source models, embedding APIs, and in-house models, giving ML practitioners the ability to assemble end-to-end search pipelines backed by Milvus or Zilliz Cloud in just a few lines of code. A sample pipeline to vectorize book titles (using OpenAI’s embedding API) and insert them into Milvus could look something like this:
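from towhee import pipe, ops

# The intermediate column names ('embedding', 'result') are illustrative.
pipeline = (
    pipe.input('id', 'text')
    # Embed the 'text' column and store the vector in 'embedding'.
    .map(
        'text', 'embedding',
        ops.text_embedding.openai(
            engine='embedding-engine',
            api_key='my-api-key'
        )
    )
    # Insert each (id, embedding) pair into the Milvus collection.
    .map(
        ('id', 'embedding'), 'result',
        ops.ann_insert.milvus_client(
            host='my-vector-database.url',
            port='19530',
            collection_name='my-collection'
        )
    )
    .output()
)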
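To make the idea of chained, reusable stages concrete, here is a toy illustration in plain Python. It is not Towhee's implementation or API: `MiniPipeline`, `toy_embed`, and `toy_insert` are hypothetical names, and the embedding and database steps are stand-ins. It only shows how each stage can read one named column and write another, so that inference and insertion become interchangeable building blocks.

```python
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, str, Callable[[Any], Any]]


class MiniPipeline:
    """Toy pipeline: each stage reads one named column and writes another."""

    def __init__(self) -> None:
        self._stages: List[Stage] = []

    def map(self, source: str, target: str, fn: Callable[[Any], Any]) -> "MiniPipeline":
        self._stages.append((source, target, fn))
        return self  # returning self allows chained calls

    def run(self, **row: Any) -> Dict[str, Any]:
        # Apply the stages in the order they were added.
        for source, target, fn in self._stages:
            row[target] = fn(row[source])
        return row


def toy_embed(text: str) -> List[float]:
    """Stand-in for model inference."""
    return [float(len(text))]


stored: List[List[float]] = []


def toy_insert(vector: List[float]) -> int:
    """Stand-in for a database write; returns the number of stored vectors."""
    stored.append(vector)
    return len(stored)


indexing = MiniPipeline().map("text", "vec", toy_embed).map("vec", "count", toy_insert)
result = indexing.run(id=1, text="Dune")
```

Replacing `toy_embed` with a different model or `toy_insert` with a different backend changes one stage without touching the rest, which is the kind of reuse the standalone scripts lacked.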
Connect with us
Long story short, we’ve made a lot of progress in five years, but we still have a long way to go. Zilliz will continue to be a key backer and the primary driving force behind the Milvus project, but we’ll also be focused on integrations and partnerships with the broader machine learning ecosystem moving forward.
If you’re an open-source committer and would like to chat about a potential integration, please reach out to us or shoot us a message on Twitter. We look forward to having you as a part of the community!