How to Integrate OpenAI Embedding API with Zilliz Cloud

Jan 11, 2023 · 4 min read

In 2018, Zilliz developed the Milvus vector database to transform how we handle search and storage (we’ve previously discussed the impact of embeddings and vector databases). Initially, Milvus focused on delivering the core features essential for a vector database, with an emphasis on user experience, reliability, performance, and scalability. This approach led to substantial growth in the Milvus community, including users, contributors, and stars (now approaching 30,000).

Recently, particularly with the release of Milvus 2.4, the community has expressed a strong interest in expanding the vector database ecosystem to include more tools, visualizations, and connectors. A key request has been for tighter integration with embedding models. This feedback reflects the evolving needs of users and the growing importance of embedding models in the vector database landscape.

Embedding model integrations

To address this growing demand, we're excited to introduce embedding model integrations, which connect your Milvus or Zilliz Cloud database with both open-source and commercial models. These integrations are designed to accommodate the diverse range of machine learning models available today, catering to different types of data and use cases. Whether you're working with text, images, or other data types, you can use embedding models to improve your semantic similarity search.

In response to the evolving landscape of embedding models and user needs, we will offer two parallel sets of integrations. The first set focuses on popular open-source embedding models, providing flexibility and cost-effectiveness for users who prefer community-driven solutions. The second set includes integrations with premium, commercial embedding models, offering advanced features and enhanced performance for users with more specialized requirements. This dual approach ensures that all users, regardless of their embedding needs or budget, have access to powerful tools for optimizing their Milvus or Zilliz Cloud databases.

Why Integrating with Zilliz Cloud is Key

Integrating the OpenAI Embedding API with Zilliz Cloud is important for developers looking to improve vector search in areas such as natural language processing. By combining the pre-trained embeddings from OpenAI with Zilliz Cloud’s high-performance vector database, you can create more accurate and efficient search and retrieval systems. OpenAI’s embeddings capture complex semantic relationships in your data, while Zilliz Cloud provides the scalability and speed needed to handle large volumes of vector data. This integration lets developers use advanced AI models to return more relevant search results, making it easier to build applications that understand and respond to user queries with greater precision.

Moreover, this integration simplifies the development process by offering a streamlined way to handle and search through massive datasets. With Zilliz Cloud managing the backend infrastructure and OpenAI’s Embedding API providing the sophisticated data representations, developers can focus more on building their applications and less on the complexities of data handling. This setup not only improves performance but also reduces development time.
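To make the idea of semantic similarity search concrete, here is a minimal sketch of the operation a vector database performs at scale: ranking stored vectors by cosine similarity to a query vector. The three-dimensional vectors and the names `cosine_similarity`, `top_k`, and `toy_vectors` are made up for illustration; real OpenAI embeddings have far more dimensions, and Zilliz Cloud uses approximate indexes rather than the brute-force scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings" (hypothetical values, chosen so related items point the same way)
toy_vectors = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "stocks": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

print(top_k(query, toy_vectors))
```

A brute-force scan like this costs one comparison per stored vector for every query, which is why large collections are served from an indexed vector database instead.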

Examples in Zilliz Cloud

The first set of integrations is a series of POC-ready examples and runnable scripts utilizing Milvus and Zilliz Cloud. These examples are meant to provide a fully customizable starting point for software engineers to create applications across a variety of use cases. Most of these examples will be fairly straightforward scripts combining upstream embedding models and the Milvus SDK. You can find these in our notebooks, where each example might look something like this (significantly simplified for readability):

from pymilvus import connections, Collection
import openai

...

connections.connect(uri=URI, user=USER, password=PASSWORD, secure=True)
collection = Collection(name=COLLECTION_NAME, schema=schema)
collection.create_index(field_name="embedding", index_params=index_params)

...

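# Embed each text and insert the resulting vector.
# Note: openai.Embedding.create is the legacy (pre-1.0) OpenAI Python SDK interface.
for text in document:
    embedding = openai.Embedding.create(
        input=text,
        engine=OPENAI_ENGINE)["data"][0]["embedding"]
    # Column-based insert: one list of values per field
    # (assumes "embedding" is the only field that is not auto-generated)
    collection.insert([[embedding]])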

...
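The loop above makes one embedding request and one insert per text. A common refinement is to batch both steps. The following is a minimal sketch of that idea, not code from the notebooks: `batched` and `embed_and_insert` are hypothetical helpers, the `"text"` field name is an assumption about the schema, and the embedding and insert calls are passed in as plain functions so the logic can be tried without network access.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def embed_and_insert(
    texts: Iterable[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    insert_rows: Callable[[List[Dict]], object],
    batch_size: int = 64,
) -> int:
    """Embed texts in batches and hand row dicts to insert_rows; return the row count."""
    total = 0
    for batch in batched(texts, batch_size):
        vectors = embed_batch(batch)
        if len(vectors) != len(batch):
            raise ValueError("embedding count does not match batch size")
        # "embedding" matches the indexed field above; "text" is an assumed field name
        rows = [{"text": t, "embedding": v} for t, v in zip(batch, vectors)]
        insert_rows(rows)
        total += len(rows)
    return total


# Possible wiring (assumes openai>=1.0 and pymilvus's MilvusClient; not executed here):
#   from openai import OpenAI
#   from pymilvus import MilvusClient
#   oa, mc = OpenAI(), MilvusClient(uri=URI, token=TOKEN)
#   embed = lambda b: [d.embedding for d in oa.embeddings.create(model=MODEL, input=b).data]
#   insert = lambda rows: mc.insert(collection_name=COLLECTION_NAME, data=rows)
#   embed_and_insert(document, embed, insert)
```

Passing the two calls in as functions keeps the batching logic independent of SDK versions, so the same helper works whether the script uses the legacy or the current OpenAI client.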

While small example scripts work well as general-purpose starting points, we found that much of the code is repeated from script to script; model inference and database querying, for example, appear in nearly all of them. To solve this recurring problem, we launched Towhee, a Zilliz project under the Milvus ecosystem. Towhee integrates hundreds of open-source models, embedding APIs, and in-house models, giving ML practitioners the ability to assemble end-to-end search pipelines backed by Milvus or Zilliz Cloud in just a few lines of code. A sample pipeline to vectorize book titles (using OpenAI's embedding API) and insert them into Milvus could look something like this:

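from towhee import pipe, ops

pipeline = (
    pipe.input('id', 'text')
    # Embed the 'text' column; 'vec' is an illustrative name for the output column
    .map(
        'text', 'vec',
        ops.text_embedding.openai(
            engine='embedding-engine',
            api_key='my-api-key'
        )
    )
    # Insert each (id, vector) pair into the Milvus collection
    .map(
        ('id', 'vec'), 'insert_status',
        ops.ann_insert.milvus_client(
            host='my-vector-database.url',
            port='19530',
            collection_name='my-collection'
        )
    )
    .output()
)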

You can see more Towhee examples in the Milvus bootcamp, along with a complete guide in the Towhee documentation.
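The pipeline idea itself, chaining an embedding stage into an insert stage, can be illustrated in a few lines of plain Python. This is only a conceptual sketch and not how Towhee is implemented: the `Pipeline` class, `fake_embed`, and `fake_insert` are hypothetical stand-ins, with a dictionary playing the role of the vector database.

```python
from typing import Any, Callable, Dict, List, Optional


class Pipeline:
    """A chain of single-argument stages applied in order (illustrative only)."""

    def __init__(self, stages: Optional[List[Callable[[Any], Any]]] = None):
        self._stages = list(stages or [])

    def map(self, stage: Callable[[Any], Any]) -> "Pipeline":
        # Return a new pipeline so partially built chains can be reused
        return Pipeline(self._stages + [stage])

    def __call__(self, value: Any) -> Any:
        for stage in self._stages:
            value = stage(value)
        return value


store: Dict[int, List[float]] = {}  # stand-in for a vector database collection


def fake_embed(row: dict) -> dict:
    """Stand-in for a model call: derive a tiny vector from the text."""
    text = row["text"]
    return {**row, "vec": [float(len(text)), float(text.count(" "))]}


def fake_insert(row: dict) -> int:
    """Stand-in for a database insert: keep the vector under its id."""
    store[row["id"]] = row["vec"]
    return row["id"]


insert_pipeline = Pipeline().map(fake_embed).map(fake_insert)
insert_pipeline({"id": 1, "text": "Dune"})
insert_pipeline({"id": 2, "text": "The Hobbit"})
print(store)
```

Swapping `fake_embed` for a real embedding call and `fake_insert` for a database write changes the stages but not the shape of the pipeline, which is the reuse the paragraph above describes.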

Connect with us

Long story short: we've made a lot of progress in five years, but we still have a long way to go. Zilliz will continue to be a key backer and the primary driving force behind the Milvus project, but we'll also be focused on integrations and partnerships with the broader machine learning ecosystem moving forward.

If you're an open-source committer and would like to chat about a potential integration, please reach out to us or send us a message on Twitter. We look forward to having you as a part of the community!

Updated on Nov 20, 2024

Zilliz
