Airbyte
Real-time data ingestion for your RAG applications with Airbyte and Zilliz Cloud or Milvus vector database
What is Airbyte?
Airbyte is an open-source data movement infrastructure for building extract and load (EL) data pipelines. While other data pipeline platforms may offer many integrations with well-known sources such as Stripe and Salesforce, they often pay too little attention to the integration needs of smaller services.
Airbyte fills this gap by developing and maintaining connectors and by fostering a community of users who can use each other's custom connectors. Companies commonly build their own tailor-made connectors to support their unique applications, and Airbyte's open-source model encourages organizations to collaborate on maintaining them.
Benefits of the Airbyte and Milvus/Zilliz Integration
Both Milvus and Zilliz Cloud (the managed Milvus service) have integrated with Airbyte, providing a Milvus destination connector. The connector lets users extract unstructured data from various connected sources, encode it into vector embeddings using a pre-trained embedding model, and then ingest the embeddings into Milvus or Zilliz Cloud for efficient storage and similarity search.
By handling the transfer and processing of data, Airbyte opens up new possibilities for real-time, AI-driven applications. For example, the Milvus and Zilliz Cloud integration enables real-time semantic search across data sources such as customer support systems, so the system can deliver relevant information to users immediately. This reduces reliance on support agents and improves the overall user experience. The integration can also be used to build Retrieval Augmented Generation (RAG) systems, product recommendation systems, generative chatbots, and other GenAI applications.
Key benefits of the Airbyte and Milvus/Zilliz integration:
- Get connected with extensive data sources: Airbyte connects with hundreds of popular data sources, including databases, data warehouses, and SaaS products. The Milvus destination connector lets you tap into this extensive array of data and ensures a seamless data flow to enhance your data-driven projects or GenAI applications.
- Efficient Data Transfer: Airbyte seamlessly transfers data from various sources into Milvus/Zilliz, enabling on-the-fly vector embedding calculation and streamlining data processing.
- Streamlined AI workflow: This integration helps you load your unstructured data directly into the Milvus/Zilliz vector database by handling data ingestion, chunking, formatting, vectorization, indexing, storage, and similarity search.
- Enhanced Search Functionality: This integration boosts semantic search capabilities within vector databases. Utilizing vector embeddings, the system can automatically identify and present closely related content based on semantic similarity, which is invaluable for applications needing efficient retrieval from unstructured data.
- Simple Set-Up Process: Setting up a Milvus cluster and configuring Airbyte for data synchronization are straightforward, as is building applications using Streamlit and the OpenAI embedding API if desired.
How the Airbyte and Zilliz/Milvus Integration Works
The Milvus destination connector handles the following tasks:
- Processing - split individual records into chunks so they fit the context window, and decide which fields to use as context and which are supplementary metadata.
- Embedding - convert the chunks into vector embeddings using a pre-trained embedding model. Currently, our integration supports OpenAI's text-embedding-ada-002 and Cohere's embed-english-light-v2.0.
- Indexing - store the vectors in Milvus or Zilliz Cloud for similarity search.
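The three tasks above can be sketched in a few lines of Python. This is only an illustration of the process, not the connector's actual implementation: `chunk_text`, `toy_embed`, `InMemoryIndex`, and `ingest` are hypothetical names, the hash-based `toy_embed` stands in for a real embedding model such as text-embedding-ada-002, and the in-memory index stands in for a Milvus or Zilliz Cloud collection.

```python
import hashlib
import math


def chunk_text(text, chunk_size=8, overlap=2):
    """Processing: split text into overlapping chunks of at most chunk_size words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text, dim=16):
    """Embedding: ASSUMED stand-in for a real model; hashes words into a unit vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryIndex:
    """Indexing: ASSUMED stand-in for a Milvus collection."""

    def __init__(self):
        self.rows = []

    def insert(self, vector, text, metadata):
        self.rows.append((vector, text, metadata))

    def search(self, query_vector, top_k=1):
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(query_vector, vec)), text, meta)
            for vec, text, meta in self.rows
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:top_k]


def ingest(record, text_field, index):
    """Run one source record through processing, embedding, and indexing."""
    # Every field other than the text field is kept as supplementary metadata.
    metadata = {k: v for k, v in record.items() if k != text_field}
    for chunk in chunk_text(record[text_field]):
        index.insert(toy_embed(chunk), chunk, metadata)


if __name__ == "__main__":
    index = InMemoryIndex()
    ticket = {"id": 7, "body": "reset your password from the settings page"}
    ingest(ticket, "body", index)
    print(index.search(toy_embed("reset your password"), top_k=1))
```

In a real deployment the connector performs these steps for you during each sync; the sketch only makes the division of labor between chunking, embedding, and vector storage concrete.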
The diagram below shows how Airbyte and Zilliz Cloud work together:
How Airbyte and Zilliz Cloud work together
How to Use Airbyte with Zilliz/Milvus
Documentation | Milvus Destination Connector
Blog | How to Use Milvus and Airbyte for Similarity Search on All Your Data