Twelve Labs and Zilliz Cloud Integration
Twelve Labs and Zilliz Cloud integrate to power advanced semantic video search and analysis. The integration combines Twelve Labs' multimodal video AI models with Zilliz Cloud's scalable vector database, enabling efficient storage, indexing, and retrieval of video content across speech, text, audio, and visual modalities.
What is Twelve Labs
Twelve Labs develops video foundation models that make petabytes of video content searchable using natural language. Their technology performs precise, context-aware searches across speech, text, audio, and visuals for locating specific moments in large video libraries. The company aims to build infrastructure for multimodal video understanding by mapping natural language to video elements including actions, objects, and background sounds, enabling applications for semantic video search, scene classification, topic extraction, and automatic summarization.
By integrating with Zilliz Cloud (fully managed Milvus), Twelve Labs' multimodal video embeddings can be stored, indexed, and searched at scale through a fully managed vector database. This lets developers build video search applications that process, index, and retrieve video content with high precision and speed, supporting use cases such as content moderation, media analytics, highlight generation, and ad insertion.
Benefits of the Twelve Labs + Zilliz Cloud Integration
- Multimodal video understanding: Twelve Labs' Embed API generates embeddings that capture speech, text, audio, and visual elements simultaneously, while Zilliz Cloud stores and indexes these rich multimodal vectors for comprehensive video search.
- Temporal video analysis: Video embeddings include time-range metadata (start and end offsets), enabling users to find specific moments within videos — not just entire clips — with Zilliz Cloud's efficient retrieval.
- Scalable video search infrastructure: Zilliz Cloud's high-performance vector database handles large-scale video embedding collections, enabling fast similarity search even across petabytes of video content.
- Natural language video queries: Users can search video content using natural language text queries, with Twelve Labs generating query embeddings and Zilliz Cloud performing similarity search to find the most relevant video segments.
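At its core, the retrieval behind these benefits is nearest-neighbor search over embedding vectors, with each vector carrying its segment's time range. A toy sketch in plain Python (illustrative 4-dimensional vectors; real Marengo embeddings are 1024-dimensional, and Zilliz Cloud performs this ranking at scale with ANN indexes):

```python
import math

# Toy 4-d "embeddings" standing in for Marengo's 1024-d vectors (made-up values)
segments = [
    {"vector": [0.9, 0.1, 0.0, 0.1], "start_offset_sec": 0.0,  "end_offset_sec": 6.0},
    {"vector": [0.1, 0.8, 0.2, 0.0], "start_offset_sec": 6.0,  "end_offset_sec": 12.0},
    {"vector": [0.0, 0.2, 0.9, 0.1], "start_offset_sec": 12.0, "end_offset_sec": 18.0},
]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def top_segment(query_vector, segments):
    # Rank segments by similarity to the query; the winner's time range
    # points at the matching moment, not just the whole video
    return max(segments, key=lambda s: cosine_similarity(query_vector, s["vector"]))

best = top_segment([0.0, 0.25, 0.95, 0.05], segments)
print(f"Best match: {best['start_offset_sec']}s - {best['end_offset_sec']}s")
# Best match: 12.0s - 18.0s
```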
How the Integration Works
Twelve Labs serves as the video AI layer, generating multimodal embeddings from video content using its Embed API and the Marengo-retrieval-2.6 engine. It processes video URLs, extracts features across speech, text, audio, and visual modalities, and produces 1024-dimensional embedding vectors with temporal metadata for each video segment.
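Concretely, each segment can be thought of as an embedding vector plus temporal metadata. A sketch of the record shape (field names follow the tutorial below; the vector is truncated here for illustration):

```python
# Illustrative record for one video segment (real vectors have 1024 floats)
segment_record = {
    "vector": [0.012, -0.034, 0.101],  # truncated; actually 1024 dimensions
    "embedding_scope": "clip",          # segment-level ("clip") vs. whole-video scope
    "start_offset_sec": 0.0,            # where the segment begins in the video
    "end_offset_sec": 6.0,              # where the segment ends
    "video_url": "https://example.com/your-video.mp4",
}
```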
Zilliz Cloud serves as the vector database layer, storing and indexing the video embeddings generated by Twelve Labs. It provides high-performance similarity search with low latency, enabling fast retrieval of the most relevant video segments based on query vectors.
Together, Twelve Labs and Zilliz Cloud create a complete semantic video search solution: videos are processed by Twelve Labs' Embed API to generate multimodal embeddings, which are stored in Zilliz Cloud with metadata including time ranges and video URLs. When a user submits a search query, Twelve Labs generates a query embedding, and Zilliz Cloud performs similarity search to find the most relevant video moments — enabling content discovery, recommendation systems, and advanced video analytics.
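Because each search hit carries start and end offsets, an application can jump straight to the matching moment rather than the start of the video. A minimal sketch, assuming the stored URLs point at direct media files (which honor the W3C media-fragment `#t=start,end` syntax):

```python
def segment_link(hit):
    """Build a deep link that starts playback at the matched segment."""
    entity = hit["entity"]
    return f"{entity['video_url']}#t={entity['start_offset_sec']},{entity['end_offset_sec']}"

# Example hit shaped like a Milvus search result entry
hit = {
    "distance": 0.87,
    "entity": {
        "video_url": "https://example.com/your-video.mp4",
        "start_offset_sec": 42.0,
        "end_offset_sec": 48.0,
    },
}
print(segment_link(hit))  # https://example.com/your-video.mp4#t=42.0,48.0
```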
Step-by-Step Guide
1. Set Up the Development Environment
Create a new project directory and install required packages:
```shell
mkdir video-search-tutorial
cd video-search-tutorial
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
pip install twelvelabs pymilvus
```

Set up your Twelve Labs API key as an environment variable:

```shell
export TWELVE_LABS_API_KEY='your_api_key_here'
```

2. Connect to Milvus and Create a Collection
Initialize the Milvus client and create a collection for video embeddings:
```python
from pymilvus import MilvusClient

# Milvus Lite stores data in a local file; for production, point the client
# at your Zilliz Cloud URI instead
milvus_client = MilvusClient("milvus_twelvelabs_demo.db")

collection_name = "twelvelabs_demo_collection"

# Start fresh if the collection already exists
if milvus_client.has_collection(collection_name=collection_name):
    milvus_client.drop_collection(collection_name=collection_name)

milvus_client.create_collection(
    collection_name=collection_name,
    dimension=1024  # The dimension of the Twelve Labs embeddings
)
```

3. Generate Embeddings with Twelve Labs Embed API
Use the Twelve Labs Python SDK to generate embeddings for videos:
```python
import os

from twelvelabs import TwelveLabs
from twelvelabs.models.embed import EmbeddingsTask

TWELVE_LABS_API_KEY = os.getenv('TWELVE_LABS_API_KEY')
twelvelabs_client = TwelveLabs(api_key=TWELVE_LABS_API_KEY)

def generate_embedding(video_url):
    # Create an asynchronous embedding task for the video
    task = twelvelabs_client.embed.task.create(
        engine_name="Marengo-retrieval-2.6",
        video_url=video_url
    )
    print(f"Created task: id={task.id} engine_name={task.engine_name} status={task.status}")

    # Poll until the task completes
    def on_task_update(task: EmbeddingsTask):
        print(f"  Status={task.status}")

    status = task.wait_for_done(
        sleep_interval=2,
        callback=on_task_update
    )
    print(f"Embedding done: {status}")

    # Retrieve the per-segment embeddings along with their time ranges
    task_result = twelvelabs_client.embed.task.retrieve(task.id)
    embeddings = []
    for v in task_result.video_embeddings:
        embeddings.append({
            'embedding': v.embedding.float,
            'start_offset_sec': v.start_offset_sec,
            'end_offset_sec': v.end_offset_sec,
            'embedding_scope': v.embedding_scope
        })
    return embeddings, task_result
```

4. Insert Embeddings into Milvus
Store the video embeddings along with metadata in the Milvus collection:
```python
def insert_embeddings(milvus_client, collection_name, task_result, video_url):
    # Pair each segment's vector with its temporal metadata and source URL
    data = []
    for i, v in enumerate(task_result.video_embeddings):
        data.append({
            "id": i,
            "vector": v.embedding.float,
            "embedding_scope": v.embedding_scope,
            "start_offset_sec": v.start_offset_sec,
            "end_offset_sec": v.end_offset_sec,
            "video_url": video_url
        })
    insert_result = milvus_client.insert(collection_name=collection_name, data=data)
    print(f"Inserted {len(data)} embeddings into Milvus")
    return insert_result

video_url = "https://example.com/your-video.mp4"
embeddings, task_result = generate_embedding(video_url)
insert_result = insert_embeddings(milvus_client, collection_name, task_result, video_url)
```

5. Perform Similarity Search
Search for similar video segments using a query vector:
```python
def perform_similarity_search(milvus_client, collection_name, query_vector, limit=5):
    search_results = milvus_client.search(
        collection_name=collection_name,
        data=[query_vector],
        limit=limit,
        output_fields=["embedding_scope", "start_offset_sec", "end_offset_sec", "video_url"]
    )
    return search_results

# For demonstration, use the first segment's embedding as the query vector
query_vector = task_result.video_embeddings[0].embedding.float
search_results = perform_similarity_search(milvus_client, collection_name, query_vector)

print("Search Results:")
for i, result in enumerate(search_results[0]):
    print(f"Result {i+1}:")
    print(f"  Video URL: {result['entity']['video_url']}")
    print(f"  Time Range: {result['entity']['start_offset_sec']} - {result['entity']['end_offset_sec']} seconds")
    print(f"  Similarity Score: {result['distance']}")
```

Learn More
- Advanced Video Search: Leveraging Twelve Labs and Milvus for Semantic Retrieval — Official Milvus tutorial for video search with Twelve Labs
- Twelve Labs and Milvus for Semantic Retrieval — Zilliz blog on advanced video search
- Twelve Labs Embed API Documentation — Official Twelve Labs embedding documentation
- Twelve Labs Multimodal Embeddings — Twelve Labs blog on multimodal embeddings
- Milvus Introduction — Introduction to Milvus vector database