Milvus 2.6: Faster Search, Lower Cost, Smarter Scaling

Milvus 2.6 Deep Dive: Faster Search, Lower Cost, Smarter Scaling

Transcript

About: Overview of Milvus and Zilliz

Welcome, everybody, to our webinar today. As we reach the end of the year, it has been a monumental year for everyone. I believe all of us have achieved a lot this year, and the same is true for us. James has led our engineering team and delivered many important upgrades to our product. Today, he is very excited to walk us through what’s new in Milvus 2.6.

Before we begin, I’d like to share a few notes. This session is being recorded, and the recording will be distributed via email after the session so you can review the content later. We will also have a Q&A session at the end. If you have questions, feel free to raise your hand or type them in the chat, and we will address them as we go.

Now let me introduce James. James is our VP of Engineering at Zilliz. He has deep expertise in database architecture and has been instrumental in the evolution of Milvus. Today, he will give us a detailed breakdown of the Milvus 2.6 upgrades. James, please go ahead.

Thank you, Jenny, and thanks everyone for joining. Milvus 2.6 was released more than four months ago. Over the past several months, we’ve spent a significant amount of time stabilizing the release and adding many new features. At this point, Milvus 2.6 is production-ready, and we are already seeing many customers using it in production on the cloud. I hope today’s session gives you enough information to understand what’s new in 2.6 and helps you identify features that are useful for your own use cases.

Before moving on, there is a question about whether Milvus will support a graph database extension for knowledge graphs. The short answer is no. Instead of becoming a traditional graph database, we are working on graph embedding strategies. We generate graphs through batch processing, convert them into graph embeddings, and use vector search to perform relationship searches. This approach focuses on similarity and semantic relationships rather than exact graph matching. I’ll talk more about this later.

Now let’s move on to today’s agenda. First, we’ll do a quick introduction to Milvus. Then we’ll cover four major improvements in Milvus 2.6: improvements to the data model, enhanced search functionality, performance and cost optimizations, and architectural changes that make Milvus easier to maintain.

Milvus is an open-source vector database we’ve been building for almost six years. It helps users store, index, and manage vector data. The community has grown rapidly, with more than 400 contributors and over 40,000 GitHub stars. Today, more than 10,000 enterprise users are running Milvus in production.

Zilliz is the company behind Milvus. In addition to the open-source project, we offer a fully managed cloud version called Zilliz Cloud, which is a leader among vector database providers. Our goal is to help organizations make sense of unstructured data, not only through vector search but by building a complete stack for AI-native data processing.

Data Model Improvements: STRUCT, GEO, TIMESTAMPTZ, TTL, and Decay Reranker

The first major topic is data model improvements. We introduced a new STRUCT data type in Milvus 2.6. Originally, Milvus followed a wide-column database model where each entity had multiple columns. This works well for structured schemas, but many AI workloads deal with nested or document-style data. For example, when processing PDFs, a document may be split into multiple chunks, each with its own embeddings and metadata. Video data may contain frames, and each frame can have multiple embeddings. These patterns are common across RAG and multimodal use cases. That’s why we introduced STRUCT to support nested data structures more naturally.

There are two common patterns this enables. The first is multi-vector models such as ColBERT or similar approaches, where token-level embeddings are generated and late interaction is used during search. These models offer better recall and native multimodality but come with higher storage and performance costs. With STRUCT and optimizations in Milvus 2.6, we can significantly improve both recall and performance for these workloads.

The second pattern is document chunking with metadata. Traditionally, users relied on GROUP BY operations, which are slower. With STRUCT, you can store the entire document in a single row, enabling native grouping and better transaction guarantees. This also reduces data duplication, especially when document-level metadata is shared across multiple chunks.

We also introduced geolocation data support. Many use cases combine vector search with location filtering, such as finding nearby restaurants or services. Milvus 2.6 supports geolocation data types and spatial indexing, allowing users to perform efficient location-aware vector searches without maintaining separate systems.

Time-based data handling is another major improvement. We introduced TIMESTAMPTZ to make timestamp data easier to manage and integrate with existing systems. This works seamlessly with Milvus’s time-to-live feature, allowing data to expire automatically. We are also introducing entity-level TTL, enabling different documents to have different lifetimes within the same collection. In addition, we added a decay reranker, which gradually reduces the relevance of older data. This is particularly useful for agent memory, news recommendation, and e-commerce scenarios where freshness matters.

Finally, Milvus 2.6 supports schema changes such as adding new columns to existing collections. Users can specify default values or backfill data using Spark. This eliminates the need to rebuild collections from scratch and significantly reduces operational overhead.

We also introduced data deduplication using locality-sensitive hashing. This supports both batch deduplication with Spark and streaming deduplication within Milvus, allowing users to manage large-scale training datasets efficiently.
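
To make the decay reranker idea from this section concrete, here is a minimal Python sketch. It is not Milvus's implementation or API. It assumes an exponential decay with a half-life, and the names `Hit`, `decay_factor`, `rerank_by_freshness`, and `half_life_days`, as well as the choice to multiply similarity by the decay factor, are illustrative assumptions that do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    similarity: float  # vector similarity score, higher is better
    age_days: float    # time elapsed since the document was written


def decay_factor(age_days: float, half_life_days: float) -> float:
    """Exponential decay: 1.0 for brand-new data, 0.5 after one half-life."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def rerank_by_freshness(hits: list[Hit], half_life_days: float = 30.0) -> list[Hit]:
    """Sort hits by similarity scaled by the decay factor (an assumed combination)."""
    return sorted(
        hits,
        key=lambda h: h.similarity * decay_factor(h.age_days, half_life_days),
        reverse=True,
    )


hits = [
    Hit("old-but-close", similarity=0.90, age_days=90),
    Hit("fresh", similarity=0.80, age_days=1),
]
reranked = rerank_by_freshness(hits)
print([h.doc_id for h in reranked])
```

With a 30-day half-life, the 90-day-old document keeps only one eighth of its similarity score, so the fresher document ranks first even though its raw similarity is lower. A longer half-life would weaken that effect, which is the knob a freshness-sensitive workload such as news recommendation would tune.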

Search Functionality Enhancements: Embeddings, Reranking, Boosting, Phrase Match, and Highlight

Milvus 2.6 introduces native integration with embedding and reranking models. Users can directly use third-party services such as OpenAI or Cohere, or deploy self-hosted inference engines like Triton or Hugging Face models. This simplifies deployment and reduces latency by colocating inference with vector search. We also support reranking models as operators, enabling both SaaS-based and self-hosted rerankers.

In addition to model-based reranking, Milvus introduces boosting, which allows users to adjust relevance scores based on business logic such as freshness, popularity, or user attributes.

Phrase match extends traditional text matching by allowing multiple keywords to be matched together, improving precision in search queries. This feature can also be combined with boosting to prioritize more relevant results.

Highlighting is another important addition. Milvus supports both keyword-based highlighting and semantic highlighting. Semantic highlighting identifies relevant content even when it does not exactly match the query terms. This is especially useful for search result explanations and for trimming context before sending data to large language models, helping reduce token usage.
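
The difference between plain keyword matching and the phrase match mentioned above can be illustrated with a small Python sketch. This shows the matching semantics only, under the assumption that "matched together" means the terms must appear adjacent and in order. It does not use Milvus, and the `tokenize`, `keyword_match`, and `phrase_match` helpers are hypothetical names introduced for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a deliberately simple analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_match(text: str, query: str) -> bool:
    """True if every query term occurs somewhere in the text, in any order."""
    tokens = set(tokenize(text))
    return all(term in tokens for term in tokenize(query))


def phrase_match(text: str, phrase: str) -> bool:
    """True only if the query terms occur next to each other, in the same order."""
    tokens = tokenize(text)
    terms = tokenize(phrase)
    if not terms:
        return False
    width = len(terms)
    return any(tokens[i:i + width] == terms for i in range(len(tokens) - width + 1))


doc = "The vector database supports search over machine learning embeddings"
print(keyword_match(doc, "learning machine"))  # True: both words are present
print(phrase_match(doc, "learning machine"))   # False: not adjacent in this order
print(phrase_match(doc, "machine learning"))   # True: adjacent and in order
```

The stricter adjacency requirement is what gives phrase match its higher precision: a document that merely mentions both words in unrelated places no longer qualifies.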

Performance & Cost Optimizations: Quantization, Tiered Storage, JSON Acceleration, and NGRAM Index

One of the most exciting improvements in Milvus 2.6 is the introduction of RaBitQ quantization. This technique significantly reduces memory usage while maintaining high recall. Compared to traditional binary quantization, RaBitQ improves accuracy by more than 10% and delivers substantial performance gains.

Tiered storage is another major enhancement. Cold data remains in object storage such as S3, while hot data is cached locally. This reduces memory and disk usage and enables efficient multi-tenant workloads without sacrificing performance.

Milvus 2.6 also accelerates JSON filtering through JSON shredding and inverted indexing. This dramatically improves performance for dynamic fields and complex metadata queries. Users can further enhance performance by building indexes on JSON fields when needed.

We also introduced the NGRAM index to accelerate substring and pattern matching. This improves performance by up to 100x for LIKE queries, although it increases memory usage. Users can choose the appropriate trade-off based on their workload.
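
The general idea behind an n-gram index for substring queries can be sketched in a few lines of Python. This is a textbook-style illustration, not the Milvus NGRAM index itself. The `NgramIndex` class, its methods, and the choice of 3-character grams are assumptions made for the example.

```python
from collections import defaultdict


def ngrams(text: str, n: int = 3) -> set[str]:
    """All contiguous character n-grams of the text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramIndex:
    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.docs: dict[int, str] = {}
        # Posting lists: n-gram -> ids of documents containing it.
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for gram in ngrams(text, self.n):
            self.postings[gram].add(doc_id)

    def contains(self, needle: str) -> list[int]:
        """Rough equivalent of a LIKE '%needle%' filter."""
        grams = ngrams(needle, self.n)
        if not grams:
            # Needle shorter than n: the index cannot help, so scan everything.
            candidates = set(self.docs)
        else:
            # A matching document must contain every n-gram of the needle.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        # Candidates may be false positives, so verify against the stored text.
        return sorted(d for d in candidates if needle in self.docs[d])


index = NgramIndex()
index.add(1, "vector database")
index.add(2, "database vector")
index.add(3, "relational data")
index.add(4, "abc bcd")
print(index.contains("tor data"))
print(index.contains("abcd"))
```

The speedup comes from intersecting posting lists so that only a few candidates need the full string check, and the extra memory mentioned in the talk corresponds to storing those posting lists. Document 4 shows why the final verification step is needed: it contains both "abc" and "bcd" but not the substring "abcd".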

Architecture Modernization: Streaming Node, Woodpecker WAL, CDC, and Data Lake Integration

Milvus 2.6 introduces a new streaming node, separating real-time ingestion from historical data serving. This improves consistency guarantees, simplifies load balancing, and reduces memory pressure.

Woodpecker replaces external dependencies such as Kafka or Pulsar, reducing maintenance complexity. It supports both memory-buffer and commit-buffer modes, balancing latency and cost. Further optimizations are planned in upcoming releases.

Change Data Capture is now integrated directly into Milvus, enabling replication, incremental backups, and synchronization with data warehouses. This lays the foundation for global deployments and large-scale data growth.

Finally, Milvus integrates with data lakes such as Iceberg and Delta Lake. Users can build indexes directly on external data without duplicating storage. Spark and Ray can be used for feature engineering, deduplication, clustering, and ETL workflows, all sharing the same underlying storage and indexes.
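
As a rough illustration of how change data capture can drive replication and incremental catch-up, the following Python sketch replays an ordered change log into a replica. It is a generic model and does not reflect Milvus's CDC format or API. The `ChangeEvent` and `Replica` types, the `seq` checkpoint, and the two operation names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    seq: int                               # position in the change log, strictly increasing
    op: str                                # "upsert" or "delete" (assumed operation names)
    pk: int                                # primary key of the affected entity
    row: Optional[dict[str, Any]] = None   # new row contents for upserts


@dataclass
class Replica:
    rows: dict[int, dict[str, Any]] = field(default_factory=dict)
    applied_seq: int = 0  # checkpoint: highest sequence number applied so far

    def apply(self, log: list[ChangeEvent]) -> int:
        """Apply only events newer than the checkpoint; return how many were applied."""
        applied = 0
        for event in sorted(log, key=lambda e: e.seq):
            if event.seq <= self.applied_seq:
                continue  # already applied, so replaying the log is safe
            if event.op == "upsert":
                self.rows[event.pk] = dict(event.row or {})
            elif event.op == "delete":
                self.rows.pop(event.pk, None)
            else:
                raise ValueError(f"unknown op: {event.op}")
            self.applied_seq = event.seq
            applied += 1
        return applied


log = [
    ChangeEvent(1, "upsert", 1, {"title": "a"}),
    ChangeEvent(2, "upsert", 2, {"title": "b"}),
]
replica = Replica()
first_pass = replica.apply(log)      # initial sync applies both events
log.append(ChangeEvent(3, "delete", 1))
second_pass = replica.apply(log)     # incremental sync applies only the new event
print(first_pass, second_pass, replica.rows)
```

Because the replica remembers the last sequence number it applied, the second pass touches only the new event. The same checkpoint idea underlies incremental backups: only the portion of the log after the previous backup needs to be captured.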
