Milvus 2.6.x Now Generally Available on Zilliz Cloud, Making Vector Search Faster, Smarter, and More Cost-Efficient for Production AI

Milvus 2.6.x has been available in the open-source community for some time and has quickly become a popular upgrade path. Users have adopted it for smoother data ingestion, better performance, improved cost efficiency, and more capable hybrid search. As soon as Milvus 2.6.x was released upstream, we started hearing the same question from many Zilliz Cloud users: “When can we run Milvus 2.6.x in production on Zilliz Cloud?”
Today, we’re excited to share that Milvus 2.6.x is officially Generally Available (GA) on Zilliz Cloud, and all customers can begin using it immediately. This release not only brings the full Milvus 2.6.x feature set to our fully managed cloud environment but also introduces a series of cloud-only optimizations that elevate performance, stability, and efficiency beyond what’s possible in self-managed deployments.
For teams rapidly building, prototyping, or operating production-grade RAG systems, conversational agents, recommendation engines, enterprise knowledge platforms, or any other AI-driven application, running Milvus 2.6.x on Zilliz Cloud means accessing the latest capabilities of Milvus with the ease, reliability, and cost efficiency of a fully managed service—so teams can stay focused on building, not operating infrastructure.
Why Milvus 2.6.x Matters — and What’s Special on Zilliz Cloud
Milvus 2.6.x is a major step forward for production vector search, bringing faster indexing, lower storage and memory costs, better text retrieval, and improved language support. It upgrades how data is stored, searched, and managed across the entire system, making large-scale AI workloads more efficient and easier to operate.
With Milvus 2.6.x now GA on Zilliz Cloud, all of these improvements are available immediately in a fully managed, production-ready environment. And because Zilliz Cloud is built and operated by the creators of Milvus, it goes further than the open-source release—adding many cloud-only optimizations for automation, performance, reliability, and enterprise-grade stability.
In short: Milvus 2.6.x delivers the biggest technical leap forward in years, and Zilliz Cloud turns it into a turnkey, scalable service you can run in production today.
Let’s take a look at the key features and benefits you can expect when running Milvus 2.6.x on Zilliz Cloud.
Three-Layer Tiered Storage That Cuts Vector Search Costs to Near–S3 Levels
Not all data is equal. In real AI systems, only a small portion of data—recent clicks, trending products, or frequently referenced documents—is accessed constantly. The rest sits idle most of the time. But many vector databases force you to store everything in high-performance (and high-cost) storage, dramatically inflating your total cost of ownership.
Milvus 2.6.x changes that with built-in tiered storage, and Zilliz Cloud takes it further with a fully cloud-native, three-layer design. Instead of paying SSD or memory pricing for your entire corpus, your data is automatically placed in the storage tier that matches its access pattern.
Specifically, in Zilliz Cloud tiered storage clusters:
Object storage (e.g., S3) stores your full dataset
Local SSD acts as a warm cache that accelerates repeated access
Memory serves as the hot tier for instant-response queries
This architecture adapts in real time: when traffic patterns shift, the system automatically promotes or demotes data across tiers. In production tests, this delivers 90%+ cache hit rates, meaning most queries are served from fast memory or SSD—but your baseline storage bill still behaves like S3.
In practice, tiered storage can cut storage costs by up to 87%, reduce compute expenses by 25%, and bring your overall TCO remarkably close to raw S3 pricing. A 10 TB dataset that once cost around $3,000/month can now run for roughly $400—while still delivering consistently low-latency search for hot workloads.
Therefore, tiered storage clusters are ideal for ultra-large, cost-sensitive applications with clear hot/cold data patterns, such as:
Long-tail product search where millions of SKUs exist but only the top 5–10% are queried daily
Company-wide document repositories where only the latest content is frequently searched
News and media archives where current articles are hot and historical content is cold
Any workload where data volume grows faster than budget, and only a subset of data needs fast, in-memory performance
For more information about tiered storage pricing, check out Zilliz Cloud docs.
Index Build Level: Automatically Balance Search Accuracy and Storage Costs
Tiered storage already reduces the cost of storing raw vectors by placing different data in different tiers. But even when raw data becomes cheap to keep, there’s still one major cost driver left in vector search systems: indexes.
Just like a card catalog lets you find books instantly in a library, indexes make massive-scale vector searches fast—and they must stay in memory or on SSD. The more accurate you want your results, the more neighbors, graph links, and metadata an index must maintain, and the larger it grows. In billion-scale workloads, the index can even exceed the size of the raw vectors themselves.
However, not every AI application needs maximum search accuracy. Some workloads—like fraud detection or ranking—absolutely require it. But others, such as long-tail product search, log retrieval, archives, and experimentation environments, can trade a tiny amount of accuracy for significantly smaller indexes and lower cost. This is why controlling the accuracy–capacity trade-off matters: a slightly less precise index can be dramatically smaller, cheaper, and still perfectly acceptable for many workloads.
With this release, Zilliz Cloud introduces the Index Build Level feature, which lets you select the accuracy-capacity balance that fits your AI workloads:
Precision-first: Maximum recall (accuracy) for mission-critical workloads like AI-powered fraud detection.
Balanced (Default): A strong mix of accuracy, performance, and memory efficiency—ideal for most general-purpose AI applications.
Capacity-first: Heavily optimized for storage density. Perfect for cold data, large archives, or workloads where perfect recall isn’t needed.
Behind the scenes, Zilliz Cloud applies a next-generation quantization engine that dynamically tunes index compression and structure, giving you the ideal balance of size, accuracy, and cost—without requiring any manual configuration.
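To make this concrete, here is a minimal pymilvus sketch of how such a choice might surface when creating an index. Note that the build_level parameter name and its values are hypothetical stand-ins for the setting described above; consult the Zilliz Cloud docs for the exact name and where it is configured:

```python
from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT", token="YOUR_API_KEY")

schema = client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True, auto_id=True)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=768)

index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="AUTOINDEX",
    metric_type="COSINE",
    # "build_level" is a hypothetical name for the accuracy/capacity
    # knob described above, shown here only to illustrate the choice.
    params={"build_level": "capacity_first"},
)

client.create_collection("archive_docs", schema=schema, index_params=index_params)
```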
Expanded Data Type Support for More Real-World AI Use Cases
Previously, supporting location-aware search, time-based filtering, or complex structured entities in AI applications required extra databases, GIS tools, or heavy preprocessing alongside a vector database. Milvus 2.6.x removes that complexity by adding more foundational data types that allow these workloads to run entirely within Zilliz Cloud. This broadens what developers can build natively, streamlines architecture, and reduces operational overhead.
Geometry Fields (POINT, LINESTRING, POLYGON)
In many delivery, logistics, and e-commerce applications, search must combine semantic relevance with geospatial filtering. Queries such as “Find restaurants similar to this one within 1 km” or “Retrieve EV charging stations nearby, ranked by user preferences” previously required a separate GIS engine alongside a vector database. With native geometry types, Zilliz Cloud lets you blend semantic similarity + location awareness in a single query—no extra systems to deploy or maintain.
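Here is a minimal sketch of such a blended query in pymilvus, assuming a collection named restaurants with a GEOMETRY field called location holding WKT values; the ST_WITHIN predicate follows the PostGIS-style naming that Milvus's geometry functions use, so verify the exact function list in the docs:

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT", token="YOUR_API_KEY")

query_embedding = [0.1] * 768  # stand-in for the reference restaurant's embedding

# Semantic similarity and geospatial filtering in a single request:
# rank by vector distance, but keep only points inside the polygon.
results = client.search(
    collection_name="restaurants",
    data=[query_embedding],
    anns_field="embedding",
    filter=(
        "ST_WITHIN(location, 'POLYGON((139.69 35.68, 139.72 35.68, "
        "139.72 35.70, 139.69 35.70, 139.69 35.68))')"
    ),
    limit=10,
    output_fields=["name", "location"],
)
```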
TimestampTz (Timezone-Aware Timestamps)
Time is a critical dimension in many real-world AI applications—from event planning and monitoring systems to log analytics and alerting. Without native time support, teams had to combine vector databases with external time-series stores or awkward encoding strategies. With TimestampTz, Zilliz Cloud now supports time-windowed vector search, recency-boosted ranking, and event-driven retrieval, making temporal reasoning far simpler and enabling cleaner pipelines for logs, alerts, monitoring, and real-time analytics.
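For instance, a recency-bounded retrieval might combine vector similarity with a TimestampTz filter, as in this pymilvus sketch (collection and field names are illustrative, and the ISO-8601 literal syntax is an assumption worth checking against the docs):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT", token="YOUR_API_KEY")

query_embedding = [0.1] * 768  # stand-in for a real query embedding

# Time-windowed vector search: only events after the cutoff are candidates.
results = client.search(
    collection_name="events",
    data=[query_embedding],
    anns_field="embedding",
    filter="event_time >= '2025-01-01T00:00:00+00:00'",
    limit=20,
    output_fields=["title", "event_time"],
)
```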
INT8 Vector Type (8-bit Embeddings)
Modern embedding models—especially efficiency-focused ones like E5-base and MiniLM-L12—are often quantized to produce 8-bit embeddings. Previously, developers had to up-convert them to floats, wasting memory and storage. Native INT8 support means:
Smaller vectors → lower storage and memory cost
Smaller payloads → higher throughput
No preprocessing or format conversion
This is especially valuable for edge AI, lightweight models, and cost-sensitive workloads.
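A minimal schema sketch, assuming the DataType.INT8_VECTOR type exposed by the pymilvus 2.6 client and int8 NumPy input (collection and field names are illustrative):

```python
import numpy as np
from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT", token="YOUR_API_KEY")

schema = client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True, auto_id=True)
# Store 8-bit embeddings directly, with no float up-conversion.
schema.add_field("embedding", DataType.INT8_VECTOR, dim=384)

index_params = client.prepare_index_params()
index_params.add_index(field_name="embedding", index_type="AUTOINDEX", metric_type="COSINE")
client.create_collection("int8_docs", schema=schema, index_params=index_params)

# Insert an int8 vector as produced by a quantized embedding model.
vec = np.random.randint(-128, 128, size=384, dtype=np.int8)
client.insert("int8_docs", [{"embedding": vec}])
```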
Struct and Array-of-Structs
Real-world data isn’t just a list of simple fields. A product might have multiple sizes and images, a user might have several preferences and behaviors, and a document might contain sections, tags, and metadata. In earlier releases of Zilliz Cloud, modeling this kind of “data inside data” required flattening fields and duplicating information.
With Struct and Array-of-Structs now supported natively, Zilliz Cloud lets you store and query rich, nested information directly, making it easier to represent complex entities as they actually are—without workarounds or extra systems. This unlocks cleaner modeling for use cases such as:
Product catalogs with nested attributes
User profiles with multiple behavioral signals
Documents with layered or hierarchical metadata
Multi-modal items combining text, images, and structured metadata
By keeping these relationships intact, developers get simpler queries, fewer joins, and a more realistic, expressive data model—all inside the vector database.
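As a shape-only illustration (not the exact schema-declaration API, which is covered in the docs), a product entity with an array of variant structs might look like this:

```python
# Illustrative entity shape: a product with nested variant structs that
# previously had to be flattened into parallel arrays or duplicate rows.
product = {
    "id": 42,
    "embedding": [0.1] * 768,  # product-level embedding
    "variants": [              # array of structs
        {"size": "M", "color": "navy", "stock": 12, "image": "https://example.com/m.jpg"},
        {"size": "L", "color": "navy", "stock": 3, "image": "https://example.com/l.jpg"},    ],
}
```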
JSON Shredding and JSON Path: Make Metadata Filtering 100× Faster with Better Search Accuracy
AI applications such as e-commerce search and recommendation systems don’t just store and retrieve vectors—they rely heavily on metadata: product details, user attributes, document tags, event logs, preferences, configurations, and more. JSON is usually the natural format because it’s flexible and easy to work with. However, in most databases (including vector databases), filtering JSON requires scanning the entire JSON blob. As your data grows, these filters quickly become painfully slow.
Milvus 2.6.x on Zilliz Cloud removes that bottleneck with native JSON Shredding and JSON Path, delivering up to 100× faster metadata filtering directly inside the vector database.
JSON Shredding: You still store your metadata as regular JSON. Nothing changes in how you write data. But Zilliz Cloud automatically restructures JSON under the hood so that filtering becomes up to 100× faster, even for large or messy documents. It gives you the speed of a structured database without sacrificing JSON’s flexibility.
JSON Path: You can index specific keys inside your JSON (like price, category, or event.type) so filters on those fields become instant. You simply tell Zilliz Cloud which keys matter, and it handles the optimization. This is perfect for predictable filters like ranges, equality, or category-based lookups.
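A minimal pymilvus sketch of declaring a JSON path index, using the json_path and json_cast_type parameters documented for Milvus JSON indexing (the products collection and metadata field are illustrative):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT", token="YOUR_API_KEY")

# Index one key inside the JSON "metadata" field so range and equality
# filters on it stop scanning whole JSON blobs.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="metadata",
    index_type="INVERTED",
    index_name="metadata_price_idx",
    params={
        "json_path": 'metadata["price"]',  # the key to index
        "json_cast_type": "double",        # type it is indexed as
    },
)
client.create_index(collection_name="products", index_params=index_params)

# Filters on the indexed path now hit the index instead of a scan.
rows = client.query(
    collection_name="products",
    filter='metadata["price"] < 100 and metadata["category"] == "shoes"',
    output_fields=["metadata"],
    limit=10,
)
```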
Together, JSON Shredding and JSON Path bring structured filtering and vector search into one system, making metadata filtering dramatically faster and improving the accuracy of your final retrieval results. This feature is especially valuable for:
Recommendation systems with rich user or item attributes
RAG pipelines that filter or route documents by metadata
Multi-tenant architectures that need efficient segmentation
And many other AI applications with rich metadata
BM25-Optimized Full-Text Search: Up to 7× Faster Than Elasticsearch
Enterprise knowledge assistants, customer support chatbots, and many other RAG-based AI applications rely on more than vector similarity. They also need strong full-text search to match exact terms, handle rare entities, filter by domain-specific language, and anchor LLM responses to precise facts. To achieve this, many teams still run Elasticsearch or another text engine alongside a vector database—doubling their infrastructure and slowing down retrieval.
Zilliz Cloud introduced hybrid keyword + vector search with the Milvus 2.5.x GA last year. With Milvus 2.6.x GA, we’re taking it much further with a BM25-optimized full-text engine that’s tightly integrated with vector search:
4× faster than Elasticsearch—and up to 7× faster on specific datasets
Indexes only one-third the size of the raw text
Unified keyword + vector retrieval in one system
Lower latency and more accurate grounding for LLM-generated answers
This upgrade is especially valuable for real-world RAG applications such as:
Enterprise knowledge assistants that need semantic retrieval combined with exact matches for names, acronyms, regulations, and error codes
Customer support and helpdesk copilots that must filter documents by product names, version numbers, configuration parameters, and diagnostic messages
And more.
With BM25 built directly into Milvus on Zilliz Cloud, hybrid search becomes dramatically faster, cheaper, and easier to operate—without maintaining a separate text search stack.
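Here is a minimal end-to-end sketch in pymilvus, using the documented BM25 function that derives sparse vectors from raw text at insert and query time (collection and field names are illustrative):

```python
from pymilvus import MilvusClient, DataType, Function, FunctionType

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT", token="YOUR_API_KEY")

schema = client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True, auto_id=True)
schema.add_field("text", DataType.VARCHAR, max_length=65535, enable_analyzer=True)
schema.add_field("sparse", DataType.SPARSE_FLOAT_VECTOR)

# BM25 function: Milvus derives the sparse representation from "text"
# automatically, so you never compute term weights yourself.
schema.add_function(Function(
    name="text_bm25",
    input_field_names=["text"],
    output_field_names=["sparse"],
    function_type=FunctionType.BM25,
))

index_params = client.prepare_index_params()
index_params.add_index(field_name="sparse", index_type="SPARSE_INVERTED_INDEX", metric_type="BM25")
client.create_collection("support_docs", schema=schema, index_params=index_params)

client.insert("support_docs", [{"text": "Error 0x80070005: access denied during update"}])

# Full-text search with a raw text query, scored by BM25.
results = client.search(
    collection_name="support_docs",
    data=["access denied 0x80070005"],
    anns_field="sparse",
    limit=5,
    output_fields=["text"],
)
```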
Improved Multilingual Support for Global Users, Plus Better Text Search for All Languages
If your AI application serves users across multiple countries and languages, search quality often depends on how well the system tokenizes and understands text. Milvus 2.6.x introduces several upgrades that significantly improve retrieval accuracy, especially for languages with complex segmentation such as Japanese, Korean, and Chinese. With this release, Zilliz Cloud brings these capabilities directly to production with optimized defaults and seamless integration.
Enhanced multilingual support in Zilliz Cloud includes:
Lindera + ICU tokenizers for dramatically better Japanese, Korean, and mixed-language segmentation
Jieba with custom dictionary support, allowing teams to fine-tune Chinese tokenization for specific domains or product vocabularies
run_analyzer, now available in the cloud environment to help teams inspect and debug tokenization behavior for consistent, predictable search quality
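For example, run_analyzer lets you preview exactly how a string will be tokenized before you commit to an analyzer configuration. A minimal sketch, assuming the Jieba tokenizer parameters from the Milvus docs:

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT", token="YOUR_API_KEY")

# Chinese tokenization with Jieba, extended with domain vocabulary.
analyzer_params = {
    "tokenizer": {
        "type": "jieba",
        "dict": ["向量数据库"],  # custom dictionary entry (illustrative)
    },
}

# Inspect segmentation before building the collection's text index.
tokens = client.run_analyzer("Zilliz Cloud 是一个向量数据库服务", analyzer_params)
print(tokens)
```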
Zilliz Cloud also upgrades general text retrieval for all language workloads:
Phrase Match, enabling precise, ordered phrase queries with configurable slop
NGRAM Index accelerating substring, wildcard, fuzzy, and partial-text search across VARCHAR fields and JSON paths
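To make the Phrase Match item above concrete, here is a sketch of a PHRASE_MATCH filter combined with vector search; the third argument is the slop, the number of token-position gaps tolerated inside the phrase (collection and field names are illustrative, and the text field is assumed to have matching enabled):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT", token="YOUR_API_KEY")

query_embedding = [0.1] * 768  # stand-in for a real query embedding

# Keep only documents containing "machine learning" as an ordered
# phrase, tolerating at most one intervening token (slop = 1).
results = client.search(
    collection_name="articles",
    data=[query_embedding],
    anns_field="embedding",
    filter="PHRASE_MATCH(text, 'machine learning', 1)",
    limit=10,
    output_fields=["text"],
)
```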
Together, these improvements make multilingual and complex-text search significantly more accurate and flexible in Zilliz Cloud—ideal for global apps, e-commerce platforms, customer-support copilots, enterprise knowledge assistants, and any system where users expect reliable retrieval regardless of language or writing style.
For a full list of Milvus 2.6.x features on Zilliz Cloud, check out the Milvus release notes.
Ready to Try Milvus 2.6.x on Zilliz Cloud?
All the new Milvus 2.6.x capabilities are now fully available on Zilliz Cloud.
If you already have a Zilliz Cloud account, simply sign in and start using the new features right away—no upgrades or migrations required.
New to Zilliz Cloud? Sign up for free and get $100 in credits to experience the world’s leading managed vector database.
Have questions about any of the updates? Check out the latest documentation or reach out to Zilliz Support—we’re here to help you get the most out of Milvus 2.6.x on Zilliz Cloud.
Build Without Limits: A Closer Look at Zilliz Cloud’s Enterprise-Ready Capabilities
With the GA of Milvus 2.6.x on Zilliz Cloud, the platform further solidifies its position as the most performant, cost-efficient, and secure fully managed vector database service—while delivering the full power of Milvus’s advanced AI search capabilities in an operationally effortless, enterprise-ready environment.
Elastic scaling & cost efficiency – One-click deployment, serverless autoscaling, and pay-as-you-go pricing.
Advanced AI search – Vector, full-text, and hybrid (sparse + dense) search with metadata filtering, dynamic schema, and multi-tenancy.
Enterprise-grade reliability & security – 99.95% SLA, SOC 2 Type II and ISO 27001 certifications, GDPR compliance, HIPAA readiness, RBAC, BYOC, audit logs, business critical plan, and now global clusters. See our trust center for more information.
Global availability – Deployments across AWS, GCP, and Azure with sub-100ms latency worldwide.
Seamless migration – Built-in tools to move from Pinecone, Qdrant, Elasticsearch, PostgreSQL, OpenSearch, AWS S3 Vectors, Weaviate, or on-prem Milvus.
Natural language querying – MCP server support for intuitive queries without complex APIs.
Taken together, these capabilities make Zilliz Cloud more than a vector database — a fully managed, production-ready platform for building and scaling AI applications without limitations.