How Zilliz Saw the Future of Vector Databases—and Built for Production

This post recaps an episode of the Innovator Coffee podcast featuring James Luan, VP of Engineering at Zilliz.
Before generative AI went mainstream, vector databases were rarely discussed on their own. Most attention was still on relational databases, search engines, or big data frameworks. Vector search, if it came up at all, usually lived in research papers or inside algorithm libraries—not in conversations about production systems.
But vector databases didn’t come out of nowhere. They grew out of a deeper change in how data is created and used. Around 2017–2018, teams at Zilliz began to see the same problem showing up: companies wanted to work with far more unstructured data—text, images, audio, logs, user behavior—but their existing tools weren’t built for it. Traditional databases and keyword search could store this data, but they weren’t good at understanding it. They handled exact matches well. Meaning and similarity were another story.
Vectors offered a practical way to close that gap. By turning text, images, and other content into embeddings, similarity became something systems could calculate directly. Once data was represented this way, databases were no longer just storing records. They could retrieve information based on meaning, not just keywords.
In this role, vector databases sit between powerful models and messy real-world data, making unstructured information searchable, comparable, and usable at scale.
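To make that concrete, here is a minimal sketch of the core computation. The three vectors below are toy stand-ins for real model embeddings, but the mechanics are the same: closeness in vector space approximates closeness in meaning.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: near 1.0 for similar meaning, near 0.0 for unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings; in practice these come from an embedding model
# applied to text, images, or audio.
query = np.array([0.9, 0.1, 0.2])   # e.g. "affordable laptop"
doc_a = np.array([0.8, 0.2, 0.1])   # e.g. "budget notebook review"
doc_b = np.array([0.1, 0.9, 0.7])   # e.g. "pasta recipe"

print(cosine_similarity(query, doc_a))  # high score: related meaning
print(cosine_similarity(query, doc_b))  # low score: unrelated content
```

A vector database performs this comparison against millions or billions of stored embeddings, using approximate indexes rather than brute-force loops.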
So how did vector search make the jump from research into production—and where are vector databases headed next? In a recent episode of the English-language podcast Innovator Coffee, James Luan, VP of Engineering at Zilliz, shared his perspective, drawing on the company’s founding story, the thinking behind Milvus, Zilliz’s open-source vector database, and the design principles that shaped the system.
From Algorithms to Production: The Evolution of Vector Databases
Looking back on the early days before vector search became production-ready, James Luan notes that most early progress happened inside large technology companies. Projects like Meta’s FAISS set the technical foundation, but they were libraries—not databases. Similar vector search systems existed at companies such as Microsoft and Spotify, typically built for internal use and tailored to specific workloads. These tools were effective, but they were never designed to run as general-purpose, long-lived systems.
The turning point came when vector search moved from research into real products. Once teams tried to deploy it in production, system-level challenges became impossible to ignore. Scalability, reliability, and day-to-day operations mattered as much as search quality. Different paths emerged. Some teams built managed services optimized for online inference and tight integration with large language models. Others took a broader infrastructure approach, integrating vector search with data lakes and traditional databases to support enterprise-scale use cases. In James’s view, this divergence is a natural stage in the emergence of any new infrastructure layer.
As large language models matured and applications reached production, the role of vector databases expanded quickly. Early use cases focused on similarity-based retrieval: recommendation systems, image search, and content matching. Over the past two to three years, Retrieval-Augmented Generation (RAG) has become the dominant pattern. In RAG systems, vector databases provide models with relevant, grounded context, anchoring answers in retrieved facts and helping reduce hallucinations.
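As an illustration, the retrieval step of a RAG pipeline can be sketched with the pymilvus client. Everything here is hypothetical: the collection, the toy four-dimensional vectors, and the texts. In a real pipeline the vectors would come from an embedding model, and the retrieved text would be placed into the LLM prompt as grounded context.

```python
from pymilvus import MilvusClient

client = MilvusClient("rag_demo.db")  # Milvus Lite: a local, file-backed instance
client.create_collection(collection_name="docs", dimension=4)

# Toy corpus; real embeddings come from the same model used at query time.
client.insert("docs", [
    {"id": 0, "vector": [0.9, 0.1, 0.1, 0.0], "text": "Milvus is a vector database."},
    {"id": 1, "vector": [0.1, 0.9, 0.0, 0.1], "text": "Pasta needs boiling water."},
])

# Retrieval step of RAG: find the stored chunk closest to the query embedding.
hits = client.search("docs", data=[[0.8, 0.2, 0.1, 0.0]], limit=1,
                     output_fields=["text"])
context = hits[0][0]["entity"]["text"]
print(context)  # -> "Milvus is a vector database." (fed to the LLM as context)
```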
That role becomes even more important in agent-based systems. Here, vector databases act as long-term or near-line memory, supporting multi-step reasoning, context compression, and multimodal retrieval. James summarizes this shift with a simple principle: less structure, more intelligence. As model capabilities improve, rigid pipelines and heavy upfront labeling can hold systems back. Agents perform better when they operate in a flexible semantic space and decide dynamically how to retrieve and combine information.
At the same time, James stresses that vector databases are not magic. Retrieval quality depends as much on data governance as on algorithms. Well-curated, domain-relevant data—and continuous evaluation—are essential. Embedding models, rerankers, and retrieval strategies evolve quickly, and teams that go too long without reassessing their stack often fall behind.
Looking beyond inference, James sees vector databases playing a growing role in training and data preparation. As multimodal models become more common, vector search is increasingly used to clean, deduplicate, and curate large datasets across text, images, video, and PDFs. Over time, this may converge with data lakes into a “vector lake” architecture, connecting batch data processing with online inference.
In that longer-term view, vector databases are no longer just retrieval engines. They become a semantic layer that spans training, inference, and long-term data governance—supporting the full lifecycle of AI systems.
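As one concrete example of that curation role, near-duplicate removal can be done directly in embedding space: items whose vectors are almost identical to an already-kept item are dropped. The following is a minimal greedy sketch with toy vectors and an illustrative threshold; at real dataset scale, the pairwise comparisons would be replaced by approximate nearest-neighbor search in a vector database.

```python
import numpy as np

def dedupe(embeddings: np.ndarray, threshold: float = 0.95) -> list[int]:
    """Greedy near-duplicate removal; returns indices of items to keep."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    keep: list[int] = []
    for i, vec in enumerate(normed):
        # Keep the item only if it is not too similar to anything kept so far.
        if all(np.dot(vec, normed[j]) < threshold for j in keep):
            keep.append(i)
    return keep

vectors = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
print(dedupe(vectors))  # [0, 2]: the second vector duplicates the first
```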
How Zilliz Found Its Direction Before Vector Databases Went Mainstream
James describes Zilliz’s early days as a period of exploration rather than immediate clarity. Both he and the company’s CEO came from traditional database backgrounds, having spent years building transactional systems at Oracle. From the outset, they knew they didn’t want to build another conventional database—but what that alternative should be was still an open question.
Their first attempt was a GPU-accelerated database aimed at speeding up large-scale data processing through specialized hardware. Technically, it worked. Commercially, it did not. GPUs delivered strong performance, but they were expensive, and for most real-world workloads the cost–performance tradeoff was hard to justify. At the same time, CPU-based systems like ClickHouse were improving rapidly, closing much of the performance gap at a fraction of the cost.
That experience forced a deeper rethink. Instead of asking how to make databases faster, the team began asking a different question: what kinds of data were still poorly served? Traditional analytics and transactional workloads already had mature solutions. What stood out was unstructured data—text, images, and other content that users increasingly wanted to search and understand, not just store.
The turning point came through user feedback. Some early users asked whether the system could be used to speed up image search. That question pointed to a broader opportunity: semantic similarity at scale, enabled by vector representations. The team realized that vectors—not GPUs—were the more fundamental abstraction. From that insight, Milvus was born as an open-source project focused on large-scale vector search.
James emphasizes that this pivot was not driven by hype. At the time, “vector databases” were not a recognized category, and even the term itself lacked a clear definition. What guided the decision was a conviction rooted in database fundamentals: if semantic search was going to matter, it would eventually need the same qualities as any critical data system—scalability, stability, and reliability.
That choice set the direction for everything that followed. By committing early to vectors as first-class data and to databases as long-running systems, Zilliz positioned itself ahead of the industry’s shift toward AI-driven applications—well before that shift became widely visible.
As models later moved from research into production, vector databases became a core part of enterprise AI architectures, supporting RAG pipelines, agent systems, multimodal retrieval, and large-scale training data deduplication. With that expansion came new expectations. Speed alone was no longer enough. Accuracy, scalability, cost efficiency, data governance, and security all became first-order concerns.
James’s takeaway is that building systems that balance these demands is not a short-term optimization problem. It requires patience, sustained engineering investment, and a long-term commitment to infrastructure fundamentals—well beyond the initial excitement of a new category.
Technical Challenges and Solutions: Running Vector Databases in Production
As vector databases moved into real production AI systems, James argues that success stopped being about raw performance. In early deployments, speed mattered above all else. But as large language models entered the picture, the real challenge became building systems that could scale sustainably—balancing cost, accuracy, reliability, and enterprise requirements at the same time.
Cost: Moving Beyond Memory-Only Search
James points out that early vector search systems relied heavily on in-memory indexes. That approach worked when datasets were small, but it became economically unsustainable as LLM-driven applications pushed data volumes much higher. At that scale, reducing latency by a few milliseconds matters far less than controlling storage costs.
The solution is a tiered approach to storage and indexing. By combining in-memory, disk-based, and object-storage indexes, vector databases can reduce storage costs by up to 100x. This shift doesn’t just optimize existing workloads—it makes large-scale retrieval practical in the first place.
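As a rough illustration of what tiering looks like in practice, Milvus lets each collection use an index type with a different memory profile. The sketch below contrasts the in-memory and disk-based tiers only; the object-storage tier is not shown. The "hot_docs" and "cold_docs" collections are hypothetical and assumed to already exist on a running Milvus server.

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # assumes a running server

# In-memory tier: HNSW holds its graph in RAM for the lowest latency.
hot = client.prepare_index_params()
hot.add_index(field_name="vector", index_type="HNSW",
              metric_type="COSINE", params={"M": 16, "efConstruction": 200})
client.create_index(collection_name="hot_docs", index_params=hot)

# Disk-based tier: DiskANN serves much larger collections from SSD,
# trading a few milliseconds of latency for a far smaller memory bill.
cold = client.prepare_index_params()
cold.add_index(field_name="vector", index_type="DISKANN", metric_type="COSINE")
client.create_index(collection_name="cold_docs", index_params=cold)
```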
Scalability and Stability at Real-World Scale
Cost pressures quickly expose scalability limits. James notes that many teams begin with simple, single-node setups because they are easy to deploy. Problems emerge later, when data grows 10x, 50x, or even 100x in a short time.
This reality led Zilliz to rebuild Milvus as a distributed, cloud-native system. For James, scalability is inseparable from stability. A system that can scale but fails unpredictably under real workloads is not usable infrastructure.
He emphasizes that stability is often the hardest part of turning vector search into a production system. With existing open-source tools, many teams can build a working prototype in six to twelve months. What’s difficult is making that system behave reliably over long periods of time as data volume, query patterns, and operational complexity change.
Unlike performance optimization, stability does not come from a single breakthrough. Performance gains are visible—benchmarks can show a 20% or 30% improvement in a few months. Stability is built differently. Each fix may improve the SLA by only a fraction of a percent, barely noticeable on its own. But through hundreds of small, cumulative improvements, a system gradually becomes reliable enough to run as long-term infrastructure.
Accuracy: Retrieval Sets the Ceiling
In RAG and agent systems, retrieval quality directly determines model performance. If the system fails to retrieve the right information, the model has no way to compensate.
James stresses that accuracy is not just a database concern. It depends on the entire retrieval stack, including embedding models, reranking strategies, and data quality. Because these components evolve rapidly, teams need to re-evaluate their setups frequently—often every few months—to maintain accuracy over time.
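A lightweight way to make that re-evaluation routine is to track a metric such as recall@k against a small labeled query set whenever any component changes. The sketch below is generic: the retrieved IDs and ground-truth labels are placeholders for whatever the team's current retrieval stack produces.

```python
def recall_at_k(retrieved: list[list[int]], relevant: list[set[int]], k: int) -> float:
    """Fraction of queries whose top-k results contain at least one relevant doc."""
    hits = sum(1 for got, want in zip(retrieved, relevant) if set(got[:k]) & want)
    return hits / len(relevant)

# Example: three queries, each with one known-relevant document.
retrieved = [[4, 7, 1], [2, 9, 3], [8, 5, 6]]  # ids returned by the retriever
relevant  = [{7}, {1}, {8}]                    # ground-truth labels
print(recall_at_k(retrieved, relevant, k=3))   # 0.66...: the second query missed
```

Running this after every embedding-model or reranker change turns "re-evaluate the stack" from an occasional project into a cheap regression check.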
How Zilliz Balances Open Source and Business
Over the past year or two, James has spent a lot of time thinking about a challenge that comes up repeatedly for open-source companies: how to build and sustain an active open-source community while also running a growing business.
Open source and commercial goals do not always align neatly. Open-source projects depend on openness, long-term participation, and community trust, while a company has revenue targets and growth constraints to manage. In recent years, this mismatch has become more visible across the industry. James has seen several teams move their open-source projects into maintenance mode—not because the technology stopped working, but because the open-source model became difficult to support as the business scaled.
But for Zilliz, open source is not just a technical choice—it is also a go-to-market decision. In practice, it works much like a highly technical free trial—a way for developers to discover, evaluate, and gain confidence in a product through real use. This is especially important for startups, where acquiring early users is hard. For an engineering-driven team like Zilliz, this proved far more effective than traditional marketing or sales-led approaches.
By open-sourcing Milvus, the team focused on GitHub as the primary entry point. Developers used Milvus in real workloads, shared feedback, and contributed improvements back to the project. Over time, this created a tight feedback loop between users, the community, and product development.
The results were tangible. James notes that roughly 80% of Zilliz Cloud customers began as users of the open-source Milvus project. Open source also served as a powerful trust mechanism—teams that had run Milvus themselves were far more comfortable adopting a commercial offering later.
That transition, however, was never automatic. James is clear that open source alone does not create a business. A commercial product must do more than package open source—it has to solve problems the open-source version does not. For Zilliz, that value lies in operating Milvus reliably at scale—managing upgrades, handling failures, and continuously optimizing performance and cost.
An important outcome of this approach is that many users find their overall costs decrease after moving to the managed offering. Vector databases evolve quickly, driven by advances in indexing, quantization, and storage. With Zilliz Cloud, users benefit from these improvements continuously, without taking on the burden of upgrades or infrastructure management themselves.
From James’s perspective, this balance is what makes the model sustainable. Open source creates access and trust. The commercial offering turns long-term infrastructure progress into practical value—without undermining the openness that drew users in to begin with.
How Zilliz Stands Out in a Crowded Market
When asked about how the vector database market looks today, James acknowledges that the rapid adoption of AI has quickly crowded the space. It now includes managed services, lightweight plugins, and a growing number of new entrants offering vector search capabilities. On the surface, many of these solutions appear similar.
In James’s view, the real distinction is not defined by feature checklists, but by the depth and maturity of the underlying systems. Building a basic vector search function is relatively straightforward. Building a system that can operate reliably at scale, over long periods of time, is not.
Systems Maturity
Zilliz’s advantage starts with systems maturity—specifically scalability, stability, and cost control. From the beginning, Milvus was designed as a distributed, Kubernetes-native database, built to remain stable as data volumes grow by 10x or more. This matters because vector workloads rarely scale smoothly. Systems that perform well at small scale often struggle once usage becomes sustained, spiky, and unpredictable.
Cost is part of that maturity as well. Zilliz invested early in multi-tier indexing, combining memory, disk, and object storage. This gives users practical flexibility to balance performance and cost as workloads evolve, rather than locking them into a single, expensive operating mode.
Enterprise Readiness
Enterprise readiness is another key differentiator. James contrasts Zilliz with teams that come primarily from model- or AI-centric backgrounds. Zilliz’s roots in traditional database engineering led to early investment in capabilities such as access control, data isolation, BYOC deployments, encryption, and compliance.
These features are not optional at enterprise scale. They are what allow vector databases to move beyond developer experimentation and into regulated environments such as finance, healthcare, and large organizations with strict security and governance requirements.
Operational Reliability
James notes that many teams underestimate the long-term operational complexity of vector databases. Early systems may work well in controlled setups, but the real challenges appear once data grows quickly, concurrency increases, and AI applications move into continuous production.
Most companies do not want to invest their time and resources in operating complex infrastructure, especially in highly specialized areas like vector search. This is where James sees Zilliz’s role: taking on the operational burden of running vector databases at scale, so teams can focus on building applications rather than maintaining infrastructure. As the market matures, this division of labor becomes increasingly important.
Looking Ahead: The Next Phase of Vector Databases
Looking three to five years ahead, James takes a pragmatic view of where the industry is heading. Growth will continue, but the key question will no longer be whether vector database systems can be built—it will be whether they can be operated sustainably. As models grow larger and AI applications move deeper into production, data volumes will expand rapidly, raising the bar for cost control, reliability, accuracy, and security.
In that environment, the ability to reduce costs by an order of magnitude without compromising retrieval quality becomes a defining benchmark. James believes this is where durable advantages are formed. Long-term leaders in the vector database space will not be decided by features or hype, but by infrastructure discipline—the ability to run large-scale systems efficiently, reliably, and over time.
To hear the full discussion, you can find the episode on Spotify, Apple Podcasts, and YouTube.