C12.ai Accelerates Drug Discovery with Milvus Vector Database

10× Faster Searches
Instant reaction retrieval, reducing query time from minutes to seconds.
Seamless Scalability
Effortlessly handles millions of reactions and growing workloads.
Superior Relevance
Delivers high-quality, chemically practical reaction suggestions.
Enhanced User Experience
Boosts platform adoption with faster, smarter retrosynthesis.
About C12.ai
Founded in 2022, C12.ai is transforming pharmaceutical research and development labs by combining cutting-edge AI with embodied intelligence technologies. Its mission is to help laboratories move beyond traditional automation, embedding intelligent decision-making into lab workflows to reduce manual bottlenecks, enhance efficiency, and lower operational costs. Through innovations like real-time insights and more intelligent automation, C12.ai is leading a new era of intelligent pharmaceutical R&D.
A core focus of C12.ai is retrosynthetic analysis—a critical technique in drug development and organic synthesis chemistry. By deconstructing complex molecules into simpler precursors and designing synthetic pathways, chemists can accelerate the discovery of new drugs. C12.ai enhances this process by leveraging historical chemical reaction data and intelligent retrieval systems to facilitate faster and more effective synthesis planning.
The Challenge: Complex Retrosynthetic Route Design
In retrosynthetic route design, C12.ai faced several key challenges:
1. Managing Massive Reaction Databases
Chemistry databases contain tens or hundreds of millions of reaction records. Finding the handful of precedents most relevant to a specific transformation requires sophisticated search capabilities that traditional databases simply cannot provide.
2. Computing High-Dimensional Similarity Searches Efficiently
Modern chemical fingerprinting techniques, such as Extended Connectivity Fingerprints (ECFP), translate molecular structures into high-dimensional vectors with hundreds or thousands of dimensions. Traditional database systems lack the specialized indexing required to calculate similarities across these complex vectors at scale.
3. Enabling Real-Time Interactive Design
Effective retrosynthetic design is an iterative, interactive process. Chemists need to rapidly explore multiple pathways, evaluate alternatives, and receive immediate feedback on each proposed route. This demands a system that can deliver sub-second response times consistently.
4. Ensuring Chemical Relevance and Practicality
Pure mathematical similarity isn't enough—retrieved reactions must align with specific chemical properties and reaction conditions to be truly useful. The system must blend raw similarity searches with expert rules on mechanisms, yields, and practical applicability.
To deliver a platform that could support real-time, scalable, and highly accurate retrosynthetic design, C12.ai needed a new kind of solution.
The Solution: Vector Search with Milvus
After evaluating several options, C12.ai selected Milvus as the foundation for its similar-reaction search engine. This choice was driven by several key advantages that make Milvus particularly well-suited for chemical similarity search:
Why C12.ai Chose Milvus
Ultra-Fast, Accurate Vector Search: Milvus supports approximate nearest-neighbor indexes such as IVF, which partitions the vector space into clusters and can be combined with quantization, and HNSW, which navigates a layered proximity graph. Both avoid scanning every vector, which significantly reduces search latency. This enables millisecond-level response times across collections containing hundreds of millions of vectors, which is precisely what interactive retrosynthetic design needs.
Elastic, Distributed Architecture: Deployed in containers on Kubernetes, Milvus scales horizontally with ease. Data is automatically sharded and replicated across nodes, providing both performance scalability and high availability. This infrastructure adapts dynamically to changing workloads and growing datasets.
Seamless Integration into Existing Systems: With comprehensive SDKs for Python, Java, and other languages, Milvus integrates smoothly into C12.ai's existing cheminformatics workflows. This allowed the team to implement advanced vector search without rebuilding their entire technology stack.
Cost-Effective Scaling: By optimizing resource utilization and supporting dynamic scaling, Milvus significantly reduces infrastructure costs compared to monolithic or traditional database solutions—an important consideration for processing the ever-growing volume of chemical reaction data.
How Milvus Powers C12.ai's Platform
C12.ai has implemented a comprehensive workflow that leverages Milvus at every stage of similar-reaction retrieval.
1. Vectorizing Chemical Reaction Data
Each reaction in C12.ai's database is encoded as a high-dimensional vector embedding using specialized chemical fingerprinting algorithms. The embedding captures the essential characteristics of the reactants, products, catalysts, solvents, and reaction conditions, creating a mathematical representation that Milvus can process efficiently.
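To make "encoding a reaction as a fixed-length vector" concrete, here is a minimal sketch based on feature hashing. It is a toy stand-in, not C12.ai's fingerprinting method and not ECFP: the function name hashed_reaction_vector, the 64-dimension size, and the use of plain compound names as features are all assumptions made for illustration.

```python
import hashlib

def hashed_reaction_vector(reaction: dict, dim: int = 64) -> list[float]:
    """Fold a reaction's components into a fixed-length count vector (toy example)."""
    vec = [0.0] * dim
    for role in ("reactants", "products", "catalysts", "solvents"):
        for name in reaction.get(role, []):
            # Prefix with the role so the same compound used as a reactant
            # and as a solvent is hashed as two different features.
            digest = hashlib.sha256(f"{role}:{name}".encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % dim
            vec[index] += 1.0
    return vec

# Hypothetical reaction record (a Suzuki coupling) used only as sample input.
reaction = {
    "reactants": ["bromobenzene", "phenylboronic acid"],
    "products": ["biphenyl"],
    "catalysts": ["Pd(PPh3)4"],
    "solvents": ["toluene"],
}
vector = hashed_reaction_vector(reaction)
```

The point of the sketch is only that every reaction, whatever its size, maps to a vector of the same length, which is what a vector database needs. A production fingerprint would encode molecular substructure rather than names.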
2. Building Optimized Search Indexes
The implementation uses Milvus's IVF (inverted file) index, which partitions the vector space into clusters, each represented by a centroid. At query time, the query vector is first compared against the centroids, and only the vectors in the nearest clusters are searched. Limiting detailed comparisons to the most promising clusters dramatically accelerates search.
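The cluster-and-probe idea behind an IVF index can be sketched in a few lines of pure Python. This is an illustrative toy, not Milvus's implementation: the helper names are hypothetical, the data is random two-dimensional points, and only the parameter names nlist (number of clusters) and nprobe (clusters searched per query) are borrowed from the IVF settings Milvus exposes.

```python
import math
import random

def nearest_centroid(vec, centroids):
    """Index of the centroid closest to vec (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))

def train_centroids(vectors, nlist, iterations=10, seed=0):
    """Plain k-means: returns nlist cluster centroids."""
    centroids = random.Random(seed).sample(vectors, nlist)
    for _ in range(iterations):
        buckets = [[] for _ in range(nlist)]
        for vec in vectors:
            buckets[nearest_centroid(vec, centroids)].append(vec)
        for c, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_inverted_lists(vectors, centroids):
    """For each cluster, store the IDs of the vectors assigned to it."""
    lists = [[] for _ in centroids]
    for vec_id, vec in enumerate(vectors):
        lists[nearest_centroid(vec, centroids)].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Rank only the vectors stored in the nprobe clusters nearest to the query."""
    by_distance = sorted(range(len(centroids)),
                         key=lambda c: math.dist(query, centroids[c]))
    candidates = [vec_id for c in by_distance[:nprobe] for vec_id in lists[c]]
    return sorted(candidates, key=lambda vec_id: math.dist(query, vectors[vec_id]))[:k]

# Toy data: three blobs of 50 random 2-D points each.
rng = random.Random(42)
vectors = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
           for cx, cy in [(0, 0), (10, 10), (-10, 10)] for _ in range(50)]
centroids = train_centroids(vectors, nlist=3)
lists = build_inverted_lists(vectors, centroids)

query = [9.5, 10.2]
approx = ivf_search(query, vectors, centroids, lists, k=5, nprobe=1)  # one cluster
exact = ivf_search(query, vectors, centroids, lists, k=5, nprobe=3)   # all clusters
```

With nprobe equal to nlist, the search degenerates to an exact scan. Lowering nprobe trades a little recall for a much smaller candidate set, which is the trade-off tuned in practice.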
3. Distributing Workloads for Scale and Resilience
C12.ai's Milvus deployment runs on a Kubernetes-based cluster, enabling parallel processing across multiple compute nodes. This containerized architecture scales out seamlessly under heavy loads and provides robust fault tolerance through automatic replication and failover.
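Why a sharded deployment can still return the correct global top-K can be illustrated with a small scatter-gather sketch. This shows the general pattern only, not Milvus's internal code: the modulo sharding rule, the function names, and the random 8-dimensional vectors are assumptions made for the example.

```python
import heapq
import math
import random

def split_into_shards(vectors, num_shards):
    """Assign each reaction ID to a shard (simple modulo rule, assumed here)."""
    shards = [{} for _ in range(num_shards)]
    for reaction_id, vec in vectors.items():
        shards[reaction_id % num_shards][reaction_id] = vec
    return shards

def local_top_k(shard, query, k):
    """Each shard ranks only its own vectors and returns (distance, id) pairs."""
    scored = ((math.dist(query, vec), rid) for rid, vec in shard.items())
    return heapq.nsmallest(k, scored)

def global_top_k(shards, query, k):
    """Scatter the query to every shard, then merge the partial results."""
    # In a real cluster the per-shard searches would run in parallel on separate nodes.
    partials = [local_top_k(shard, query, k) for shard in shards]
    return heapq.nsmallest(k, (hit for partial in partials for hit in partial))

rng = random.Random(7)
vectors = {rid: [rng.random() for _ in range(8)] for rid in range(200)}
shards = split_into_shards(vectors, num_shards=4)
query = [0.5] * 8
merged = global_top_k(shards, query, k=10)
```

Because every member of the global top-K must also be in the top-K of its own shard, merging the per-shard results loses nothing, while each node scans only a fraction of the data.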
4. Enhancing Results with Domain-Specific Filters
Raw vector similarity results are further refined through C12.ai's proprietary chemical knowledge rules. Retrieved reactions are scored based on condition compatibility, reported yields, and practical applicability in synthesis contexts. This hybrid approach ensures that chemists receive not just structurally similar reactions, but ones that are genuinely useful for their specific synthetic challenges.
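A hybrid score of this kind might look like the following sketch. C12.ai's actual rules are proprietary and not described here, so the Hit fields, the weighted-sum formula, and the weights 0.6, 0.3, and 0.1 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    reaction_id: str
    similarity: float            # 0..1 from the vector search, higher is closer
    reported_yield: float        # 0..1 fraction reported for the precedent
    conditions_compatible: bool  # do the conditions suit the target substrate?

def rerank_score(hit: Hit, w_sim: float = 0.6, w_yield: float = 0.3,
                 w_cond: float = 0.1) -> float:
    """Weighted blend of vector similarity and chemistry-side signals (assumed weights)."""
    condition_bonus = 1.0 if hit.conditions_compatible else 0.0
    return w_sim * hit.similarity + w_yield * hit.reported_yield + w_cond * condition_bonus

def rerank(hits: list[Hit]) -> list[Hit]:
    """Order hits from most to least useful under the blended score."""
    return sorted(hits, key=rerank_score, reverse=True)

hits = [
    Hit("rxn-A", similarity=0.95, reported_yield=0.20, conditions_compatible=False),
    Hit("rxn-B", similarity=0.88, reported_yield=0.85, conditions_compatible=True),
    Hit("rxn-C", similarity=0.90, reported_yield=0.60, conditions_compatible=True),
]
ranked_ids = [hit.reaction_id for hit in rerank(hits)]
```

In this sample, rxn-A is the closest structural match but drops to last place because of its low yield and incompatible conditions, which is the behavior the hybrid approach is meant to produce.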
Workflow Overview
As shown in the diagram below, there are two parallel workflows in the system: one for preparing the reaction library, and another for real-time query processing.
Figure: How Milvus powers C12.ai's platform
Workflow 1: Reaction Library Preparation and Vectorization: C12.ai first processes its entire chemical reaction database by vectorizing each reaction equation, capturing essential molecular features such as reactants, catalysts, solvents, and conditions. These vectors are then imported into Milvus, where efficient indexes like IVF are built. This preparation stage ensures that millions of reactions can be searched quickly and accurately when needed.
Workflow 2: Real-Time Query Processing: When a chemist submits a target reaction, the system vectorizes it in the same format as the library and performs a similarity search in Milvus to retrieve the top-K closest reactions. The initial results are then reranked using domain-specific rules that consider reaction conditions, yields, and practical applicability. After reranking, the system fetches detailed information and presents chemists with high-quality, actionable synthesis options in real time.
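The four query-time stages (vectorize, search, rerank, fetch details) can be wired together as in the sketch below. Everything here is a stand-in: the in-memory LIBRARY dictionary replaces Milvus and the reaction store, the three-feature vectorizer replaces real fingerprinting, and the similarity-times-yield rule replaces C12.ai's domain rules. All names are hypothetical.

```python
import heapq
import math

FEATURES = ("aryl halide", "boronic acid", "amine")  # hypothetical feature set

# Stand-in for the indexed reaction library.
LIBRARY = {
    "rxn-1": {"vector": [1.0, 1.0, 0.0], "yield": 0.40},
    "rxn-2": {"vector": [1.0, 1.0, 1.0], "yield": 0.92},
    "rxn-3": {"vector": [0.0, 0.0, 1.0], "yield": 0.99},
}

def vectorize(target_features):
    """Stage 1: encode the target in the same format as the library."""
    return [1.0 if f in target_features else 0.0 for f in FEATURES]

def search(query_vec, top_k):
    """Stage 2: brute-force stand-in for the vector search, returns (id, similarity)."""
    scored = ((1.0 / (1.0 + math.dist(query_vec, rec["vector"])), rid)
              for rid, rec in LIBRARY.items())
    return [(rid, sim) for sim, rid in heapq.nlargest(top_k, scored)]

def rerank(hits):
    """Stage 3: toy domain rule that weighs similarity by reported yield."""
    return sorted(hits, key=lambda hit: hit[1] * LIBRARY[hit[0]]["yield"], reverse=True)

def fetch_details(reaction_id):
    """Stage 4: look up the full record for display."""
    return {"id": reaction_id, **LIBRARY[reaction_id]}

def answer_query(target_features, top_k=2, show=2):
    query_vec = vectorize(target_features)
    hits = search(query_vec, top_k)
    ranked = rerank(hits)
    return [fetch_details(rid) for rid, _ in ranked[:show]]

results = answer_query({"aryl halide", "boronic acid"})
result_ids = [r["id"] for r in results]
```

Note that the reranker only sees what the search stage returns: rxn-3 has the highest yield but never reaches it because it falls outside the top-K. This is why such pipelines typically retrieve more candidates than they finally display.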
Implementation Results and Benefits
Since integrating Milvus into its retrosynthetic design platform, C12.ai has achieved remarkable improvements across multiple dimensions:
10× Faster Retrieval
Search times have been reduced from minutes to seconds, even when querying databases containing millions of reaction entries. This dramatic speed improvement enables truly interactive design workflows where chemists can rapidly iterate on synthetic routes.
Seamless Scalability
The distributed Milvus deployment easily accommodates growing data volumes and peak query loads. As C12.ai continuously expands its reaction databases with new literature and experimental data, the system maintains consistent performance without requiring major architectural changes.
Superior Result Relevance
By combining vector search with chemical-domain filtering, the platform delivers suggestions that align both structurally and contextually with target transformations. This higher relevance directly translates to more successful syntheses and fewer failed experiments in the lab.
Enhanced User Experience
The combination of rapid response times and high-quality matches has significantly improved user satisfaction. Chemists can now explore synthetic options more thoroughly and make more confident decisions, streamlining the entire drug development process.
Conclusion
The partnership between C12.ai and Milvus demonstrates how specialized vector database technology can transform complex scientific workflows. By addressing the dual challenges of massive data scale and high-dimensional computation, Milvus has enabled C12.ai to build a retrosynthetic design platform that delivers unprecedented speed, accuracy, and usability.
For pharmaceutical companies facing intense pressure to reduce development timelines and costs, this technology provides a powerful competitive advantage—allowing them to design more efficient syntheses, explore more chemical space, and ultimately bring life-saving drugs to market faster.