TrialHub Enhances Clinical Trial Intelligence with Zilliz Cloud

250M+ Vectors
High-Performance Retrieval at Scale
Cost-Efficient Serverless Deployment in Production
Flexible Infrastructure to Support Future Growth
About TrialHub
TrialHub is a data intelligence platform dedicated to optimizing clinical trials and making them more accessible and efficient. The platform equips trial sponsors and clinical research organizations with insights into past clinical trials, country-specific drug reimbursement landscapes, and patient treatment pathways, drawing on data from over 80,000 sources, including PubMed. One of its key offerings is "IQ," a Retrieval-Augmented Generation (RAG) tool that lets clients ask natural language questions about trials and patients to inform new study designs and operational strategies.
The Challenge: Building a Scalable and Reliable RAG System
When Todor Voynikov, Data Engineer at TrialHub, joined the team, he was tasked with building a robust RAG system from scratch. With no prior experience in RAG or vector databases, he began by researching the architecture. He evaluated multiple vector databases, including Pinecone, Qdrant, and Milvus, for their ability to handle large-scale retrieval tasks.
The stakes were high: TrialHub needed to process and retrieve insights from massive datasets, potentially up to a billion vectors, with strict reliability and relevance requirements. Text came from both structured and unstructured sources, including parsed PDFs with complex formatting.
The Journey to Zilliz Cloud
Todor began by running his own custom benchmarks on real data, evaluating multiple vector database solutions for performance, scalability, and retrieval accuracy. While other platforms were comparable in certain areas, Milvus stood out for retrieval performance at scale.
"Milvus scaled really well with batches ranging from 1,000 to millions of records. That really impressed me," said Todor. "The performance difference was significant, especially in retrieval tasks."
After confirming the results with internal tests and sharing them with the rest of the team at TrialHub, Todor decided to move forward with Zilliz Cloud, the hosted version of Milvus.
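The benchmarking approach described above, timing retrieval as the collection grows from thousands of records upward, can be sketched roughly as follows. This is a minimal illustration and not TrialHub's actual harness: the function names (`make_vectors`, `brute_force_search`, `benchmark`) are hypothetical, the vectors are random rather than real medical embeddings, and the in-memory exact search is only a placeholder for whichever vector database client is under test.

```python
import heapq
import random
import time


def make_vectors(n, dim, seed=0):
    """Generate n pseudo-random vectors (a stand-in for real document embeddings)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]


def brute_force_search(vectors, query, k):
    """Placeholder backend: exact top-k by dot product.

    In a real benchmark this function would wrap a vector database query.
    """
    scored = ((sum(a * b for a, b in zip(vec, query)), idx)
              for idx, vec in enumerate(vectors))
    return [idx for _, idx in heapq.nlargest(k, scored)]


def benchmark(search_fn, batch_sizes, dim=16, k=5, n_queries=3):
    """Return {collection size: mean seconds per query} for each size."""
    queries = make_vectors(n_queries, dim, seed=123)
    results = {}
    for size in batch_sizes:
        vectors = make_vectors(size, dim, seed=size)
        start = time.perf_counter()
        for query in queries:
            search_fn(vectors, query, k)
        results[size] = (time.perf_counter() - start) / n_queries
    return results


if __name__ == "__main__":
    for size, latency in benchmark(brute_force_search, [1_000, 10_000]).items():
        print(f"{size:>7} vectors: {latency * 1000:.2f} ms/query")
```

Passing the search function in as a parameter keeps the timing loop identical across backends, so the same harness could compare several databases on the same data.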
Why TrialHub Chose Zilliz Cloud
Scalable Retrieval Performance: Zilliz Cloud delivered consistently fast retrievals even as vector volumes scaled into the hundreds of millions.
Custom Benchmark Validation: Todor developed a tailored benchmarking process using TrialHub's medical data to validate vector database performance before committing.
Serverless Production-Ready: Although serverless tiers are typically used for prototyping, Zilliz Cloud's serverless tier powers TrialHub's production RAG system with minimal issues.
Ease of Use & Stability: The Python client and API enabled a smooth integration with TrialHub's LangChain-based stack, while support from the Zilliz team ensured stability.
How TrialHub Uses Zilliz Cloud
TrialHub's RAG system supports pharmaceutical companies in designing more successful clinical trials. Through integration with LangChain and the ChatGPT API, the system allows users to query curated sources like PubMed. Embeddings are generated using domain-specific medical models retrained from BERT, optimized for clinical data. These embeddings are stored and queried in Zilliz Cloud to enable fast, relevant retrieval.
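The embed, store, retrieve, and prompt flow described above can be illustrated with a small self-contained sketch. Everything here is an assumption made for illustration: `toy_embed` is a sparse bag-of-words stand-in for the BERT-derived medical embedding model, `InMemoryVectorStore` is a hypothetical placeholder for a hosted vector collection, and `build_prompt` only shows how retrieved passages might be passed to a language model. None of it reflects TrialHub's actual code.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Sparse, L2-normalised bag-of-words vector (stand-in for a dense embedding model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two normalised sparse vectors."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


class InMemoryVectorStore:
    """Hypothetical stand-in for a hosted vector collection."""

    def __init__(self):
        self.rows = []  # list of (embedding, original text)

    def insert(self, texts):
        self.rows.extend((toy_embed(t), t) for t in texts)

    def search(self, query, k=2):
        q = toy_embed(query)
        scored = [(cosine(q, vec), text) for vec, text in self.rows]
        return [text for _, text in sorted(scored, reverse=True)[:k]]


def build_prompt(question, passages):
    """Assemble retrieved passages into a prompt for a language model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.insert([
        "Metformin trial in type 2 diabetes patients",
        "Statin therapy for cardiovascular risk reduction",
        "Reimbursement landscape for oncology drugs in Germany",
    ])
    question = "metformin diabetes"
    print(build_prompt(question, store.search(question, k=1)))
```

In a production setup the store would be replaced by a vector database client and the prompt would be sent to the language model API. The sketch only shows how the pieces fit together.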
Today, TrialHub's system manages over 250 million vectors. Retrieval performance is critical for success, and Milvus’ ability to maintain low-latency responses across growing datasets is a major reason the team continues to rely on Zilliz Cloud.
Future Plans
As the team adds new data sources and scales the RAG system further, TrialHub expects vector volumes to increase substantially. The team is exploring deduplication of embeddings and looks forward to upcoming features in Milvus 2.6 that simplify this process. Additionally, the engineering team is considering migrating to a dedicated tier for more control as system demands grow.
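To make the idea of embedding deduplication concrete, here is one simple way it could be approached: a greedy filter that drops any vector whose cosine similarity to an already-kept vector meets a threshold. This is only a sketch under stated assumptions. It is not the Milvus 2.6 feature mentioned above, the `deduplicate` helper and the 0.98 threshold are hypothetical, and the quadratic comparison would need to be replaced by an approximate nearest-neighbor search at the scale of hundreds of millions of vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(vectors, threshold=0.98):
    """Return indices of vectors to keep.

    A vector is kept only if its similarity to every previously kept
    vector is below `threshold`. O(n^2) comparisons: illustration only.
    """
    kept = []
    for idx, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(idx)
    return kept


if __name__ == "__main__":
    sample = [[1.0, 0.0], [0.999, 0.001], [0.0, 1.0], [1.0, 0.0]]
    print(deduplicate(sample))  # indices of the vectors that survive
```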
Conclusion
TrialHub's experience underscores how a purpose-built vector database like Zilliz Cloud can support mission-critical AI applications in regulated industries. From benchmark-driven adoption to serverless production deployment, Zilliz Cloud has helped TrialHub deliver a smarter, faster, and more scalable solution for clinical trial optimization.