Milvus Reference Architectures
This blog addresses some commonly asked questions regarding Milvus resource allocation based on specific use cases. Those questions include:
How much CPU and memory does Milvus need for a given number of users or requests per second (RPS)?
How much CPU and memory does Milvus need for different mixes of READ and WRITE operations?
Understanding Your Workload Characteristics
The first step in allocating resources to Milvus is understanding your workload characteristics, since they determine how much computational power and memory Milvus requires.
Below is an example set of web application load tiers, borrowed from GitLab's Linux-package reference architectures (see References), where RPS means requests per second:
| Load | API | Web | Git (Pull) | Git (Push) |
|---|---|---|---|---|
| Up to 20 RPS or 1,000 users | 20 RPS | 2 RPS | 2 RPS | 1 RPS |
| Up to 40 RPS or 2,000 users | 40 RPS | 4 RPS | 4 RPS | 1 RPS |
| Up to 60 RPS or 3,000 users | 60 RPS | 6 RPS | 6 RPS | 1 RPS |
| Up to 100 RPS or 5,000 users | 100 RPS | 10 RPS | 10 RPS | 2 RPS |
| Up to 200 RPS or 10,000 users | 200 RPS | 20 RPS | 20 RPS | 4 RPS |
| Up to 500 RPS or 25,000 users | 500 RPS | 50 RPS | 50 RPS | 10 RPS |
| Up to 1000 RPS or 50,000 users | 1000 RPS | 100 RPS | 100 RPS | 20 RPS |
Estimating Resource Requirements
To estimate the resource requirements for Milvus, we need to make a few assumptions:
Reads: Each web request and Git pull is a READ operation.
Writes: Each Git push is considered a WRITE operation.
Volume and ratio of reads/writes: Milvus's read/write volume and ratio are assumed to match the read/write ratio of API calls for each user tier.
Queries per second (QPS): Milvus QPS is assumed to match the tier's API RPS (requests per second) requirement.
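To make these assumptions concrete, here is a minimal Python sketch that converts one tier's RPS mix into a Milvus QPS target. The function and its name are ours, for illustration only, not part of any Milvus API:

```python
# Illustrative only: convert one web tier's RPS mix into a Milvus QPS
# target under the assumptions above (reads = Web + Git pulls,
# writes = Git pushes, total Milvus QPS = the tier's API RPS).

def milvus_qps_target(api_rps, web_rps, git_pull_rps, git_push_rps):
    reads = web_rps + git_pull_rps    # READ operations per second
    writes = git_push_rps             # WRITE operations per second
    read_fraction = reads / (reads + writes)
    total_qps = api_rps               # Milvus QPS matches the API RPS
    return total_qps * read_fraction, total_qps * (1 - read_fraction)

# Example: the "up to 1,000 users" tier (API: 20, Web: 2, Pull: 2, Push: 1)
read_qps, write_qps = milvus_qps_target(20, 2, 2, 1)
print(f"READ QPS: {read_qps:.0f}, WRITE QPS: {write_qps:.0f}")  # 16 and 4
```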
We also need to estimate the data size per read/write request. We’ll assume a common GenAI use case:
Vector dimension: 1024 floating point numbers
Size in bytes per floating point number: 4 bytes
Top_k (number of returned vectors): 10 vectors per search request
Size of a collection (database table): 1 million vectors per write request
Database index type: HNSW
Based on these assumptions, we can do some back-of-the-napkin math to estimate the data size per read or write. With a vector dimension of 1024, each vector takes 1024 × 4 bytes = 4 KB, and a typical read returns top_k = 10 vectors. Therefore:
Each Milvus read operation processes around 40 KB of data (10 vectors × 4 KB).
Each Milvus write operation involves around 4 GB of data (1 million vectors × 4 KB).
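The same arithmetic in a few lines of Python, with every constant taken straight from the assumptions above:

```python
# Back-of-the-napkin data sizes per Milvus read and write, using the
# assumptions above (1024-d float32 vectors, top_k = 10, 1M-vector
# collection per write).

DIM = 1024
BYTES_PER_FLOAT = 4
TOP_K = 10
VECTORS_PER_WRITE = 1_000_000

bytes_per_vector = DIM * BYTES_PER_FLOAT             # 4,096 bytes ~= 4 KB
read_bytes = TOP_K * bytes_per_vector                # ~40 KB per search
write_bytes = VECTORS_PER_WRITE * bytes_per_vector   # ~4 GB per collection insert

print(f"per vector: {bytes_per_vector / 1024:.0f} KB")   # 4 KB
print(f"per read:   {read_bytes / 1024:.0f} KB")         # 40 KB
print(f"per write:  {write_bytes / 1e9:.1f} GB")         # 4.1 GB (decimal)
```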
Milvus offers both insert (write an entirely new collection's worth of rows) and upsert (modify a few existing rows) functionality (see the blog Milvus insert, upsert, delete for more information). We'll over-estimate each WRITE operation as a full collection insert rather than an upsert of just a few rows.
Databases need to consider not only data size but also search and insert speed. We will assume the collection is indexed with the popular HNSW index, which offers O(log n) search time.
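As a concrete illustration, here is a minimal pymilvus sketch that creates an HNSW-indexed collection and contrasts insert with upsert. It assumes a Milvus instance at localhost:19530; the collection name, schema, and HNSW build parameters (M, efConstruction) are illustrative choices, not tuned recommendations:

```python
from pymilvus import DataType, MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # assumes a local Milvus

# Schema: an INT64 primary key plus a 1024-d float vector field.
schema = client.create_schema(auto_id=False)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("vector", DataType.FLOAT_VECTOR, dim=1024)

# HNSW index; M and efConstruction are illustrative build parameters.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="HNSW",
    metric_type="COSINE",
    params={"M": 16, "efConstruction": 200},
)

client.create_collection("docs", schema=schema, index_params=index_params)

# insert: write brand-new rows.
client.insert(collection_name="docs", data=[{"id": 1, "vector": [0.1] * 1024}])
# upsert: overwrite existing rows, matched by primary key.
client.upsert(collection_name="docs", data=[{"id": 1, "vector": [0.2] * 1024}])
```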
With these assumptions, here is our conversion of the web-based users/RPS/reads/writes tiers into vector database QPS and data size tiers:
Up to 1,000 users = 20 QPS with 1 million vectors
Up to 2,000 users = 40 QPS with 1 million vectors
Up to 3,000 users = 60 QPS with 1 million vectors
Up to 5,000 users = 100 QPS with 2 million vectors
Up to 10,000 users = 200 QPS with 4 million vectors
Up to 25,000 users = 500 QPS with 10 million vectors
Up to 50,000 users = 1000 QPS with 20 million vectors
Load Testing and Benchmarking
To verify the accuracy of our resource estimations, we load-tested and benchmarked the architecture tiers with VectorDBBench. We assumed the default Segment, Partition, Shard, Data node, Query node, and Index node sizes for the Milvus architecture itself.
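VectorDBBench is the right tool for rigorous benchmarking, but for a rough sanity check of QPS against your own deployment, a hand-rolled probe with pymilvus might look like the sketch below. It assumes the docs collection from the earlier sketch exists and holds data; the query count and concurrency level are arbitrary illustrative values:

```python
# Rough QPS probe against a running Milvus (illustrative only; use
# VectorDBBench for real benchmarking). Assumes the "docs" collection
# from the earlier sketch exists and holds data.
import random
import time
from concurrent.futures import ThreadPoolExecutor

from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
client.load_collection("docs")  # make sure the collection is queryable

def one_search(_):
    query = [random.random() for _ in range(1024)]  # random 1024-d query
    client.search(collection_name="docs", data=[query], limit=10)

N_QUERIES, WORKERS = 1_000, 16
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    list(pool.map(one_search, range(N_QUERIES)))
elapsed = time.perf_counter() - start
print(f"~{N_QUERIES / elapsed:.0f} QPS with {WORKERS} concurrent workers")
```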
Thanks to Milvus's autoscaling capabilities, performance scales linearly with data size and cluster resources. Below is a table showing the recommended Milvus and Zilliz Cloud (the fully managed Milvus) resource sizes for different data capacities and QPS requirements.
The table below shows data capacity in millions of 1024-dimensional vectors (e.g., 1m_1024d). Milvus resources are given as numbers of CPUs and GB of memory. For cost comparison, we also show Zilliz Cloud resource sizes in Compute Units (CUs) of either the performance (perf) or capacity (cap) type.
| Users | Data capacity | QPS benchmarked | RPS required | Milvus resources | Zilliz Cloud resources |
|---|---|---|---|---|---|
| 3,000 | 1m_1024d vectors | 1,200 | 60 | 8 CPUs, 32 GB | 1cu-perf |
| 3,000 | 1m_1024d vectors | 2,400 | 60 | 16 CPUs, 64 GB | 2cu-perf |
| 3,000 | 1m_1024d vectors | 3,600 | 60 | 24 CPUs, 96 GB | 4cu-perf |
| 10,000 | 3.7m_1024d vectors | 360 | 200 | 16 CPUs, 64 GB | 2cu-cap |
| 10,000 | 3.7m_1024d vectors | 700 | 200 | 64 CPUs, 256 GB | 4cu-cap |
| 25,000 | 10m_1024d vectors | 600 | 500 | 196 CPUs, 768 GB | 12cu-cap |
| 250,000 | 100m_1024d vectors | 6,000 | 5,000 | 19,200 CPUs, 76,800 GB | 1200cu-cap |
Table of recommended Milvus and Zilliz Cloud resource sizes per user/RPS tier. Milvus scaling is linear with respect to data size and required QPS.
As the table above shows, once data size and QPS pass a certain threshold, it can be more cost-effective to run Milvus on Zilliz Cloud than on-premises.
Conclusion
By understanding your workload characteristics, estimating resource requirements based on assumptions, and leveraging load testing and benchmarking tools such as VectorDBBench, you can confidently provision the necessary resources for your Milvus deployment.
Refer to our cluster sizing guide for a deeper dive. Remember, as your workload evolves, it's essential to regularly review and adjust your resource allocation to maintain peak performance.
References
HNSW: https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md
GitLab reference architectures: https://docs.gitlab.com/ee/administration/reference_architectures/
Blog Milvus Packaging Dependencies: https://zilliz.com/blog/Milvus-server-docker-installation-and-packaging-dependencies
Blog Milvus Sizing Tool: https://medium.com/@zilliz_learn/demystifying-the-milvus-sizing-tool-2c0afe7fe963
Shards, Partitions, Segments: https://zilliz.com/blog/sharding-partitioning-segments-get-most-from-your-database
Zilliz Cloud CU types: https://docs.zilliz.com/docs/cu-types-explained#evaluate-performance
VectorDBBench, a tool for benchmarking Milvus, Zilliz Cloud, and many other mainstream vector databases: https://github.com/zilliztech/VectorDBBench