How to Select the Most Appropriate CU Type and Size for Your Business?

Nov 21, 2024 · 6 min read

A Compute Unit (CU) in Zilliz Cloud is a unit of hardware resources that serves search requests and indexes. Zilliz Cloud provides three types of CUs: Performance-optimized, Capacity-optimized, and Extended-Capacity. Each CU type combines CPU, memory, and storage resources differently to suit different business needs. Selecting the appropriate CU type and size is therefore crucial when configuring a Zilliz Cloud cluster.

Performance-optimized CU

A performance-optimized CU is ideal for similarity retrieval tasks that require a rapid response time in milliseconds with a high throughput of at least 100 queries per second (QPS). Each CU can handle about 1.5 million 768-dim vectors.

This CU type is essential for (but not limited to) the following use cases:

Generative AI applications
Recommender systems
Search engines
Chatbots
Content moderation
Augmenting LLMs' knowledge base
Anti-fraud systems

Capacity-optimized CU

If your application handles tens of millions of vectors, consider using a capacity-optimized CU. Each CU can handle about 5 million 768-dim vectors. This type of CU stores much more data than a performance-optimized CU at a lower cost, but with lower performance.

Capacity-optimized CUs are particularly useful for (but not limited to) the following scenarios:

Searching through large-scale unstructured data, such as text, images, videos, and molecular structures
Detecting copyright violations
Verifying identities

Extended-Capacity CU

The Extended-Capacity CU is ideal if you are not concerned with response time and have a very tight budget. Each CU can handle about 20 million 768-dimensional vectors at a comparatively low price. While it has a higher search latency, it can hold up to 4 times as much data as a capacity-optimized CU.

This type of CU is perfect for offline tasks such as:

Data labeling or clustering
Deduplication
Dataset outlier detection or class balancing

Evaluating three CU types

The table below provides an overview of the differences among Zilliz Cloud’s CU types.

CU Type	Latency	Throughput	Capacity	Cost per Million Vectors (768-dim)
Performance-optimized	Low	High	Low	Starting from $65/month
Capacity-optimized	Medium	Medium	Medium	Starting from $20/month
Extended-Capacity	High	Low	High	Starting from $10/month
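
As a rough illustration of how the "starting from" prices in the table translate into a budget, here is a minimal Python sketch. It assumes cost scales linearly with the number of 768-dim vectors, which is a simplification: actual billing depends on how many CUs and replicas you provision, so treat the result as a lower bound. The function and dictionary names are hypothetical.

```python
# "Starting from" monthly prices (USD) per million 768-dim vectors,
# copied from the overview table. Real bills depend on CU count and replicas.
STARTING_PRICE_PER_MILLION = {
    "performance-optimized": 65,
    "capacity-optimized": 20,
    "extended-capacity": 10,
}


def estimate_min_monthly_cost(num_vectors: int, cu_type: str) -> float:
    """Return a lower-bound monthly cost, assuming linear scaling (an assumption)."""
    millions = num_vectors / 1_000_000
    return millions * STARTING_PRICE_PER_MILLION[cu_type]


if __name__ == "__main__":
    for cu_type in STARTING_PRICE_PER_MILLION:
        cost = estimate_min_monthly_cost(10_000_000, cu_type)
        print(f"{cu_type}: from about ${cost:,.0f}/month for 10M vectors")
```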

Performance comparison

To measure the performance of different CU options, we looked at two key indicators: search latency and throughput. We tested Zilliz Cloud's CU types using two datasets with various top_k values (10, 100, 250, 1000). The first dataset consists of 1,000,000 vectors with 768 dimensions, and the second has 5,000,000 vectors with the same dimension.

Metric	CU type	Dataset	top_k = 10	top_k = 100	top_k = 250	top_k = 1000
Latency	Performance-optimized CU	1M 768-dim	<10 ms	<10 ms	<10 ms	10-20 ms
Latency	Capacity-optimized CU	5M 768-dim	<50 ms	<50 ms	<50 ms	50-100 ms
Latency	Extended-Capacity CU

The table above shows that the performance-optimized CU is the best choice for low latency. It keeps latency under 10 ms for typical top_k values of 10 to 250, five to ten times faster than the capacity-optimized CU. At a top_k of 1000, latency rises to 10-20 ms for the performance-optimized CU and 50-100 ms for the capacity-optimized CU. Even with this slowdown at large top_k values, the performance-optimized CU's search latency is still suitable for many real-time applications.

Metric	CU type	Dataset	top_k = 10	top_k = 100	top_k = 250	top_k = 1000
QPS	Performance-optimized CU	1M 768-dim	520	440	270	150
QPS	Capacity-optimized CU	5M 768-dim	100	80	60	40
QPS	Extended-Capacity CU

When it comes to throughput, the performance-optimized CU is superior. It delivers roughly four to five times the QPS of the capacity-optimized CU.
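
The throughput gap can be read directly off the QPS table. The short Python sketch below divides the two rows value by value; the variable and function names are illustrative only. Keep in mind that the two CU types were tested on different datasets (1M versus 5M vectors), so the ratios are indicative rather than a like-for-like benchmark.

```python
# QPS figures copied from the throughput table above.
TOP_K = [10, 100, 250, 1000]
QPS = {
    "performance-optimized": [520, 440, 270, 150],  # 1M 768-dim dataset
    "capacity-optimized": [100, 80, 60, 40],        # 5M 768-dim dataset
}


def throughput_ratios() -> dict:
    """Map each top_k to performance-optimized QPS / capacity-optimized QPS."""
    perf = QPS["performance-optimized"]
    cap = QPS["capacity-optimized"]
    return {k: p / c for k, p, c in zip(TOP_K, perf, cap)}


if __name__ == "__main__":
    for top_k, ratio in throughput_ratios().items():
        print(f"top_k={top_k}: {ratio:.2f}x")
```

The ratios range from 3.75x at top_k = 1000 to 5.5x at top_k = 100.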

Capacity comparison

We tested Zilliz Cloud's three types of CUs using a standard set of vector dimensions: 128, 256, 512, 768, and 1024.

Vector dimensions	Performance-optimized CU (millions of vectors per CU)	Capacity-optimized CU (millions of vectors per CU)	Extended-Capacity CU (millions of vectors per CU)
128	5	25	Coming soon
256	2.96	14.87	Coming soon
512	1.63	8.22	Coming soon
768	1.5	5	20
1024	0.86	4.34	Coming soon

Based on the test results in the table above, we find that:

The Extended-Capacity CU has the largest capacity for 768-dimensional vectors: about 13 times that of the performance-optimized CU and 4 times that of the capacity-optimized CU.
As the vector dimension increases, more storage space is needed to hold the data. For instance, a CU can store roughly twice as many 512-dimensional vectors as 1024-dimensional vectors.

Note: This experiment only focused on the primary key and vectors without adding scalar fields. However, if there are additional scalar fields like id, label, keywords, summary, URL, etc., the actual capacity of each CU type may differ from the table above. Therefore, it's essential to rely on empirical measurement for accuracy.
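
To show how vector dimension changes the number of CUs required, here is a minimal Python sketch that looks up the per-CU capacities from the table and rounds up. The names `VECTORS_PER_CU` and `cus_needed` are hypothetical, and the figures cover only the primary key and vector fields, as the note above explains, so real collections with scalar fields may need more CUs.

```python
import math

# Vectors per CU, copied from the capacity table (primary key + vector only).
VECTORS_PER_CU = {
    "performance-optimized": {
        128: 5_000_000, 256: 2_960_000, 512: 1_630_000,
        768: 1_500_000, 1024: 860_000,
    },
    "capacity-optimized": {
        128: 25_000_000, 256: 14_870_000, 512: 8_220_000,
        768: 5_000_000, 1024: 4_340_000,
    },
    # Other dimensions are listed as "Coming soon" in the table.
    "extended-capacity": {768: 20_000_000},
}


def cus_needed(num_vectors: int, dim: int, cu_type: str) -> int:
    """Return the minimum number of CUs that can hold num_vectors."""
    per_cu = VECTORS_PER_CU[cu_type].get(dim)
    if per_cu is None:
        raise ValueError(f"No tested capacity for {cu_type} at {dim} dimensions")
    return math.ceil(num_vectors / per_cu)


if __name__ == "__main__":
    for dim in (128, 768, 1024):
        n = cus_needed(10_000_000, dim, "performance-optimized")
        print(f"10M {dim}-dim vectors: {n} performance-optimized CUs")
```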

Let’s look at some examples!

We have compared Zilliz Cloud's three CU options by looking through the lens of latency, throughput, capacity, and cost. But how do you choose the most suitable option for your business? Let's look at two examples to help you make the right choice.

Example 1

Suppose you’re building an LLM-augmented chatbot that uses Zilliz Cloud to store more than 10 million text chunks of private documents, each embedded as a 768-dimensional vector. Your application requires Zilliz Cloud to support 1,000 QPS and retrieve the top 10 results with an end-to-end latency of less than 30 milliseconds.

A performance-optimized CU is the only option that achieves a latency under 30 ms. Since each performance-optimized CU can hold up to 1.5 million 768-dimensional vectors, you'll need at least seven CUs to hold all 10 million vectors. With a top_k of 10, the peak throughput measured for this CU type is 520 QPS. To serve 1,000 QPS, you'll need two replicas.

Therefore, the best approach for this scenario is a cluster with two replicas, each containing seven performance-optimized CUs.
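
The arithmetic in this example can be sketched in a few lines of Python. This is an illustrative helper, not a Zilliz Cloud API: `size_cluster` is a hypothetical name, and it assumes, as the example does, that the peak QPS from the benchmark table applies to each replica.

```python
import math


def size_cluster(num_vectors: int, vectors_per_cu: int,
                 target_qps: int, peak_qps_per_replica: int) -> tuple:
    """Return (CUs per replica, replica count) for the given requirements."""
    # Enough CUs in each replica to hold the full dataset.
    cus_per_replica = math.ceil(num_vectors / vectors_per_cu)
    # Enough replicas to reach the target throughput (assumes QPS adds up
    # linearly across replicas, which is a simplification).
    replicas = math.ceil(target_qps / peak_qps_per_replica)
    return cus_per_replica, replicas


if __name__ == "__main__":
    # Example 1: 10M 768-dim vectors, 1,000 QPS, top 10 results.
    cus, replicas = size_cluster(10_000_000, 1_500_000, 1_000, 520)
    print(f"{replicas} replicas x {cus} performance-optimized CUs")
```

For Example 1 this yields 7 CUs per replica and 2 replicas. In practice, confirm the throughput with a load test on your own data before settling on a size.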

Example 2

Suppose your application detects copyright violations in images and needs to find similar ones from a pool of 100 million. Each image is embedded into a 768-dimensional vector. You don’t require real-time responses but expect the top 100 results with a throughput of 50 QPS.

Both the capacity-optimized and performance-optimized CUs can handle 50 QPS when retrieving the top 100 results. However, the capacity-optimized CU can store about three times as many vectors as the performance-optimized CU. Therefore, the capacity-optimized CU is the more suitable option for your needs.

Based on the test results, a single capacity-optimized CU can store up to 5 million 768-dimensional vectors. To accommodate your 100 million vectors, you will need a minimum of 20 CUs. With a top_k of 100, the peak throughput measured for this CU type is 80 QPS, so one replica is enough for 50 QPS. Therefore, you'll need a cluster with 20 capacity-optimized CUs.
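
The comparison behind this choice can be sketched as follows: keep only the CU types whose measured QPS at top_k = 100 meets the target, then count the CUs each would need. The names here are hypothetical, and the capacities are the 768-dim figures from the capacity table (1.5 million and 5 million vectors per CU).

```python
import math

# 768-dim capacity and top_k = 100 QPS, copied from the tables in this post.
CANDIDATES = {
    "performance-optimized": {"vectors_per_cu": 1_500_000, "qps_top100": 440},
    "capacity-optimized": {"vectors_per_cu": 5_000_000, "qps_top100": 80},
}


def compare_cu_counts(num_vectors: int, target_qps: int) -> dict:
    """Return the CU count for each type that meets target_qps on one replica."""
    counts = {}
    for name, spec in CANDIDATES.items():
        if spec["qps_top100"] < target_qps:
            continue  # this type would need extra replicas; skipped in this sketch
        counts[name] = math.ceil(num_vectors / spec["vectors_per_cu"])
    return counts


if __name__ == "__main__":
    # Example 2: 100M 768-dim vectors, 50 QPS, top 100 results.
    for name, cus in compare_cu_counts(100_000_000, 50).items():
        print(f"{name}: {cus} CUs")
```

Both types clear the 50 QPS bar, but the performance-optimized option would take 67 CUs against 20 for the capacity-optimized one.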

Summary

Zilliz Cloud offers three types of CUs. If you need your application to be lightning-fast and responsive in real-time, the performance-optimized CU is the way to go. The capacity-optimized CU is the best choice for applications that require storing and retrieving tens of millions of vectors. If you're on a tight budget and okay with sacrificing speed and throughput, the Extended-Capacity CU is perfect for you.

Getting started with Zilliz Cloud

Explore with our free tier (no credit card required), or try our 30-day enterprise trial with up to $200 in credits. Subscribe through any cloud marketplace and receive an additional $100 credit.

Dive deeper into the Zilliz Cloud documentation.

Updated on Mar 31, 2025

Robert Guo
Robert Guo is a Partner and Director of Product Management at Zilliz and one of the architects behind Milvus, an open-source vector database revolutionizing AI data analysis. With a Ph.D. in Computer Software and Theory from Huazhong University of Science and Technology, he has presented influential work at prestigious conferences and journals, including SIGMOD, VLDB, USENIX ATC, ICS, DATE, and IEEE TPDS. Previously a key developer for Huawei's ModelArts platform, Robert is skilled at crafting efficient and scalable AI data solutions.
