How to Select the Most Appropriate CU Type and Size for Your Business?
A Compute Unit (CU) in Zilliz Cloud is a set of hardware resources for serving search requests and indexes. Zilliz Cloud provides three types of CUs: Performance-optimized, Capacity-optimized, and Cost-optimized. Each CU type combines CPU, memory, and storage resources differently to suit different business needs. Selecting the appropriate CU type and size is therefore crucial when configuring a Zilliz Cloud cluster.
A performance-optimized CU is ideal for similarity retrieval tasks that require a rapid response time in milliseconds with a high throughput of at least 100 queries per second (QPS). This CU type is essential for (but not limited to) the following use cases:
- Generative AI applications
- Recommender systems
- Search engines
- Content moderation
- Augmenting LLMs' knowledge base
- Anti-fraud systems
If your application handles tens of millions of vectors, consider using a capacity-optimized CU. This type of CU can store up to five times as much data as a performance-optimized CU, at the cost of lower search performance. Capacity-optimized CUs are particularly useful for (but not limited to) the following scenarios:
- Searching through large-scale unstructured data, such as text, images, videos, and molecular structures
- Detecting copyright violations
- Verifying identities
The cost-optimized CU is ideal if you are not concerned with response time and have a very tight budget. While it has a higher search latency, it can hold the same amount of data as a capacity-optimized CU. This type of CU is well suited to offline tasks such as:
- Data labeling or clustering
- Dataset outlier detection or class balancing
Evaluating three CU types
The table below provides an overview of the differences among Zilliz Cloud’s CU types.
To measure the performance of different CU options, we looked at two key indicators: search latency and throughput. We tested Zilliz Cloud's three types of CUs using two datasets with various topk values (10, 100, 250, 1000). The first dataset consists of 1,000,000 vectors with 768 dimensions, and the second has 5,000,000 vectors with the same dimension.
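The article does not describe its benchmark harness, so the following is only a minimal sketch of how these two indicators could be measured in Python. The names `measure_latency_ms`, `measure_qps`, and `fake_search` are hypothetical; `fake_search` is a placeholder that you would replace with a real search call against your cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def measure_latency_ms(search_fn, queries):
    """Run queries one at a time and return each query's latency in milliseconds."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        search_fn(query)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def measure_qps(search_fn, queries, workers=8):
    """Run queries concurrently and return completed queries per second."""
    queries = list(queries)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(search_fn, queries))  # wait for every query to finish
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed


def fake_search(query):
    # Placeholder (assumption): stands in for a real top-k vector search.
    time.sleep(0.001)
    return []


latencies = measure_latency_ms(fake_search, range(50))
print(f"mean latency: {mean(latencies):.2f} ms")
print(f"throughput: {measure_qps(fake_search, range(200)):.0f} QPS")
```

Latency is measured sequentially so that queries do not queue behind one another, while throughput is measured under concurrency, since a single sequential client rarely saturates a server.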
The table above shows that the performance-optimized CU is the best choice for low latency, outperforming the other two CU types. It maintains a latency of under ten milliseconds for typical topk values of 10-250, five to ten times faster than the capacity-optimized and cost-optimized CUs. When topk is in the thousands, latency is 10-20 ms for the performance-optimized CU, 50-100 ms for the capacity-optimized CU, and 100-200 ms for the cost-optimized CU. Even though the performance-optimized CU slows down at these topk values, its search latency is still suitable for many real-time applications.
Table 3: Throughput performance testing results
When it comes to throughput, the performance-optimized CU is superior. It outperforms the capacity-optimized CU by four to five times and surpasses the cost-optimized CU by 15 to 18 times.
We tested Zilliz Cloud's three types of CUs using a standard set of vector dimensions: 128, 256, 512, 768, and 1024.
Based on the test results in the table above, we find that:
- The capacity-optimized and cost-optimized CUs have equal capacities, five times larger than the performance-optimized CU.
- As the vector dimensions increase, more storage space is needed to hold the data. For instance, a CU can store roughly twice the number of 512-dimensional vectors compared to 1024-dimensional vectors.
Note: This experiment stored only the primary key and vectors, with no scalar fields. If your data includes additional scalar fields such as id, label, keywords, summary, or URL, the actual capacity of each CU type may differ from the table above, so rely on empirical measurement for accurate sizing.
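The relationship between dimension and capacity can be illustrated with a rough Python estimate of raw vector storage. This is a sketch under stated assumptions: vectors are stored as 32-bit floats, index and scalar-field overhead is ignored, and the 4 GiB budget is a made-up figure for illustration, not the memory of any actual CU.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each vector component is a 32-bit float


def raw_vector_bytes(num_vectors, dim, bytes_per_value=BYTES_PER_FLOAT32):
    """Bytes needed for the raw vector data only (no index, no scalar fields)."""
    return num_vectors * dim * bytes_per_value


def vectors_that_fit(budget_bytes, dim, bytes_per_value=BYTES_PER_FLOAT32):
    """How many vectors of a given dimension fit in a fixed byte budget."""
    return budget_bytes // (dim * bytes_per_value)


budget = 4 * 1024**3  # hypothetical 4 GiB budget, for illustration only
for dim in (128, 256, 512, 768, 1024):
    print(f"{dim:>4} dims: {vectors_that_fit(budget, dim):,} vectors")
```

Under these assumptions, halving the dimension doubles the number of vectors that fit, which matches the 512-dimensional versus 1024-dimensional comparison above. Real capacity depends on the index and on any scalar fields, so treat this only as a first-order estimate.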
Let’s look at some examples!
We have compared Zilliz Cloud's three CU options in terms of latency, throughput, capacity, and cost. But how do you choose the most suitable option for your business? Let's look at two examples to help you make the right choice.
Suppose you’re building an LLM-augmented chatbot that uses Zilliz Cloud to store more than 10 million text chunks from private documents, each embedded as a 768-dimensional vector. Your application requires Zilliz Cloud to support 1,000 QPS and retrieve the top 10 results with an end-to-end latency of less than 30 milliseconds.
A performance-optimized CU is the only option that achieves a latency under 30 ms. Since each performance-optimized CU can hold up to 1.2 million 768-dimensional vectors, you'll need at least nine CUs to hold all 10 million vectors (10 million / 1.2 million ≈ 8.3, rounded up). One CU can reach a peak throughput of 520 QPS when the topk value is 10. To serve 1,000 QPS, you'll need two replicas.
Therefore, the best approach for this scenario is a performance-optimized cluster with two replicas, each containing nine CUs.
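The sizing arithmetic in this example can be sketched in a few lines of Python. The function `size_cluster` and its parameter names are illustrative and not part of any Zilliz Cloud API. The per-CU capacity (1.2 million vectors) and peak throughput (520 QPS at topk 10) are the figures quoted above, and the sketch follows the article's simplification that each replica sustains that peak throughput.

```python
import math


def size_cluster(num_vectors, vectors_per_cu, target_qps, peak_qps_per_replica):
    """Return (CUs per replica, number of replicas) for a workload.

    Assumption: capacity scales linearly with CU count, and each replica
    sustains the quoted peak QPS, as in the article's example.
    """
    cus_per_replica = math.ceil(num_vectors / vectors_per_cu)
    replicas = math.ceil(target_qps / peak_qps_per_replica)
    return cus_per_replica, replicas


# Chatbot example: 10 million 768-dimensional vectors, 1,000 QPS, topk = 10
cus, replicas = size_cluster(
    num_vectors=10_000_000,
    vectors_per_cu=1_200_000,
    target_qps=1_000,
    peak_qps_per_replica=520,
)
print(f"{replicas} replicas x {cus} CUs = {cus * replicas} CUs in total")
```

Rounding up in both steps matters: eight CUs would leave some of the vectors without room, and one replica would fall short of the 1,000 QPS target.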
Suppose your application detects copyright violations in images and needs to find similar ones from a pool of 100 million. Each image is embedded into a 768-dimensional vector. You don’t require real-time responses but expect the top 100 results with a throughput of 50 QPS.
Both capacity-optimized and performance-optimized CUs can handle 50 QPS when retrieving the top 100 results. However, a capacity-optimized CU can store roughly five times as many vectors as a performance-optimized CU, so far fewer CUs are needed. Therefore, the capacity-optimized CU is the more suitable option for your needs.
Based on the test results, a single capacity-optimized CU can store up to 5.6 million 768-dimensional vectors. To accommodate your 100 million vectors, you will require a minimum of 18 CUs (100 million / 5.6 million ≈ 17.9, rounded up). When the topk value is 100, a single CU can reach a peak throughput of 80 QPS. For 50 QPS, one replica is enough. Therefore, you'll need a cluster with 18 capacity-optimized CUs.
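To see why capacity drives the choice here, the following Python sketch compares how many CUs each type would need for 100 million 768-dimensional vectors. The per-CU capacities are the figures quoted in this article; the dictionary and the helper `cus_needed` are illustrative names, and the cost-optimized CU is omitted because the text does not quote an explicit figure for it.

```python
# Per-CU capacity for 768-dimensional vectors, as quoted in this article.
VECTORS_PER_CU_768D = {
    "performance-optimized": 1_200_000,
    "capacity-optimized": 5_600_000,
}


def cus_needed(num_vectors, vectors_per_cu):
    """Smallest whole number of CUs that can hold num_vectors (ceiling division)."""
    return -(-num_vectors // vectors_per_cu)


num_vectors = 100_000_000  # image pool from the copyright-detection example
for cu_type, capacity in VECTORS_PER_CU_768D.items():
    print(f"{cu_type}: {cus_needed(num_vectors, capacity)} CUs")
```

Under these figures, the performance-optimized type would need 84 CUs against 18 for the capacity-optimized type, while both meet the 50 QPS requirement.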
Zilliz Cloud offers three types of CUs. If you need your application to be lightning-fast and responsive in real-time, the performance-optimized CU is the way to go. The capacity-optimized CU is the best choice for applications that require storing and retrieving tens of millions of vectors. If you're on a tight budget and okay with sacrificing speed and throughput, the cost-optimized CU is perfect for you.
Getting started with Zilliz Cloud
Start for free with the new Starter Plan!
Or start your 30-day free trial of the Standard plan with $100 worth of credits upon registration and the opportunity to earn up to $200 worth of credits in total.
Dive deeper into the Zilliz Cloud documentation.
Check out the guide on migrating from Milvus to Zilliz Cloud.