How to Select the Most Appropriate CU Type and Size for Your Business?
A Compute Unit (CU) in Zilliz Cloud is a bundle of hardware resources that serves search requests and holds indexes. Zilliz Cloud provides three types of CUs: Performance-optimized, Capacity-optimized, and Cost-optimized (coming soon). Each CU type combines CPU, memory, and storage in different proportions to suit different business needs. Selecting the appropriate CU type and size is therefore crucial when configuring a Zilliz Cloud cluster.
Performance-optimized CU
A performance-optimized CU is ideal for similarity retrieval tasks that require a rapid response time in milliseconds with a high throughput of at least 100 queries per second (QPS). Each CU can handle about 1.5 million 768-dim vectors.
This CU type is essential for (but not limited to) the following use cases:
- Generative AI applications
- Recommender systems
- Search engines
- Chatbots
- Content moderation
- Augmenting LLMs' knowledge base
- Anti-fraud systems
Capacity-optimized CU
If your application handles tens of millions of vectors, consider using a capacity-optimized CU. Each CU can handle about 5 million 768-dim vectors. This type of CU can store much more data than a performance-optimized CU at a lower cost, but with lower performance.
Capacity-optimized CUs are particularly useful for (but not limited to) the following scenarios:
- Searching through large-scale unstructured data, such as text, images, videos, and molecular structures
- Detecting copyright violations
- Verifying identities
Cost-optimized CU (coming soon)
The cost-optimized CU is ideal if you are not concerned with response time and have a very tight budget. Each CU can handle about 20 million 768-dimensional vectors at a reasonable price. While it has a higher search latency, it can hold up to four times as much data as a capacity-optimized CU.
This type of CU is perfect for offline tasks such as:
- Data labeling or clustering
- Deduplication
- Dataset outlier detection or class balancing
Evaluating three CU types
The table below provides an overview of the differences among Zilliz Cloud’s CU types.
CU Type | Latency | Throughput | Capacity | Cost per Million Vectors (note: based on 768-dim vectors) |
---|---|---|---|---|
Performance-optimized | Low | High | Low | Starting from $65/month |
Capacity-optimized | Medium | Medium | Medium | Starting from $20/month |
Cost-optimized | High | Low | High | Coming soon |
Performance comparison
To measure the performance of different CU options, we looked at two key indicators: search latency and throughput. We tested Zilliz Cloud's CU types using two datasets with various top_k values (10, 100, 250, 1000). The first dataset consists of 1,000,000 vectors with 768 dimensions, and the second has 5,000,000 vectors with the same dimension.
CU Type | Dataset | Latency at top_k = 10 | Latency at top_k = 100 | Latency at top_k = 250 | Latency at top_k = 1000 |
---|---|---|---|---|---|
Performance-optimized CU | 1M 768-dim | <10 ms | <10 ms | <10 ms | 10-20 ms |
Capacity-optimized CU | 5M 768-dim | <50 ms | <50 ms | <50 ms | 50-100 ms |
Cost-optimized CU | N/A | Coming soon | Coming soon | Coming soon | Coming soon |
The table above shows that the performance-optimized CU is the best choice for low latency, outperforming the capacity-optimized CU. It maintains a latency of under ten milliseconds for typical top_k values of 10-250, five to ten times faster than the capacity-optimized CU. With a top_k of 1,000, latency rises to 10-20 ms for the performance-optimized CU and 50-100 ms for the capacity-optimized CU. Even though the performance-optimized CU responds more slowly at this top_k, its search latency is still suitable for many real-time applications.
CU Type | Dataset | QPS at top_k = 10 | QPS at top_k = 100 | QPS at top_k = 250 | QPS at top_k = 1000 |
---|---|---|---|---|---|
Performance-optimized CU | 1M 768-dim | 520 | 440 | 270 | 150 |
Capacity-optimized CU | 5M 768-dim | 100 | 80 | 60 | 40 |
Cost-optimized CU | N/A | Coming soon | Coming soon | Coming soon | Coming soon |
When it comes to throughput, the performance-optimized CU is superior. It delivers roughly four to five times the QPS of the capacity-optimized CU across the tested top_k values.
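As a rough check on that multiple, the QPS figures in the throughput table can be divided pairwise. This is only a sketch: the variable and function names are illustrative and not part of any Zilliz Cloud API, and the two CU types were measured on datasets of different sizes (1M vs. 5M vectors), so the ratio is not a like-for-like benchmark.

```python
# QPS figures copied from the throughput table above.
TOP_K = [10, 100, 250, 1000]
QPS_PERFORMANCE = [520, 440, 270, 150]  # measured on 1M 768-dim vectors
QPS_CAPACITY = [100, 80, 60, 40]        # measured on 5M 768-dim vectors


def qps_ratios():
    """Return performance-optimized QPS divided by capacity-optimized QPS per top_k."""
    return {
        k: perf / cap
        for k, perf, cap in zip(TOP_K, QPS_PERFORMANCE, QPS_CAPACITY)
    }


ratios = qps_ratios()
for k, ratio in ratios.items():
    print(f"top_k={k}: {ratio:.2f}x")
```

The ratios fall between about 3.8x (top_k = 1000) and 5.5x (top_k = 100), so the gap narrows somewhat as top_k grows.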
Capacity comparison
We tested Zilliz Cloud's three types of CUs using a standard set of vector dimensions: 128, 256, 512, 768, and 1024.
Vector dimensions | Performance-optimized CU (millions of vectors per CU) | Capacity-optimized CU (millions of vectors per CU) | Cost-optimized CU (millions of vectors per CU) |
---|---|---|---|
128 | 5 | 25 | Coming soon |
256 | 2.96 | 14.87 | Coming soon |
512 | 1.63 | 8.22 | Coming soon |
768 | 1.5 | 5 | 20 |
1024 | 0.86 | 4.34 | Coming soon |
Based on the test results in the table above, we find that:
- The cost-optimized CU has the largest capacity for 768-dimensional vectors: roughly 13 times that of the performance-optimized CU and 4 times that of the capacity-optimized CU.
- As the vector dimensions increase, more storage space is needed to hold the data. For instance, a CU can store roughly twice as many 512-dimensional vectors as 1024-dimensional vectors.
Note: This experiment only focused on the primary key and vectors without adding scalar fields. However, if there are additional scalar fields like id, label, keywords, summary, URL, etc., the actual capacity of each CU type may differ from the table above. Therefore, it's essential to rely on empirical measurement for accuracy.
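To turn the capacity table into a CU count for a given dataset, one option is a simple lookup followed by rounding up. The sketch below is an assumption-laden estimate, not an official sizing tool: `CAPACITY_MILLIONS` and `cus_for_capacity` are hypothetical names, only the tested dimensions are supported, and scalar fields are ignored, as in the experiment.

```python
import math

# Millions of vectors per CU, copied from the capacity table above.
# Cost-optimized figures are omitted because most are not yet published.
CAPACITY_MILLIONS = {
    "performance": {128: 5, 256: 2.96, 512: 1.63, 768: 1.5, 1024: 0.86},
    "capacity": {128: 25, 256: 14.87, 512: 8.22, 768: 5, 1024: 4.34},
}


def cus_for_capacity(cu_type, dim, num_vectors):
    """Estimate the minimum number of CUs needed to hold num_vectors."""
    per_cu = CAPACITY_MILLIONS[cu_type][dim] * 1_000_000
    return math.ceil(num_vectors / per_cu)


# Example: 20 million 1024-dimensional vectors on each CU type.
print(cus_for_capacity("performance", 1024, 20_000_000))
print(cus_for_capacity("capacity", 1024, 20_000_000))
```

For 20 million 1024-dimensional vectors this gives 24 performance-optimized CUs or 5 capacity-optimized CUs; real collections with scalar fields will need empirical measurement.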
Let’s look at some examples!
We have compared Zilliz Cloud's three CU options by looking through the lens of latency, throughput, capacity, and cost. But how do you choose the most suitable option for your business? Let's look at two examples to help you make the right choice.
Example 1
Suppose you’re building an LLM-augmented chatbot that uses Zilliz Cloud to store more than 10 million text chunks from private documents, each embedded as a 768-dimensional vector. Your application requires Zilliz Cloud to support 1,000 QPS and retrieve the top 10 results with an end-to-end latency of less than 30 milliseconds.
A performance-optimized CU is the only way to achieve a latency under 30 ms. Since each performance-optimized CU can hold up to 1.5 million 768-dimensional vectors, you'll need at least seven CUs to handle all 10 million vectors (10 / 1.5 ≈ 6.7, rounded up). One CU can reach a peak QPS of 520 when the top_k value is 10. To serve 1,000 QPS, you'll need two replicas (1,000 / 520 ≈ 1.9, rounded up).
Therefore, the best approach for this scenario is a performance-optimized cluster with two replicas of seven CUs each.
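The arithmetic in this example can be written as a small helper. This is a minimal sketch that follows the article's own simplification of treating the measured peak QPS as the throughput of one replica; `size_cluster` and its parameters are illustrative names, not a Zilliz Cloud API.

```python
import math


def size_cluster(num_vectors, target_qps, vectors_per_cu, qps_per_replica):
    """Return (CUs per replica, number of replicas), both rounded up.

    Assumes capacity scales linearly with CU count and throughput scales
    linearly with replica count, which is a simplification.
    """
    cus = math.ceil(num_vectors / vectors_per_cu)
    replicas = math.ceil(target_qps / qps_per_replica)
    return cus, replicas


# Example 1: 10M 768-dim vectors, 1,000 QPS at top_k = 10,
# performance-optimized CU (1.5M vectors per CU, peak 520 QPS).
cus, replicas = size_cluster(10_000_000, 1_000, 1_500_000, 520)
print(cus, replicas)  # 7 2
```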
Example 2
Suppose your application detects copyright violations in images and needs to find similar ones from a pool of 100 million. Each image is embedded into a 768-dimensional vector. You don’t require real-time responses but expect the top 100 results with a throughput of 50 QPS.
Both capacity-optimized and performance-optimized CUs can handle 50 requests per second when you retrieve the top 100 results. However, the capacity-optimized CU can store more than three times as many vectors as the performance-optimized CU. Therefore, the capacity-optimized CU is the more suitable option for your needs.
Based on the test results, a single capacity-optimized CU can store about 5 million 768-dimensional vectors. To accommodate your 100 million vectors, you will require a minimum of 20 CUs (100 / 5). When the top_k value is 100, a single CU can reach a peak QPS of 80. For 50 QPS, one replica is enough. Therefore, you'll need a cluster with 20 capacity-optimized CUs.
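To see why the capacity-optimized CU is the more economical fit here, the two CU types can be compared side by side using the per-CU capacities and the "starting from" prices in the overview table. Treat this as a back-of-the-envelope sketch: the names are hypothetical, and the monthly figure is only a lower bound derived from the published starting price per million vectors, not a quote.

```python
import math

VECTORS_MILLIONS = 100  # 100 million 768-dim image embeddings

# cu_type -> (millions of 768-dim vectors per CU,
#             starting price in USD per million vectors per month)
CU_TYPES = {
    "performance": (1.5, 65),
    "capacity": (5, 20),
}


def estimate(cu_type):
    """Return (minimum CU count, lower-bound monthly cost in USD)."""
    per_cu_millions, price_per_million = CU_TYPES[cu_type]
    cus = math.ceil(VECTORS_MILLIONS / per_cu_millions)
    monthly_floor = VECTORS_MILLIONS * price_per_million
    return cus, monthly_floor


for name in CU_TYPES:
    cus, cost = estimate(name)
    print(f"{name}: at least {cus} CUs, from about ${cost}/month")
```

Under these assumptions, the performance-optimized option needs at least 67 CUs (from about $6,500/month), against 20 CUs (from about $2,000/month) for the capacity-optimized option.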
Summary
Zilliz Cloud offers three types of CUs. If you need your application to be lightning-fast and responsive in real-time, the performance-optimized CU is the way to go. The capacity-optimized CU is the best choice for applications that require storing and retrieving tens of millions of vectors. If you're on a tight budget and okay with sacrificing speed and throughput, the cost-optimized CU is perfect for you.
Getting started with Zilliz Cloud
Explore with our free tier (no credit card required), or try our 30-day enterprise trial with up to $200 in credits. Subscribe through any cloud marketplace and receive an additional $100 credit.
Dive deeper into the Zilliz Cloud documentation.