Demystifying the Milvus Sizing Tool
Selecting the right configuration for your Milvus deployment is crucial for efficient performance and resource utilization. With the many options available, making that choice can feel overwhelming.
Here are 3 crucial points to consider when using the Milvus sizing tool.
Index Selection: Balancing Memory, Disk, Cost, Accuracy, and Speed
Milvus offers various index algorithms (HNSW, FLAT, IVF_FLAT, IVF_SQ8) with trade-offs in memory usage, disk space, cost, speed, and accuracy. HNSW is usually the recommended choice since it balances performance and memory. For more details about these indexes, see the introductory blog post listed under Reference Links.
HNSW:
Combines two concepts: skip lists and Navigable Small World (NSW) graphs. HNSW builds a hierarchy of NSW graphs. A search starts at the top layer and moves down layer by layer, finding the nearest neighbor in each layer. The top layer has the fewest nodes and the bottom layer has the most.
Very fast querying and excellent recall. Requires the most memory per vector, so will likely cost the most.
FLAT:
100% recall (exhaustive search).
Query speed is very slow (O(n) for data size n), and the index is the same size as the vector data.
IVF_FLAT:
Divides the vector space into nlist clusters and searches only the nprobe clusters closest to the query, improving search speed compared to FLAT.
Medium-high recall, and medium query speed (slower than HNSW but faster than FLAT).
Requires less memory than HNSW, and roughly the same as FLAT (the raw vectors plus the cluster centroids).
IVF_SQ8:
Utilizes scalar quantization to reduce disk, compute, and memory consumption by 70-75%.
Medium recall, medium-high query speed.
Offers a better option than IVF_FLAT when resources are limited, at the cost of lower accuracy.
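To make the memory trade-offs above concrete, here is a back-of-envelope sketch in Python. It is not the formula the sizing tool uses: the function name, the default nlist and M values, and the HNSW overhead of roughly 2 * M four-byte links per vector are assumptions for illustration, and real deployments carry additional overhead.

```python
def estimate_index_memory_gb(index_type, num_vectors, dim, nlist=1024, m=16):
    """Rough index size in GB (1 GB = 1024**3 bytes) for float32 vectors.

    Illustrative approximation only, not the Milvus sizing tool's formula.
    """
    float_bytes = 4  # float32
    raw = num_vectors * dim * float_bytes      # raw vector data
    centroids = nlist * dim * float_bytes      # IVF cluster centroids

    if index_type == "FLAT":
        size = raw                             # index is the vector data itself
    elif index_type == "IVF_FLAT":
        size = raw + centroids
    elif index_type == "IVF_SQ8":
        # Scalar quantization: one byte per dimension instead of four.
        size = num_vectors * dim + centroids
    elif index_type == "HNSW":
        # Assumption: about 2 * M neighbor links of 4 bytes each per vector.
        size = raw + num_vectors * 2 * m * 4
    else:
        raise ValueError(f"unsupported index type: {index_type}")
    return size / 1024**3


# Hypothetical workload: 1 million vectors of dimension 128.
for name in ("FLAT", "IVF_FLAT", "IVF_SQ8", "HNSW"):
    print(name, round(estimate_index_memory_gb(name, 1_000_000, 128), 3), "GB")
```

Under these assumptions IVF_SQ8 comes out at roughly a quarter of the FLAT size, which is in line with the 70-75% reduction quoted above, and HNSW is the largest of the four.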
Besides the most common floating-point indexes listed above, Milvus also supports SCANN (20% faster on CPU than HNSW), as well as binary, sparse, and disk-based indexes; see the Milvus index doc pages.
DISKANN is a hybrid disk/memory index, and is a good option if you can accept slightly longer latency (around 100 ms) but need to support a lot of vectors with high recall. AUTOINDEX defaults to HNSW in open source Milvus (or to higher-performing proprietary indexes in Zilliz Cloud).
GPU_CAGRA is the fastest of the GPU indexes, but it requires an inference card with GDDR memory rather than one with HBM. The other supported GPU indexes are GPU_BRUTE_FORCE, GPU_IVF_FLAT, and GPU_IVF_PQ.
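The guidance in this section can be condensed into a small decision helper. The sketch below is a simplification of the trade-offs described above, not logic taken from the sizing tool; the function name, its parameters, and the order in which the conditions are checked are assumptions.

```python
def suggest_index(need_exact_recall=False, has_gpu=False,
                  memory_constrained=False, latency_tolerant=False):
    """Suggest a Milvus index type from coarse requirements.

    A hypothetical helper that encodes this article's rules of thumb.
    """
    if need_exact_recall:
        return "FLAT"        # exhaustive search, 100% recall, slowest
    if has_gpu:
        return "GPU_CAGRA"   # fastest of the GPU indexes
    if memory_constrained and latency_tolerant:
        return "DISKANN"     # hybrid disk/memory, high recall, higher latency
    if memory_constrained:
        return "IVF_SQ8"     # quantized, lower memory, lower accuracy
    return "HNSW"            # usual default: fast queries, excellent recall


print(suggest_index())                         # default recommendation
print(suggest_index(memory_constrained=True))  # limited resources
```

The returned strings are the index type names used in this article, so they can be passed on as the index type when creating an index.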
Segment Size and Deployment Configuration
The sizing tool offers three segment sizes (512 MB, 1024 MB, 2048 MB). The default segment size is 512 MB. Fewer, larger segments typically mean faster search, so if you have a large amount of data, 2048 MB is usually recommended.
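As a quick illustration of why segment size matters, the sketch below counts how many segments a dataset would occupy at each of the three sizes. It assumes every segment is filled completely, which real deployments will not match exactly; the function name and the 100 GB dataset are hypothetical.

```python
import math


def segment_count(data_size_gb, segment_size_mb=512):
    """Approximate number of segments needed, assuming full segments."""
    return math.ceil(data_size_gb * 1024 / segment_size_mb)


# Hypothetical 100 GB dataset at each segment size offered by the tool.
for size_mb in (512, 1024, 2048):
    print(size_mb, "MB segments:", segment_count(100, size_mb))
```

Moving from 512 MB to 2048 MB segments cuts the segment count by a factor of four, which means fewer per-segment indexes to search.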
Think of segments as chunks of data; they're the smallest units in Milvus used for load balancing and enabling distributed search on indexes. Our quick rule of thumb:
For query nodes of 4GB-8GB, use 512MB segments.
For query nodes between 8GB and 16GB, use 1GB segments.
For query nodes of 16GB or more, opt for 2-4GB segment sizes.
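The rule of thumb above can be written as a small lookup. This is a sketch with a hypothetical function name, and it makes two assumptions the rule does not spell out: nodes smaller than 4GB get the 512MB default, and a node of exactly 16GB is treated as large. For large nodes it returns 2048 MB, the largest size the sizing tool offers.

```python
def recommend_segment_size_mb(query_node_memory_gb):
    """Map query node memory (GB) to a segment size (MB) per the rule of thumb."""
    if query_node_memory_gb <= 8:
        return 512    # small nodes (assumption: also covers nodes under 4GB)
    if query_node_memory_gb < 16:
        return 1024   # mid-sized nodes
    return 2048       # large nodes (assumption: 16GB counts as large)


for memory_gb in (4, 12, 32):
    print(memory_gb, "GB query node ->", recommend_segment_size_mb(memory_gb), "MB segments")
```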
Between Pulsar and Kafka, Pulsar is the recommended choice for new projects (greenfield installations) since it has less overhead per topic.
Additional cost and speed configurations are available in the Enterprise version of Zilliz Cloud (for more information, see our cloud sizing tool):
Out of Memory (OOM) reduction and compaction optimization for peak performance.
Lazy Load Storage Savings:
Store hot data efficiently with standard compute units (CUs).
Tiered storage CUs for cost-effective storage of rarely accessed (cold) data.
Conclusion
Remember, the sizing tool's output is just a starting point! Milvus offers extensive customization options.
The Milvus sizing tool focuses on a single index. If you need different index algorithms for various collections, create separate collections with custom configurations. This might require a more complex deployment setup.
Reference Links
Resource planning: https://docs.zilliz.com/docs/resource-planning
Zilliz cloud pricing calculator: https://zilliz.com/pricing#estimate_your_cost
Intro Milvus indexes: https://thesequence.substack.com/p/guest-post-choosing-the-right-vector
Docs Milvus Indexes: https://milvus.io/docs/index.md
Milvus GPU CAGRA index: https://zilliz.com/blog/Milvus-introduces-GPU-index-CAGRA