The memory overhead introduced by HNSW or IVF indexes depends on the algorithm and configuration. HNSW typically requires 1.5–2x the memory of the raw vector data, while IVF introduces minimal overhead unless combined with compression techniques like product quantization (PQ). Both can be tuned via parameters to balance memory usage with search performance and accuracy.
HNSW Memory Overhead
HNSW stores a hierarchical graph where each vector (node) maintains connections to neighboring nodes, and the overhead comes from these graph links. For a 768-dimensional float32 vector (3,072 bytes), if each node keeps 40 connections (a typical M parameter) and each connection is a 4-byte integer, the links add 160 bytes per vector, or ~3,232 bytes including the vector itself. For 1 million vectors, the raw data alone is ~3.1 GB. In practice the index is larger than this single-layer estimate suggests: implementations typically store 2×M links at the bottom layer plus additional links on the upper layers and other bookkeeping, so the full index lands closer to ~4.8 GB, in line with the 1.5–2x rule of thumb. The exact overhead scales with M (the connection count) and the number of graph layers; reducing M or using fewer layers lowers memory but may hurt search accuracy.
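As a rough illustration, the following sketch (assuming the faiss-cpu package and synthetic data; 100k vectors keeps the build fast) constructs an HNSW index and compares the serialized index size against the raw vectors:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d, n, M = 768, 100_000, 32          # dimension, vector count, links per node
xb = np.random.rand(n, d).astype("float32")

index = faiss.IndexHNSWFlat(d, M)   # stores full float32 vectors + the HNSW graph
index.hnsw.efConstruction = 200     # build-time search width (higher = better graph, slower build)
index.add(xb)

# Compare the serialized index size with the raw vector data to see the graph overhead.
raw_mb = xb.nbytes / 1e6
index_mb = len(faiss.serialize_index(index)) / 1e6
print(f"raw: {raw_mb:.0f} MB, index: {index_mb:.0f} MB, ratio: {index_mb / raw_mb:.2f}x")
```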
IVF Memory Overhead
IVF groups vectors into clusters around trained centroids. For 1,024 clusters and 1 million 768-dimensional vectors, storing the centroids adds ~3 MB (1,024 centroids × 3,072 bytes each), and recording each vector's cluster assignment adds ~4 MB (1 million 4-byte cluster IDs). Without compression, IVF therefore adds negligible overhead (~7 MB on top of ~3.1 GB of raw vectors). However, IVF is often paired with product quantization: an 8-byte PQ code replaces the 3,072-byte float32 vector, shrinking per-vector storage from 3,072 bytes to 8 bytes. This makes IVF+PQ highly memory-efficient (~8 MB of codes for 1 million vectors, plus centroids and codebooks) but introduces approximation error.
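A minimal IVF+PQ sketch under the same assumptions (faiss-cpu, synthetic data, and a reduced vector count so the example runs quickly):

```python
import numpy as np
import faiss  # pip install faiss-cpu

d, n, nlist = 768, 200_000, 1024
xb = np.random.rand(n, d).astype("float32")

quantizer = faiss.IndexFlatL2(d)                     # holds the nlist centroids
index = faiss.IndexIVFPQ(quantizer, d, nlist, 8, 8)  # 8 sub-quantizers x 8 bits = 8-byte codes
index.train(xb[:50_000])                             # learns centroids and PQ codebooks
index.add(xb)

index.nprobe = 16                                    # clusters scanned per query
D, I = index.search(xb[:5], 10)                      # distances and ids of top-10 neighbors
```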
Managing Overhead
- Parameter Tuning: For HNSW, reduce M (connections per node) or use fewer layers. For IVF, adjust nlist (number of clusters) and the PQ code size.
- Precision Reduction: Store vectors as float16 (halving memory) or use binary quantization for data that does not require floating-point precision.
- Compression: Apply PQ with IVF or use scalar quantization (e.g., FAISS's SQ4) to compress vectors; see the sketch after this list.
- Trade-offs: Accept lower recall for a smaller memory footprint. For example, HNSW with M=16 uses less memory than M=32 but may miss some neighbors.
- Hybrid Approaches: Combine on-disk storage for less frequently accessed data with in-memory caching for active subsets.
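As referenced in the Compression bullet, here is a brief scalar-quantization sketch; assuming FAISS, QT_4bit corresponds to the SQ4 setting mentioned above and QT_fp16 covers the float16 option:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 768
xb = np.random.rand(10_000, d).astype("float32")

# 4-bit scalar quantization (SQ4): ~8x smaller than float32 per vector.
sq4 = faiss.IndexScalarQuantizer(d, faiss.ScalarQuantizer.QT_4bit)
sq4.train(xb)   # learns per-dimension value ranges
sq4.add(xb)

# float16 storage: exactly half the float32 footprint, near-lossless.
fp16 = faiss.IndexScalarQuantizer(d, faiss.ScalarQuantizer.QT_fp16)
fp16.train(xb)
fp16.add(xb)
```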
For instance, in FAISS, using IVF4096 with PQ8 reduces memory usage by ~99% compared to raw float32 vectors, while HNSW with M=16 might use ~1.3x the raw data size. Developers should benchmark configurations on their own data to align with memory constraints and accuracy requirements; a simple recall check like the one below makes that trade-off measurable.
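One way to run such a benchmark is to compare an approximate index against an exact flat index on the same queries. The helper below is an illustrative sketch (recall_at_k is a hypothetical name, not a FAISS API):

```python
import faiss  # pip install faiss-cpu

def recall_at_k(approx_index, xb, xq, k=10):
    """Fraction of exact top-k neighbors that the approximate index recovers."""
    exact = faiss.IndexFlatL2(xb.shape[1])  # brute-force ground truth
    exact.add(xb)
    _, gt = exact.search(xq, k)
    _, ap = approx_index.search(xq, k)
    hits = sum(len(set(g) & set(a)) for g, a in zip(gt, ap))
    return hits / (len(xq) * k)
```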