An IVF-PQ index combines an Inverted File (IVF) structure with Product Quantization (PQ) to reduce memory use and speed up search, at the cost of some accuracy relative to a plain IVF index. The key difference lies in how vectors are stored and compared: a plain IVF index partitions the dataset into clusters and stores the raw vectors (or their residuals relative to the cluster centroid) in per-cluster inverted lists, while IVF-PQ compresses each vector into short PQ codes and computes only approximate distances at search time. This affects both the storage footprint and the accuracy of results.
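The IVF half of both index types can be sketched in a few lines. The snippet below is a toy illustration, not a production index: it skips k-means training (centroids are just sampled points), and the names `ivf_search` and `nprobe` follow common convention but are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, nlist = 16, 1000, 8          # dimensions, dataset size, number of clusters

data = rng.standard_normal((n, d)).astype(np.float32)

# Toy "training": sample nlist points as centroids (a real IVF index runs k-means).
centroids = data[rng.choice(n, nlist, replace=False)]

# Build the inverted file: each vector goes into the list of its nearest centroid.
assignments = np.argmin(np.linalg.norm(data[:, None] - centroids[None], axis=2), axis=1)
inverted_lists = {c: np.where(assignments == c)[0] for c in range(nlist)}

def ivf_search(query, nprobe=2):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    cand = np.concatenate([inverted_lists[c] for c in probe])
    dists = np.linalg.norm(data[cand] - query, axis=1)
    return cand[np.argmin(dists)]
```

Because only `nprobe` of the `nlist` lists are scanned, search cost drops roughly in proportion; the plain-IVF version above still compares raw vectors inside each list, which is exactly what IVF-PQ replaces with compressed codes.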
In terms of storage, IVF-PQ significantly reduces memory usage. A plain IVF index stores full-precision vectors (e.g., 32-bit floats) or residuals (vectors adjusted by their cluster centroid), so a 128-dimensional vector occupies 512 bytes (128 × 4 bytes). With PQ, each vector is split into subvectors, and each subvector is quantized to an entry in a small codebook. If each subvector is encoded as an 8-bit code (1 byte), a 128D vector divided into 8 subvectors (16D each) takes just 8 bytes, a 64x reduction. This compression allows IVF-PQ to handle billion-scale datasets in memory, whereas a plain IVF index would require terabytes of storage for the same data.
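The 512-byte-to-8-byte arithmetic above can be checked with a minimal PQ encoder. This is a sketch under one stated assumption: the codebooks here are random, whereas real PQ trains each subspace's codebook with k-means; the helper name `pq_encode` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, ksub = 128, 8, 256           # vector dim, subvectors, codebook size (8-bit codes)
dsub = d // m                      # 16 dimensions per subvector

# Illustrative codebooks: one (ksub x dsub) table per subvector.
# Real PQ learns these with k-means on a training set.
codebooks = rng.standard_normal((m, ksub, dsub)).astype(np.float32)

def pq_encode(vec):
    """Replace each 16D subvector with the index of its nearest codebook entry."""
    subs = vec.reshape(m, dsub)
    return np.array(
        [np.argmin(np.linalg.norm(codebooks[j] - subs[j], axis=1)) for j in range(m)],
        dtype=np.uint8,
    )

vec = rng.standard_normal(d).astype(np.float32)
code = pq_encode(vec)
print(vec.nbytes, "->", code.nbytes)   # 512 -> 8, the 64x reduction
```

Each 8-bit code indexes one of 256 codewords per subspace, so the stored representation is `m` bytes regardless of the original float precision.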
However, this compression comes at the cost of accuracy. PQ introduces quantization error because distances are computed against the compact codes (or their reconstructions) rather than the raw vectors. A nearest-neighbor search in IVF-PQ can therefore miss exact matches when the codebook centroids represent the data distribution poorly. In contrast, plain IVF computes distances from the raw vectors (or residuals), preserving accuracy at the expense of storage. IVF-PQ can mitigate the error by using larger codebooks or more subvectors, but both increase the code size, which highlights the core trade-off: stronger compression (smaller storage) lowers accuracy, while larger codes improve accuracy but shrink the storage savings.
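The quantization error can be made concrete by comparing an exact distance with the approximate one computed from a reconstruction. Again a sketch with illustrative, untrained (random) codebooks; the name `adc_distance` refers to asymmetric distance computation, where the raw query is compared against the stored vector's decoded approximation.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, ksub = 32, 4, 16
dsub = d // m

# Illustrative random codebooks (real PQ trains them per subspace with k-means),
# deliberately coarse so the quantization error is visible.
codebooks = rng.standard_normal((m, ksub, dsub)).astype(np.float32)

def encode(vec):
    subs = vec.reshape(m, dsub)
    return np.array([np.argmin(np.linalg.norm(codebooks[j] - subs[j], axis=1))
                     for j in range(m)])

def adc_distance(query, code):
    """Asymmetric distance: raw query vs. the stored vector's reconstruction."""
    recon = np.concatenate([codebooks[j][code[j]] for j in range(m)])
    return np.linalg.norm(query - recon)

vec = rng.standard_normal(d).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

exact = np.linalg.norm(query - vec)            # what plain IVF computes
approx = adc_distance(query, encode(vec))      # what IVF-PQ computes
# The gap between exact and approx is the quantization error PQ introduces.
```

With trained codebooks the gap shrinks; with larger `m` or `ksub` it shrinks further, at the cost of longer codes, which is the accuracy-versus-storage trade-off described above.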
In practice, IVF-PQ is preferred for large-scale applications where memory is the binding constraint, such as recommendation systems with millions of items. Plain IVF is the better choice when accuracy is the priority, as in medical imaging, or when the hardware can hold the uncompressed dataset. Developers must balance these factors against their specific precision requirements and resource limits.
