Vector quantization, such as Product Quantization (PQ), reduces storage requirements by compressing high-dimensional vectors into compact codes. Instead of storing full-precision vectors (e.g., 32-bit floats per dimension), PQ divides each vector into subvectors and replaces each subvector with an index pointing to the closest centroid in a codebook learned for that subspace. For example, a 128-dimensional float32 vector (512 bytes) split into 8 subvectors (16 dimensions each) can be represented by 8 centroid indices. If each codebook has 256 centroids (8 bits per index), the entire vector shrinks to 8 bytes, a 64x compression. The codebooks themselves are small relative to the dataset, making this approach scalable to billion-scale datasets. The compression works by exploiting redundancy in the data distribution: similar vectors share common centroids.
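To make the encoding step concrete, here is a minimal sketch of PQ training and encoding with NumPy and scikit-learn. The synthetic training data, the `encode` helper, and all shapes are illustrative assumptions, not any particular library's API.

```python
import numpy as np
from sklearn.cluster import KMeans

d, m, k = 128, 8, 256           # dimensions, subvectors, centroids per codebook
d_sub = d // m                  # 16 dimensions per subvector

# Synthetic stand-in for representative training data (assumption).
rng = np.random.default_rng(0)
train = rng.standard_normal((10_000, d)).astype(np.float32)

# Train one codebook per subspace: m codebooks, each of shape (k, d_sub).
codebooks = [
    KMeans(n_clusters=k, n_init=1, random_state=0)
    .fit(train[:, j * d_sub:(j + 1) * d_sub])
    .cluster_centers_
    for j in range(m)
]

def encode(x: np.ndarray) -> np.ndarray:
    """Map a (d,) vector to m uint8 centroid indices (8 bytes total)."""
    code = np.empty(m, dtype=np.uint8)
    for j, cb in enumerate(codebooks):
        sub = x[j * d_sub:(j + 1) * d_sub]
        code[j] = np.argmin(np.sum((cb - sub) ** 2, axis=1))
    return code

code = encode(train[0])
print(code.nbytes)  # 8 bytes, versus 512 for the float32 original
```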
The impact on search accuracy stems from the approximation error introduced by quantization. When subvectors are replaced with centroids, fine-grained detail in the original vectors is lost, so distances computed against quantized vectors deviate from the true distances in the original space, potentially reordering nearest-neighbor results. For instance, a true nearest neighbor can end up with an approximate distance larger than that of a worse candidate, reducing recall. PQ mitigates this by preserving local structure within each subspace and by using asymmetric distance computation (ADC), in which the query is left unquantized and distances to codes are assembled from precomputed query-to-centroid lookup tables. The error is often manageable in practice, especially when the codebooks are trained on representative data and the number of centroids per subvector is sufficiently large (e.g., 256).
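Continuing the sketch above (reusing `codebooks`, `d_sub`, `m`, and `encode`), the following shows how ADC scores candidates with table lookups instead of full 128-dimensional distance computations. The `adc_distances` helper and the 1,000-vector database are illustrative assumptions.

```python
def adc_distances(query: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Approximate squared L2 distances from an unquantized query to PQ codes."""
    # tables[j, c] = squared distance from query subvector j to centroid c
    tables = np.stack([
        np.sum((cb - query[j * d_sub:(j + 1) * d_sub]) ** 2, axis=1)
        for j, cb in enumerate(codebooks)
    ])
    # Each code is scored with m table lookups summed across subspaces.
    return tables[np.arange(m), codes].sum(axis=1)

codes = np.stack([encode(v) for v in train[:1000]])   # (1000, m) uint8 codes
dists = adc_distances(train[0], codes)
print(np.argsort(dists)[:5])  # approximate top-5 neighbors
```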
The trade-off between storage and accuracy is controlled by parameters such as the number of subvectors and the number of centroids per codebook. Increasing the number of subvectors (e.g., 16 instead of 8) reduces quantization error but doubles the code size (16 bytes instead of 8 at 8 bits per index). Similarly, using 512 centroids per subvector (9 bits per index instead of 8) improves accuracy at the cost of larger codes. Techniques like IVF-PQ balance this further by combining quantization with an inverted index that narrows the search to a subset of clusters, as in the sketch below. While quantized indexes rarely match the accuracy of raw vectors, they enable practical trade-offs, such as a 1-5% drop in recall for a 50x storage reduction, making them indispensable for large-scale applications like recommendation systems and image retrieval.
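As one concrete realization of IVF-PQ, the sketch below uses the Faiss library. The index classes shown are real Faiss APIs, but the synthetic dataset and the parameter values (`nlist`, `nprobe`, and so on) are illustrative assumptions rather than recommended settings.

```python
import numpy as np
import faiss

d, nlist, m, nbits = 128, 1024, 8, 8       # dims, clusters, subvectors, bits/index
xb = np.random.default_rng(0).standard_normal((100_000, d)).astype("float32")

quantizer = faiss.IndexFlatL2(d)            # coarse quantizer for the inverted index
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)                             # learns coarse centroids and PQ codebooks
index.add(xb)                               # stores 8-byte codes, not raw vectors

index.nprobe = 16                           # visit 16 of the 1024 clusters per query
D, I = index.search(xb[:5], 10)             # approximate top-10 neighbors
print(I[0])
```

Raising `nprobe` trades query speed for recall without rebuilding the index, which is why IVF-PQ deployments typically tune `nprobe`, `m`, and `nbits` together against a recall target.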
