To tune IVF index parameters (nlist and nprobe) for a target recall at the fastest query speed, start by understanding their roles. nlist determines the number of clusters during indexing, while nprobe controls how many clusters are searched during queries. Higher nlist spreads data into more clusters, reducing the vectors per cluster, but requires careful tuning of nprobe to avoid missing relevant clusters. Lower nprobe speeds up queries but risks lower recall, while higher nprobe improves recall at the cost of slower queries. The goal is to balance these parameters to minimize query time while hitting the desired recall.
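To make the two roles concrete, here is a toy inverted-file search in pure Python. It is a sketch, not how FAISS is implemented: the centroids are a random sample of the data rather than the result of k-means training, and the 2-D points, nlist=40 and k=10 are arbitrary illustrative choices.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_lists(vectors, centroids):
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sqdist(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe, k):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sqdist(query, centroids[c]))
    candidates = [vid for c in order[:nprobe] for vid in lists[c]]
    top = sorted(candidates, key=lambda vid: sqdist(query, vectors[vid]))[:k]
    return top, len(candidates)

rng = random.Random(0)
vectors = [(rng.random(), rng.random()) for _ in range(2000)]
nlist = 40
centroids = rng.sample(vectors, nlist)  # assumption: stand-in for k-means training
lists = build_lists(vectors, centroids)

query = (0.5, 0.5)
exact = sorted(range(len(vectors)), key=lambda vid: sqdist(query, vectors[vid]))[:10]

for nprobe in (1, 4, nlist):
    ids, scanned = ivf_search(query, vectors, centroids, lists, nprobe, 10)
    recall = len(set(ids) & set(exact)) / len(exact)
    print(f"nprobe={nprobe:>2} scanned={scanned:>4} recall={recall:.2f}")
```

With nprobe equal to nlist every list is scanned, so the result matches brute force. Smaller nprobe values scan fewer vectors and may miss true neighbors that sit in unprobed clusters.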
First, establish a baseline based on dataset size. A common heuristic sets nlist = 4 * sqrt(N), where N is the number of vectors. For example, with 1M vectors, start with nlist=4000. Next, sweep nprobe values for this nlist. Measure recall and query time on a validation set of queries (e.g., sampled from 1% of the data) whose true neighbors come from an exact brute-force search. For instance, if nprobe=10 achieves 85% recall but your target is 90%, increase nprobe in steps (e.g., 15, 20) until the target is met. If query time becomes unacceptable, raise nlist (e.g., to 6000) to shrink the clusters: each probe then scans fewer vectors, so the same recall may be reachable at lower latency, although the nprobe needed to reach it can change and has to be re-measured. Repeat this process to find the best trade-off.
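The baseline and sweep described above could be sketched as follows. The helper names (baseline_nlist, recall_at_k, sweep_nprobe) are hypothetical. The sweep assumes the faiss and numpy packages are installed, that xb and xq are float32 arrays of shape (n, d), and that true_ids holds the exact neighbor ids per query (for example from a faiss.IndexFlatL2 search). It uses a plain IndexIVFFlat with L2 distance; other index types would need a different constructor.

```python
import math
import time

def baseline_nlist(num_vectors, factor=4):
    """Heuristic starting point: nlist = factor * sqrt(N)."""
    return int(round(factor * math.sqrt(num_vectors)))

def recall_at_k(found_ids, true_ids):
    """Fraction of true neighbors that appear in the returned ids, over all queries."""
    hits = sum(len(set(f) & set(t)) for f, t in zip(found_ids, true_ids))
    total = sum(len(t) for t in true_ids)
    return hits / total

def sweep_nprobe(xb, xq, true_ids, nlist, nprobes, k=10):
    """Train one IVF index, then measure recall and latency for each nprobe."""
    import faiss  # assumption: faiss is installed; imported lazily on purpose

    dim = xb.shape[1]
    quantizer = faiss.IndexFlatL2(dim)            # coarse quantizer holding the centroids
    index = faiss.IndexIVFFlat(quantizer, dim, nlist)
    index.train(xb)                               # k-means over the database vectors
    index.add(xb)

    results = []
    for nprobe in nprobes:
        index.nprobe = nprobe                     # search-time setting, no rebuild needed
        start = time.perf_counter()
        _, ids = index.search(xq, k)
        ms_per_query = (time.perf_counter() - start) * 1000 / len(xq)
        results.append((nprobe, recall_at_k(ids.tolist(), true_ids), ms_per_query))
    return results

print(baseline_nlist(1_000_000))
```

A call such as sweep_nprobe(xb, xq, true_ids, baseline_nlist(len(xb)), [5, 10, 15, 20]) would return one (nprobe, recall, ms per query) tuple per setting. Timing a whole batch and dividing by the query count is a simplification; single-query latency can differ.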
Second, analyze the recall-time curve for different nlist values. For example, with nlist=4000, increasing nprobe from 10 to 20 might improve recall from 85% to 92% but add 5ms per query. If nlist=6000 allows nprobe=10 to achieve 90% recall at 3ms, the latter configuration is better for a 90% target. Use grid search or Bayesian optimization to explore the parameter space efficiently. In FAISS, ParameterSpace can set and sweep search-time parameters such as nprobe, and the index_io functions (write_index, read_index) let you save and reload each trained index between runs. Look for the "knee" of the curve, the point past which increasing nprobe yields diminishing recall gains, and pick the lowest-latency configuration that still meets the target recall.
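Once the sweep results are collected, choosing a configuration reduces to a filter and a minimum. The sketch below uses a hypothetical helper, fastest_meeting_target, and made-up measurements loosely based on the figures quoted above; the absolute latencies for nlist=4000 are illustrative assumptions, not benchmark results.

```python
def fastest_meeting_target(measurements, target_recall):
    """Return the lowest-latency measurement whose recall meets the target, or None."""
    eligible = [m for m in measurements if m["recall"] >= target_recall]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m["ms_per_query"])

# Illustrative numbers only (not measured on any real index).
measurements = [
    {"nlist": 4000, "nprobe": 10, "recall": 0.85, "ms_per_query": 2.0},
    {"nlist": 4000, "nprobe": 20, "recall": 0.92, "ms_per_query": 7.0},
    {"nlist": 6000, "nprobe": 10, "recall": 0.90, "ms_per_query": 3.0},
]

best = fastest_meeting_target(measurements, target_recall=0.90)
print(best)
```

Returning None when no configuration reaches the target makes it explicit that the sweep has to be widened (higher nprobe, or a different nlist) rather than silently settling for lower recall.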
Finally, validate and monitor the chosen parameters. Test the configuration on a held-out dataset to ensure generalization. For example, if nlist=5000 and nprobe=12 achieve 91% recall at 4ms, verify this on a separate 10k-query test set. Consider resource constraints: a higher nlist adds centroid storage and lengthens training, so if the index no longer fits your memory or build-time budget, reduce nlist and accept a slightly higher nprobe. Additionally, if the data distribution changes over time (e.g., new vectors are added), periodically retune parameters to maintain performance. Changing nprobe needs no rebuild because it is a search-time setting; changing nlist means retraining and repopulating the index, and FAISS's index_factory makes it straightforward to construct the new index from a description string.
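For the memory check, a back-of-the-envelope estimate is often enough. The sketch below assumes an uncompressed IVF index that stores float32 vectors, 64-bit ids and float32 centroids, and it ignores allocator and bookkeeping overhead. The function name and the 128-dimensional example are hypothetical; compressed variants such as product quantization would need a different formula.

```python
def estimate_ivfflat_bytes(num_vectors, dim, nlist):
    """Rough memory footprint of an uncompressed IVF index, in bytes."""
    vectors = num_vectors * dim * 4   # float32 components of every stored vector
    ids = num_vectors * 8             # one 64-bit id per stored vector
    centroids = nlist * dim * 4       # float32 centroids in the coarse quantizer
    return {
        "vectors": vectors,
        "ids": ids,
        "centroids": centroids,
        "total": vectors + ids + centroids,
    }

for nlist in (4000, 6000):
    estimate = estimate_ivfflat_bytes(num_vectors=1_000_000, dim=128, nlist=nlist)
    print(nlist, estimate["centroids"], estimate["total"])
```

Under these assumptions the stored vectors dominate the footprint and the centroid table grows only linearly with nlist, so in practice training time is often the tighter constraint when raising nlist.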