In practical benchmark reports, recall and QPS are reported together because neither is informative on its own: together they expose the trade-off between search accuracy and system throughput. Recall is the fraction of the true nearest neighbors that a query actually returns (e.g., 90% recall means 9 of the 10 expected results are found), while QPS (queries per second) measures how many queries the system can process each second. Reporting them side by side shows whether a database achieves high speed at the cost of accuracy, or vice versa. For example, a benchmark might show that a system achieves 10,000 QPS at 70% recall, but only 500 QPS when tuned for 95% recall. This is the performance-accuracy trade-off inherent in approximate nearest neighbor (ANN) search algorithms.
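As a rough illustration of how the two numbers are obtained, the sketch below computes recall@k against a ground-truth list and times a batch of queries to get QPS. The function names (recall_at_k, measure_qps) and the search_fn callable are hypothetical stand-ins, not any particular database's API, and the timing is single-threaded, which real benchmarks often are not.

```python
import time

def recall_at_k(retrieved, ground_truth, k):
    """Average, over all queries, of the fraction of the true top-k
    neighbors that appear in the retrieved top-k."""
    total = 0.0
    for got, expected in zip(retrieved, ground_truth):
        total += len(set(got[:k]) & set(expected[:k])) / k
    return total / len(ground_truth)

def measure_qps(search_fn, queries):
    """Run every query once, sequentially, and return (results, queries per second).
    search_fn is a hypothetical callable: query -> list of result ids."""
    start = time.perf_counter()
    results = [search_fn(q) for q in queries]
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return results, len(queries) / elapsed

# One query whose true neighbors are ids 1..10; the search missed id 10.
truth = [list(range(1, 11))]
found = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 99]]
print(recall_at_k(found, truth, k=10))  # 0.9, i.e. 9 of 10 expected results
```

Both functions should be run over the same query set, so that each reported QPS figure is paired with the recall measured under exactly that configuration.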
Benchmarks often present these metrics across multiple configurations, such as varying index parameters (e.g., HNSW graph connectivity or the number of IVF lists) or hardware setups. For instance, a report might include a table or graph showing recall-QPS pairs for different combinations of parameters. This allows developers to see how adjusting a parameter like nprobe (the number of clusters searched in an IVF index) affects both metrics. A higher nprobe typically improves recall but reduces QPS, because each query scans more clusters. By plotting these values together, the benchmark shows the operational "sweet spot" for specific workloads, such as real-time applications (prioritizing QPS) versus batch analytics (prioritizing recall).
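One way to read such a table is to pick the fastest configuration that still meets a recall target. The sketch below assumes the sweep has already been measured; the Measurement type and the fastest_meeting_recall helper are hypothetical, and the numbers are illustrative only (the first and last rows echo the 10,000 QPS / 70% and 500 QPS / 95% example above, while the nprobe values and the middle row are invented for the illustration).

```python
from typing import NamedTuple, Optional, Sequence

class Measurement(NamedTuple):
    nprobe: int    # clusters searched per query
    recall: float  # fraction in [0, 1]
    qps: float     # queries per second

def fastest_meeting_recall(points: Sequence[Measurement],
                           min_recall: float) -> Optional[Measurement]:
    """Return the highest-QPS configuration whose recall is at least
    min_recall, or None if no configuration qualifies."""
    eligible = [p for p in points if p.recall >= min_recall]
    return max(eligible, key=lambda p: p.qps) if eligible else None

# Illustrative sweep, not real benchmark data.
sweep = [
    Measurement(nprobe=1, recall=0.70, qps=10_000.0),
    Measurement(nprobe=8, recall=0.88, qps=3_000.0),
    Measurement(nprobe=32, recall=0.95, qps=500.0),
]

# A real-time workload that tolerates 85% recall versus a batch job that needs 95%.
print(fastest_meeting_recall(sweep, 0.85))
print(fastest_meeting_recall(sweep, 0.95))
```

A latency-sensitive service would typically set a lower recall floor and accept the faster configuration, while a batch job would raise the floor and accept the lower QPS.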
To ensure relevance, benchmarks also contextualize these metrics with dataset characteristics (e.g., dimensionality, dataset size) and hardware specs. For example, a benchmark might note that a database achieves 85% recall at 2,000 QPS on a 1M-vector dataset using a GPU-accelerated index, but only 1,200 QPS at the same recall on CPU-only hardware. This helps developers map results to their own infrastructure and data requirements. Additionally, some reports include latency percentiles (e.g., P99 latency) alongside QPS, since average throughput can hide slow outlier queries under load. By combining recall, QPS, and operational context, benchmarks provide actionable insights for tuning and deployment decisions.
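Latency percentiles can be derived from the per-query timings collected during the same run. The sketch below uses the nearest-rank definition, which is only one of several percentile conventions a report might use; the percentile helper and the sample latencies are hypothetical.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical per-query latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
print(percentile(latencies_ms, 50))  # 50
print(percentile(latencies_ms, 99))  # 99
```

Reporting P99 next to QPS makes it visible when a configuration sustains high throughput on average but still leaves a small share of queries far slower than the rest.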