What is recall in vector search results? Recall in vector search measures how well an Approximate Nearest Neighbor (ANN) algorithm retrieves the true top results compared to an exact (ground-truth) search. When searching high-dimensional data, ANN algorithms trade accuracy for speed by approximating results, so they can miss some true matches. Recall quantifies the fraction of ground-truth neighbors that the ANN method returns. For example, if an exact search identifies 100 neighbors for a query and the ANN retrieves 80 of them, the recall is 80%.
How is recall calculated? Recall is computed by comparing the ANN’s output against a ground-truth set generated via brute-force exact search. For a given query, let the ground-truth set contain the top k nearest neighbors. If the ANN also returns k results, recall is the number of items that appear in both the ANN’s results and the ground-truth set, divided by k: recall = (number of shared items) / k. For instance, if the ground truth has 10 neighbors and the ANN retrieves 7 of them, recall is 0.7. The calculation is repeated for every query in the evaluation set, and the average is reported.
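The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `recall_at_k` and `mean_recall` are hypothetical, results are assumed to be lists of integer item IDs ordered from nearest to farthest, and each ground-truth list is assumed to hold at least k items.

```python
from typing import Sequence


def recall_at_k(ann_ids: Sequence[int], truth_ids: Sequence[int], k: int) -> float:
    """Fraction of the top-k ground-truth neighbors found in the ANN's top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(truth_ids[:k])   # exact top-k neighbors
    found = set(ann_ids[:k])     # approximate top-k neighbors
    return len(truth & found) / k


def mean_recall(
    ann_results: Sequence[Sequence[int]],
    truth_results: Sequence[Sequence[int]],
    k: int,
) -> float:
    """Average recall@k over all queries (one result list per query)."""
    if not ann_results or len(ann_results) != len(truth_results):
        raise ValueError("need one ANN result list per ground-truth list")
    scores = [recall_at_k(a, t, k) for a, t in zip(ann_results, truth_results)]
    return sum(scores) / len(scores)


# The example from the text: 10 true neighbors, the ANN retrieves 7 of them.
truth = list(range(10))
ann = [0, 1, 2, 3, 4, 5, 6, 100, 101, 102]
print(recall_at_k(ann, truth, k=10))
```

Sets are used here because recall ignores ordering within the top k: a true neighbor counts as found whether the ANN ranks it first or tenth.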
Practical considerations and examples Ground-truth sets are typically precomputed using exact methods, which are slow but accurate. ANN evaluation often uses a fixed k (e.g., top 100) for consistency. For example, in image retrieval, if an ANN finds 90 out of 100 true matches for a query image, its recall is 90%. If the ANN returns more or fewer results than k, the denominator stays the size of the ground-truth set, and only the count of true neighbors found changes. If the ANN returns 200 results containing 95 of the ground-truth 100, recall is 95/100 = 0.95; if it returns only 50 results, recall cannot exceed 0.5. High recall indicates the ANN closely approximates exact results, but it often comes at the cost of higher computational overhead.
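As a rough sketch of these practical points, the snippet below builds a ground-truth set by brute force and scores a result list whose length differs from k. It assumes Euclidean distance and small in-memory data; the helper names `exact_top_k` and `recall_against_truth` are hypothetical, and a real evaluation would typically use a vectorized or library-backed exact search instead.

```python
import math
from typing import List, Sequence


def exact_top_k(query: Sequence[float], vectors: Sequence[Sequence[float]], k: int) -> List[int]:
    """Brute-force search: rank every vector by Euclidean distance to the query."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:k]


def recall_against_truth(returned_ids: Sequence[int], truth_ids: Sequence[int]) -> float:
    """Recall with the ground-truth size as denominator, whatever len(returned_ids) is."""
    truth = set(truth_ids)
    if not truth:
        raise ValueError("ground-truth set must not be empty")
    return len(truth & set(returned_ids)) / len(truth)


# Tiny illustrative dataset (made up for this sketch).
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (5.0, 5.0)]
truth = exact_top_k((0.0, 0.0), vectors, k=3)

# The ANN returns four results, two of which are true neighbors.
print(recall_against_truth([0, 2, 4, 3], truth))

# The case from the text: 200 results containing 95 of the 100 true neighbors.
truth_100 = list(range(100))
returned_200 = list(range(95)) + list(range(1000, 1105))
print(recall_against_truth(returned_200, truth_100))
```

Keeping the denominator fixed at the ground-truth size means a shorter result list can only lower recall, while a longer list can raise it, so comparisons between systems are only fair when the number of returned results is reported alongside the score.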