Mean Reciprocal Rank (MRR) is a metric used to evaluate the effectiveness of retrieval systems by measuring how early they rank the first relevant document for a set of queries. The reciprocal rank for a single query is defined as 1 divided by the position of the first relevant document in the ranked results; if no relevant document is retrieved, the reciprocal rank is 0. For example, if the first relevant document appears in position 3 for a query, the reciprocal rank is 1/3. MRR is the average of these reciprocal ranks across all queries, so it rewards systems that place a relevant document higher in the results.
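As a concrete sketch, the computation can be expressed in a few lines of Python. The function names and inputs below (reciprocal_rank, mean_reciprocal_rank, per-query lists of ranked document IDs and sets of relevant IDs) are illustrative choices, not part of any particular library:

```python
def reciprocal_rank(ranked_doc_ids, relevant_doc_ids):
    """Return 1 / position of the first relevant document, or 0.0 if none is retrieved."""
    for position, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_doc_ids:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results_per_query, relevant_per_query):
    """Average the per-query reciprocal ranks across all queries."""
    rr_values = [
        reciprocal_rank(ranked, relevant)
        for ranked, relevant in zip(results_per_query, relevant_per_query)
    ]
    return sum(rr_values) / len(rr_values) if rr_values else 0.0
```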
In the context of a Retrieval-Augmented Generation (RAG) system, MRR helps assess the retriever’s ability to surface at least one relevant document early in the results, which is critical for downstream tasks like answer generation. For instance, if a user asks, "What causes climate change?" and the retriever returns a relevant document in position 2, the reciprocal rank is 0.5. If another query returns a relevant document in position 1, the reciprocal rank is 1. The MRR for these two queries would be (0.5 + 1)/2 = 0.75. This score reflects the retriever’s average performance in prioritizing relevant documents. MRR is particularly useful because RAG systems often rely on the top-ranked documents to generate accurate answers—poor retrieval at this stage can lead to incorrect or incomplete outputs.
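Reusing the helpers sketched above, the two-query example works out as follows; the document IDs are hypothetical placeholders:

```python
results_per_query = [
    # Query 1: "What causes climate change?" -- relevant doc appears in position 2 -> RR = 0.5
    ["doc_17", "doc_climate_overview", "doc_42"],
    # Query 2: relevant doc appears in position 1 -> RR = 1.0
    ["doc_energy_mix", "doc_99"],
]
relevant_per_query = [
    {"doc_climate_overview"},
    {"doc_energy_mix"},
]

print(mean_reciprocal_rank(results_per_query, relevant_per_query))  # (0.5 + 1.0) / 2 = 0.75
```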
However, MRR has limitations. It focuses only on the first relevant document, ignoring additional relevant results. For example, if a retriever returns three relevant documents with the first at position 3, MRR treats this the same as a scenario where only one relevant document exists at position 3. This makes MRR less suitable for tasks requiring multiple relevant documents but ideal for scenarios where a single high-quality result is sufficient. To apply MRR effectively, developers should use a benchmark dataset with predefined relevance judgments for queries. By tracking MRR during retriever optimization (e.g., tweaking embedding models or indexing strategies), teams can iteratively improve the system’s ability to surface critical information early, directly impacting the quality of the RAG pipeline’s final output.
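To make the benchmark workflow concrete, the sketch below shows one way to track MRR while iterating on a retriever. It assumes a hypothetical retrieve(query, top_k) callable that returns a ranked list of document IDs, a benchmark of (query, relevant_doc_ids) pairs, and the mean_reciprocal_rank helper from the earlier sketch:

```python
def evaluate_retriever(retrieve, benchmark, top_k=10):
    """Compute MRR for a retriever over a benchmark with predefined relevance judgments.

    `retrieve` is assumed to be a callable returning a ranked list of doc IDs;
    `benchmark` is a list of (query_text, set_of_relevant_doc_ids) tuples.
    """
    ranked_lists = [retrieve(query, top_k) for query, _ in benchmark]
    relevant_sets = [relevant for _, relevant in benchmark]
    return mean_reciprocal_rank(ranked_lists, relevant_sets)


# Compare two retriever configurations (e.g., different embedding models) on the same benchmark:
# baseline_mrr = evaluate_retriever(baseline_retriever, benchmark)
# candidate_mrr = evaluate_retriever(candidate_retriever, benchmark)
```

Running the same evaluation after each change to the embedding model or indexing strategy gives a single number to compare configurations, which is the iterative loop described above.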