Normalized Discounted Cumulative Gain (nDCG) is a measure used to evaluate the effectiveness of ranking systems, especially in information retrieval and search engines. It assesses the quality of a ranked list of documents based on their relevance to a specific query. With non-negative relevance scores, the nDCG score ranges from 0 to 1, where 1 indicates a perfect ranking based on relevance. The calculation involves two primary steps: computing the Discounted Cumulative Gain (DCG) and normalizing it against the Ideal DCG (IDCG).
To calculate DCG for a ranked list, you first assign a relevance score to each document in the result set. These scores typically range from 0 (not relevant) up to some positive integer (highly relevant). The formula for DCG at rank position \( p \), that is, over the top \( p \) results, is:
\[ DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i + 1)} \]
Here, \( rel_i \) is the relevance score of the document at position \( i \). The logarithmic factor reduces the contribution of documents that appear lower in the list: the document at rank 1 is divided by \( \log_2 2 = 1 \), the one at rank 3 by \( \log_2 4 = 2 \), and so on. For example, if the top 5 documents have relevance scores [3, 2, 3, 0, 1], then \( DCG_5 = 3/1 + 2/\log_2 3 + 3/2 + 0/\log_2 5 + 1/\log_2 6 \approx 3 + 1.262 + 1.5 + 0 + 0.387 \approx 6.149 \).
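The DCG formula translates almost directly into code. The following is a minimal Python sketch, not a reference implementation; the function name `dcg_at_p` is hypothetical, and it assumes the relevance scores are supplied in rank order (the first element belongs to the top-ranked document).

```python
import math
from typing import Sequence


def dcg_at_p(relevances: Sequence[float], p: int) -> float:
    """Discounted Cumulative Gain over the top p positions (hypothetical helper).

    relevances[0] is the score of the document at rank 1, so the document
    at 1-based rank i is discounted by log2(i + 1).
    """
    total = 0.0
    for i, rel in enumerate(relevances[:p], start=1):
        total += rel / math.log2(i + 1)
    return total


# Relevance scores of the top 5 documents from the example above.
print(round(dcg_at_p([3, 2, 3, 0, 1], 5), 3))  # 6.149
```

Slicing with `relevances[:p]` means a list shorter than \( p \) is simply summed over all of its entries.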
After calculating the DCG, you need to normalize it, because raw DCG values depend on how many relevant documents a query has and how highly they are graded, which makes them hard to compare across queries. This is done by calculating the Ideal DCG (IDCG) for each query: the DCG of the best possible ranking, in which the documents are sorted in descending order of relevance. nDCG is then the ratio:
\[ nDCG_p = \frac{DCG_p}{IDCG_p} \]
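In the earlier example, the ideal ranking is obtained by sorting the relevance scores [3, 2, 3, 0, 1] in descending order, giving [3, 3, 2, 1, 0]. Its IDCG is \( 3/1 + 3/\log_2 3 + 2/2 + 1/\log_2 5 + 0/\log_2 6 \approx 6.323 \), and dividing the DCG of the original list (\( \approx 6.149 \)) by it gives \( nDCG_5 \approx 0.972 \). Normalizing ensures that the score reflects the quality of the ranking relative to the best possible outcome, allowing for fair comparisons between different systems or queries.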
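Putting both steps together, a minimal Python sketch of nDCG might look like the following. The names `dcg_at_p` and `ndcg_at_p` are hypothetical. Two assumptions are worth flagging: the ideal ranking is built by re-sorting only the retrieved documents' scores, as in the example above, and when IDCG is 0 (no relevant documents) the function returns 0.0, which is a common convention rather than part of the definition.

```python
import math
from typing import Sequence


def dcg_at_p(relevances: Sequence[float], p: int) -> float:
    """DCG over the top p positions; relevances are listed in rank order."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances[:p], start=1))


def ndcg_at_p(relevances: Sequence[float], p: int) -> float:
    """nDCG: the list's DCG divided by the DCG of its ideal reordering."""
    # Ideal ranking: the same relevance scores sorted in descending order
    # (assumption: only the retrieved documents are considered).
    ideal = sorted(relevances, reverse=True)
    idcg = dcg_at_p(ideal, p)
    if idcg == 0:
        # No relevant documents, so nDCG is undefined; returning 0.0 is an
        # assumed convention.
        return 0.0
    return dcg_at_p(relevances, p) / idcg


print(round(ndcg_at_p([3, 2, 3, 0, 1], 5), 3))  # 0.972
print(ndcg_at_p([3, 3, 2, 1, 0], 5))            # 1.0 (already ideal)
```

An already-ideal list scores exactly 1.0, because its DCG and IDCG come from identical computations.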