Inverse document frequency (IDF) is a measure used in information retrieval (IR) to evaluate the importance of a term within a corpus of documents. It quantifies how rare a term is across the corpus: the more documents a term appears in, the lower its IDF value. The idea is that terms appearing in many documents are less informative or distinctive than terms appearing in only a few.
Mathematically, the IDF of a term is computed as the logarithm of the total number of documents divided by the number of documents containing the term. If a term appears in every document, that ratio is 1 and its IDF is 0, indicating that the term does nothing to distinguish one document from another. Conversely, a term that appears in only a few documents has a higher IDF and is treated as more significant.
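As a minimal sketch of this calculation, the Python function below computes the unsmoothed form log(N / df). The function name, the representation of each document as a set of words, and the toy corpus are assumptions made for illustration; practical systems often add smoothing so that terms absent from the corpus do not cause a division by zero.

```python
import math


def inverse_document_frequency(term, documents):
    """Return log(N / df) for `term`.

    N is the total number of documents and df is the number of documents
    that contain the term. Each document is assumed to be a set of words.
    """
    n_docs = len(documents)
    doc_freq = sum(1 for doc in documents if term in doc)
    if doc_freq == 0:
        # The unsmoothed formula is undefined for terms that never occur.
        raise ValueError(f"term {term!r} does not occur in any document")
    return math.log(n_docs / doc_freq)


# Hypothetical four-document corpus, already split into sets of words.
corpus = [
    {"the", "cat", "sat"},
    {"the", "dog", "barked"},
    {"the", "cat", "purred"},
    {"the", "bird", "sang"},
]

idf_the = inverse_document_frequency("the", corpus)    # in all 4 documents
idf_cat = inverse_document_frequency("cat", corpus)    # in 2 of 4 documents
idf_bird = inverse_document_frequency("bird", corpus)  # in 1 of 4 documents
```

Here "the" occurs in every document, so its IDF is log(4/4) = 0, while "cat" scores log(2) and "bird" scores log(4), the highest of the three.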
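IDF is commonly used in the TF-IDF (term frequency-inverse document frequency) metric, where a term's frequency within a document is multiplied by its IDF. A term therefore scores highly when it occurs often in a given document but in few documents overall. This improves the effectiveness of search rankings by emphasizing terms that are distinctive for a document and down-weighting terms that are common across the corpus.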
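The combination can be sketched as follows. This is one simple variant among several in use: it assumes term frequency is the term's count divided by the document's length, uses the unsmoothed IDF log(N / df), and treats each document as a list of tokens. The function name and sample corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    Assumes tf = count / document length and idf = log(N / df),
    where each document is a list of tokens.
    """
    n_docs = len(documents)

    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical three-document corpus, already tokenized.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["the", "bird", "sang"],
]

scores = tf_idf(corpus)
```

In the first document, "the" is the most frequent term but scores 0 because it appears in all three documents, whereas "mat", which appears only in that document, outscores "cat", which appears in two.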
