N-grams are sequences of "n" consecutive words in a document or query, and they are commonly used in information retrieval (IR) to capture local word patterns and contextual information. For example, a bigram refers to two consecutive words, while a trigram refers to three consecutive words.
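To make the definition concrete, here is a minimal sketch of word n-gram extraction in Python. It assumes a naive tokenizer that lowercases the text and splits on whitespace; the function name `ngrams` is illustrative, not taken from any particular library.

```python
def ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Return the word n-grams of `text` in order of appearance."""
    # Naive tokenization (an assumption): lowercase and split on whitespace.
    # Real systems usually also strip punctuation and may apply stemming.
    tokens = text.lower().split()
    # Slide a window of size n across the token list; if the text has
    # fewer than n tokens, the range is empty and no n-grams are returned.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "deep machine learning models"
print(ngrams(sentence, 2))
# [('deep', 'machine'), ('machine', 'learning'), ('learning', 'models')]
print(ngrams(sentence, 3))
# [('deep', 'machine', 'learning'), ('machine', 'learning', 'models')]
```

A sentence of k words yields k - n + 1 n-grams, so the four-word example produces three bigrams and two trigrams.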
In IR, n-grams improve query matching by capturing multi-word expressions or phrases that carry a specific meaning. For instance, in a search for "machine learning," the bigram "machine learning" lets the system favor documents that contain that exact phrase over documents that merely contain the words "machine" and "learning" somewhere, possibly far apart or in the reverse order.
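The difference between matching individual words and matching bigrams can be sketched as follows. This is an illustrative example, not a production matcher: it assumes the same naive whitespace tokenizer, and the helper names are hypothetical. Requiring every adjacent query word pair to appear adjacently in the document only approximates exact phrase matching for queries longer than two words, since the bigrams could occur in different places.

```python
def tokens(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def bigrams(words: list[str]) -> set[tuple[str, str]]:
    # Pair each word with its successor.
    return set(zip(words, words[1:]))


def matches_all_words(query: str, doc: str) -> bool:
    # Bag-of-words match: every query word appears somewhere in the document.
    return set(tokens(query)) <= set(tokens(doc))


def matches_phrase(query: str, doc: str) -> bool:
    # Bigram match: every adjacent query word pair must also appear
    # adjacently, in the same order, in the document.
    q = tokens(query)
    if len(q) < 2:
        # A one-word query has no bigrams; fall back to word matching.
        return matches_all_words(query, doc)
    return bigrams(q) <= bigrams(tokens(doc))


docs = [
    "an introduction to machine learning",
    "the learning curve of a new machine",
]
query = "machine learning"
for d in docs:
    print(d, matches_all_words(query, d), matches_phrase(query, d))
# an introduction to machine learning True True
# the learning curve of a new machine True False
```

Both documents contain the words "machine" and "learning", but only the first contains the bigram, so only the first passes the phrase check.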
N-grams enhance retrieval by preserving some of the word order and local context that a pure bag-of-words representation discards. Matching on word sequences rather than isolated terms can improve precision, especially when exact word order or phrase matching matters. N-grams are also widely used in related tasks such as text classification, clustering, and query expansion.
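One simple way to let n-grams influence ranking is to score each document by how many query unigrams and bigrams it contains, weighting bigram hits more heavily. The sketch below assumes a whitespace tokenizer, presence-only counting, and a hypothetical `bigram_weight` parameter of 2.0; real systems typically fold n-gram features into a weighted model such as TF-IDF or BM25 instead.

```python
def ngram_set(text: str, n: int) -> set[tuple[str, ...]]:
    # Naive tokenization (assumption): lowercase and split on whitespace.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def score(query: str, doc: str, bigram_weight: float = 2.0) -> float:
    # Count distinct query unigrams and bigrams present in the document.
    # bigram_weight is a hypothetical tuning knob, not a standard constant.
    uni_hits = len(ngram_set(query, 1) & ngram_set(doc, 1))
    bi_hits = len(ngram_set(query, 2) & ngram_set(doc, 2))
    return uni_hits + bigram_weight * bi_hits


docs = [
    "the learning curve of a new machine",
    "an introduction to machine learning",
    "deep learning for vision",
]
query = "machine learning"
ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
for d in ranked:
    print(score(query, d), d)
# 4.0 an introduction to machine learning
# 2.0 the learning curve of a new machine
# 1.0 deep learning for vision
```

With unigrams alone, the first two documents would tie at 2 matches; the bigram bonus breaks the tie in favor of the document that contains the phrase in order.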