In information retrieval (IR), a sparse vector is a vector in which most elements are zero. Sparse vectors are commonly used to represent text, because any given document contains only a small subset of the terms (features) in the vocabulary. In traditional IR models, sparse vectors are typically built from term frequency (TF) or TF-IDF weights. Each dimension corresponds to one term in the vocabulary, so the vector's length equals the vocabulary size.
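As a minimal sketch of how such vectors can be built, the snippet below computes TF-IDF weights for a toy corpus and keeps only the non-zero entries. The three documents, the whitespace tokenizer, and the helper name `tfidf_sparse` are illustrative assumptions. The weighting used here (raw count times log(N / df)) is only one of several common TF-IDF variants.

```python
import math
from collections import Counter

# Toy corpus (illustrative assumption, not from any real dataset).
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]

# Naive whitespace tokenization; real systems normalize and stem as well.
tokenized = [d.split() for d in docs]

# Map every distinct term to a dimension index.
vocab = {t: i for i, t in enumerate(sorted({t for doc in tokenized for t in doc}))}

# Document frequency: number of documents containing each term.
df = Counter(t for doc in tokenized for t in set(doc))
n_docs = len(docs)

def tfidf_sparse(tokens):
    """Return {dimension_index: weight}, storing only non-zero weights."""
    vec = {}
    for term, count in Counter(tokens).items():
        weight = count * math.log(n_docs / df[term])
        if weight > 0:  # terms found in every document get weight 0 and are dropped
            vec[vocab[term]] = weight
    return vec

vectors = [tfidf_sparse(doc) for doc in tokenized]

print(len(vocab))       # 13 dimensions in total
print(len(vectors[0]))  # 5 non-zero entries for the first document
```

The first document occupies only 5 of the 13 dimensions. With a realistic vocabulary of tens of thousands of terms, the fraction of non-zero entries is far smaller.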
For example, in a document-term matrix, most of the values are zero because each document contains only a small number of unique words from the entire vocabulary. Sparse representations are efficient to store and to compute with because they keep only the non-zero values and their indices, so operations such as dot products can skip the zero entries entirely.
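The storage and computation savings can be sketched with a plain dictionary that maps an index to its non-zero value. The helper names `to_sparse` and `sparse_dot` and the two example vectors are hypothetical. Production systems typically use dedicated sparse formats or inverted indexes rather than Python dictionaries.

```python
def to_sparse(dense):
    """Keep only the non-zero entries as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0}

def sparse_dot(a, b):
    """Dot product that touches only indices stored in the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[i] for i, value in a.items() if i in b)

# Two 10-dimensional vectors that are mostly zero (made-up values).
dense_a = [0, 0, 3, 0, 0, 0, 1, 0, 0, 0]
dense_b = [0, 2, 4, 0, 0, 0, 0, 0, 0, 5]

a = to_sparse(dense_a)  # {2: 3, 6: 1}
b = to_sparse(dense_b)  # {1: 2, 2: 4, 9: 5}

print(sparse_dot(a, b))                             # 12
print(sum(x * y for x, y in zip(dense_a, dense_b)))  # 12, same result from the dense form
```

The sparse dot product does work proportional to the number of stored entries rather than to the full dimensionality, which is what makes large vocabularies tractable.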
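While sparse vectors are effective in traditional keyword-based IR systems, they may not capture semantic relationships as well as dense vectors. In term-based sparse vectors each dimension is tied to an exact term, so two texts that express the same idea with different words can share no non-zero dimensions. However, sparse vectors are still widely used for tasks like keyword search and document classification, where explicit term matching is important.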
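The vocabulary-mismatch limitation can be illustrated with a small sketch that compares bag-of-words vectors by cosine similarity. The query, the two documents, and the helper names `bow` and `cosine` are illustrative assumptions. For readability the vectors are keyed by the term itself rather than by a numeric index.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words sparse vector: {term: count}."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

query = bow("cheap car")
doc_exact = bow("cheap car for sale")
doc_synonym = bow("inexpensive automobile for sale")

print(round(cosine(query, doc_exact), 3))  # 0.707, two shared terms
print(cosine(query, doc_synonym))          # 0.0, no shared terms despite similar meaning
```

The second document means roughly the same thing as the query but scores zero because no term matches exactly. Dense embeddings are designed to narrow this gap, while the first comparison shows why sparse vectors remain a good fit when exact term matching is the goal.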