Word embeddings and sentence/document embeddings are both techniques for representing text as numerical vectors, but they operate at different levels of granularity and serve distinct purposes. Word embeddings focus on individual words, mapping each to a dense vector that captures semantic meaning based on the contexts the word appears in. For example, in a word embedding model like Word2Vec or GloVe, the vector for "king" sits closer in the vector space to "queen" than to "apple" because of their contextual relationships. Sentence/document embeddings, on the other hand, represent entire sentences, paragraphs, or documents as single vectors. These aim to capture the overall meaning or theme of the text, such as whether a sentence expresses positive sentiment or describes a technical process. The key difference lies in scope: word embeddings handle atomic units (words), while sentence/document embeddings aggregate information to represent larger textual structures.
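To make the difference in granularity concrete, the sketch below looks up individual word vectors and compares their similarities. It assumes the gensim library and its downloadable `glove-wiki-gigaword-50` GloVe vectors; neither is required by anything above, they are just one convenient choice:

```python
import gensim.downloader as api

# Pretrained 50-dimensional GloVe word vectors; downloaded on first use.
glove = api.load("glove-wiki-gigaword-50")

# Every word maps to exactly one dense vector.
print(glove["king"].shape)                    # (50,)

# Related words sit closer together in the vector space.
print(glove.similarity("king", "queen"))      # comparatively high
print(glove.similarity("king", "apple"))      # comparatively low
```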
Word embeddings are typically trained by analyzing word co-occurrence patterns. For instance, Word2Vec uses a shallow neural network to predict the surrounding words from a target word (Skip-Gram) or a target word from its context (CBOW), producing vectors that reflect syntactic and semantic relationships. A classic example is that the vector for "king" minus "man" plus "woman" lands close to the vector for "queen." These embeddings work well for tasks requiring word-level understanding, like part-of-speech tagging or named entity recognition. However, they struggle to represent longer texts because they don't inherently model word order or global context: if you average the word embeddings of a sentence, you can lose nuances like negation (e.g., "not good" vs. "good"). This limitation led to the development of sentence and document embeddings.
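As a rough illustration of the training setup and the analogy arithmetic, the sketch below uses gensim's Word2Vec class on a toy corpus (an assumption; any Word2Vec implementation would do). Note that the analogy only produces sensible answers when the vectors come from a large corpus:

```python
from gensim.models import Word2Vec

# Toy corpus; real models are trained on millions of sentences.
sentences = [
    ["the", "king", "is", "a", "man"],
    ["the", "queen", "is", "a", "woman"],
    ["she", "ate", "an", "apple"],
]

# sg=1 selects Skip-Gram (predict surrounding words from the target word);
# sg=0 selects CBOW (predict the target word from its context).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=100)

# The analogy query vector("king") - vector("man") + vector("woman"):
# with vectors trained on a large corpus the top hit is typically "queen";
# on this toy corpus the result is not meaningful.
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```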
Sentence/document embeddings address these shortcomings by considering the entire text's structure. Transformer models such as BERT process all the words in a sentence jointly through self-attention, producing contextualized token vectors; a single sentence embedding is then derived by pooling those vectors (for example, taking the [CLS] token or averaging all tokens, as in Sentence-BERT). For example, the sentence "The quick brown fox jumps" would have a vector reflecting both the action (jumping) and the subject (fox). Document-level approaches extend this further: Doc2Vec learns a dedicated vector for each document alongside the word vectors, while the Universal Sentence Encoder uses Transformer or deep-averaging networks, and longer texts are often handled with hierarchical pooling or attention over chunks. These embeddings are useful for tasks like sentiment analysis, document clustering, or search engines where understanding the full context matters. For instance, a support ticket stating "My app crashes on startup" could be mapped to a vector closer to "Application fails during launch" than to "Payment processing error," enabling accurate categorization. While word embeddings are foundational, sentence/document embeddings provide higher-level abstractions tailored to broader use cases.
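Here is a minimal sketch of the support-ticket example, assuming the sentence-transformers package and the `all-MiniLM-L6-v2` model (neither is named above; any sentence-embedding model would work):

```python
from sentence_transformers import SentenceTransformer, util

# all-MiniLM-L6-v2 is one small, widely used sentence-embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

tickets = [
    "My app crashes on startup",
    "Application fails during launch",
    "Payment processing error",
]
embeddings = model.encode(tickets)  # one vector per sentence

# Cosine similarity: the two crash reports should score closer to each other
# than either does to the payment issue.
print(util.cos_sim(embeddings[0], embeddings[1]))
print(util.cos_sim(embeddings[0], embeddings[2]))
```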