Lexical search handles stemming and synonyms through index-time and query-time text analysis that normalizes words and maps related terms to shared forms. Stemming reduces words to a common root by stripping suffixes: for example, “running” and “runs” both become “run.” The engine can then treat different word forms as the same term, which improves recall without storing every variation separately. A rule-based stemmer cannot resolve irregular forms, however, so “ran” stays “ran.” Lemmatization closes that gap by using a vocabulary and the word’s grammatical role to return its dictionary form, mapping “ran” to “run” as well. Developers usually get either technique from language-specific analyzers. Lemmatization is more accurate but more expensive, which makes it a better fit for morphologically complex languages or corpora where irregular forms are common.
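The difference between the two techniques can be sketched in a few lines of Python. This is a minimal illustration only: `simple_stem` is a hypothetical, deliberately simplified suffix stripper (not the Porter algorithm), and `LEMMAS` is a hypothetical three-entry lookup table standing in for a real lemmatizer’s vocabulary and part-of-speech analysis.

```python
# Hypothetical lemma table; real lemmatizers use a full vocabulary
# plus part-of-speech information.
LEMMAS = {"running": "run", "runs": "run", "ran": "run"}


def simple_stem(word):
    """Toy suffix stripper (simplified, NOT the Porter algorithm)."""
    w = word.lower()
    for suffix in ("ing", "ed"):
        # Keep at least three characters of stem.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            # "runn" -> "run": undo a doubled final consonant
            # (except l, s, z, echoing Porter's rule).
            if w[-1] == w[-2] and w[-1] not in "aeioulsz":
                w = w[:-1]
            return w
    # Plural / third-person "s", but leave words like "glass" alone.
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


def lemmatize(word):
    """Dictionary lookup; unknown words are returned unchanged."""
    w = word.lower()
    return LEMMAS.get(w, w)


words = ["running", "runs", "ran"]
print([simple_stem(w) for w in words])  # ['run', 'run', 'ran']
print([lemmatize(w) for w in words])    # ['run', 'run', 'run']
```

The stemmer collapses the regular forms to “run” but leaves “ran” untouched, while the lookup-based lemmatizer resolves all three. In a real engine the same analyzer runs at both index time and query time so that query terms and indexed terms normalize identically.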
Synonyms are harder to handle because the engine must know that different words share a meaning, and nothing in their spelling reveals it. Lexical search engines therefore rely on synonym dictionaries or thesaurus files in which developers define these relationships by hand. For example, a developer might declare “car” and “automobile” equivalent so that they are treated as one token group. When a query contains either term, the engine expands it internally to include the whole group. Expansion improves coverage but must be tuned carefully: adding a broader term such as “vehicle” to the same group would make a search for “car” also match trucks and motorcycles, and excessive mappings like this produce noisy or irrelevant results.
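Query-time synonym expansion can be sketched as follows. This is a minimal in-memory illustration that assumes a hypothetical comma-separated synonym list (`SYNONYM_LINES`), a three-document toy corpus, and plain whitespace tokenization; it does not reproduce any particular engine’s synonym file format or API. Each query token becomes a group of alternatives, and a document matches if it contains at least one alternative for every token.

```python
# Hypothetical synonym list: one comma-separated equivalence group per line.
# "vehicle" is deliberately NOT grouped with "car" (it is a broader term).
SYNONYM_LINES = [
    "car, automobile",
    "tv, television",
]

DOCS = {
    1: "used automobile for sale",
    2: "new car dealership",
    3: "bicycle repair shop",
}


def load_synonyms(lines):
    """Map every term to the full set of terms in its group."""
    groups = {}
    for line in lines:
        terms = {t.strip().lower() for t in line.split(",") if t.strip()}
        for term in terms:
            groups.setdefault(term, set()).update(terms)
    return groups


def expand_query(query, groups):
    """Turn each query token into a sorted list of alternatives."""
    return [sorted(groups.get(tok, {tok})) for tok in query.lower().split()]


def search(query, groups):
    """A doc matches if it contains at least one alternative for every token."""
    clauses = expand_query(query, groups)
    hits = []
    for doc_id, text in DOCS.items():
        tokens = set(text.lower().split())
        if all(tokens & set(alternatives) for alternatives in clauses):
            hits.append(doc_id)
    return hits


groups = load_synonyms(SYNONYM_LINES)
print(expand_query("car dealership", groups))  # [['automobile', 'car'], ['dealership']]
print(search("car", groups))                   # [1, 2]
```

Without the synonym group, a search for “car” would return only document 2; with it, the “automobile” listing is found too. Keeping “vehicle” out of the group is the kind of tuning decision that prevents the expansion from pulling in loosely related documents.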
When combined with a vector database like Milvus, lexical search’s synonym and stemming features form the precision layer of a hybrid system. Lexical matching guarantees token-level accuracy, while embeddings stored in Milvus capture implicit synonymy and conceptual similarity without hand-written rules. For instance, even if the synonym list omits the pair “AI” and “artificial intelligence,” an embedding model will typically place the two phrases close together in vector space, so a Milvus similarity search can still retrieve documents that use either form. This dual approach lets developers balance the rule-based control of lexical search with the learned semantic flexibility of vector embeddings, delivering both precision and contextual relevance in retrieval results.
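The hybrid idea can be sketched end to end in plain Python. The sketch rests on several assumptions: the three-dimensional vectors in `EMBEDDINGS` and `QUERY_VEC` are made-up values standing in for a real embedding model, the brute-force cosine ranking stands in for a Milvus similarity search, token-overlap counting stands in for a real lexical scorer such as BM25, and the two result lists are merged with Reciprocal Rank Fusion (RRF), one common fusion method chosen here for illustration rather than one the text prescribes.

```python
import math

# Toy corpus; d2 says "artificial intelligence" but never "ai".
DOCS = {
    "d1": "AI is transforming search",
    "d2": "Artificial intelligence in healthcare",
    "d3": "Cooking pasta at home",
}

# Hypothetical 3-d embeddings standing in for real model output.
EMBEDDINGS = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.7, 0.3, 0.1],
    "d3": [0.0, 0.1, 0.9],
}
QUERY_TEXT = "ai"
QUERY_VEC = [0.85, 0.15, 0.05]  # hypothetical embedding of the query "ai"


def lexical_rank(query, docs):
    """Rank by number of exact token matches (a crude stand-in for BM25)."""
    q_tokens = set(query.lower().split())
    scores = {d: len(q_tokens & set(text.lower().split())) for d, text in docs.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [d for d, score in ranked if score > 0]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_rank(query_vec, embeddings, top_k=2):
    """Brute-force nearest neighbors; a vector database does this at scale."""
    ranked = sorted(embeddings, key=lambda d: cosine(query_vec, embeddings[d]), reverse=True)
    return ranked[:top_k]


def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


lexical = lexical_rank(QUERY_TEXT, DOCS)
semantic = vector_rank(QUERY_VEC, EMBEDDINGS)
print(lexical)                   # ['d1']
print(semantic)                  # ['d1', 'd2']
print(rrf([lexical, semantic]))  # ['d1', 'd2']
```

The lexical ranker finds only `d1`, because no synonym rule links “ai” to “artificial intelligence,” while the vector ranker also surfaces `d2`. Fusion keeps both, with `d1` first because it appears in both lists. RRF uses only rank positions, which avoids having to normalize lexical scores against cosine similarities.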
