How does Lexical search perform in structured versus unstructured data?

Lexical search performs differently depending on whether the data is structured or unstructured because the nature of the text affects how tokens and fields are indexed. In structured data—like product catalogs, user profiles, or log entries—fields have predictable formats and known meanings. Lexical search works efficiently here by allowing field-specific queries (e.g., title: "SSD" or category: "storage device"), ensuring exact keyword matches within well-defined data types. Developers can also apply filters and Boolean logic to target specific attributes, achieving precise control over what’s retrieved.
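To make the field-specific query idea concrete, here is a minimal sketch in plain Python. The `Product` record, the `CATALOG` data, and the `field_search` helper are all hypothetical names invented for illustration; a real search engine would answer such queries from an inverted index rather than a linear scan.

```python
from dataclasses import dataclass


@dataclass
class Product:
    title: str
    category: str
    price: float


# Hypothetical catalog, used only for illustration.
CATALOG = [
    Product("Samsung 1TB SSD", "storage device", 89.99),
    Product("Seagate 2TB HDD", "storage device", 54.99),
    Product("SSD mounting bracket", "accessory", 7.50),
]


def tokens(text):
    """Lowercase and split on whitespace (a deliberately simple analyzer)."""
    return text.lower().split()


def field_search(catalog, title_term, category, max_price=None):
    """Emulate: title:<term> AND category:<exact value> [AND price <= max_price]."""
    hits = []
    for product in catalog:
        if title_term.lower() not in tokens(product.title):
            continue  # keyword must appear as a token in the title field
        if product.category != category:
            continue  # exact match on the structured category field
        if max_price is not None and product.price > max_price:
            continue  # optional numeric range filter
        hits.append(product)
    return hits


results = field_search(CATALOG, "SSD", "storage device", max_price=100)
print([p.title for p in results])  # ['Samsung 1TB SSD']
```

The bracket also contains "SSD" in its title, but the category clause excludes it, which is the kind of precise control that structured fields make possible.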

In contrast, unstructured data such as articles, reviews, or transcripts lacks consistent field organization. Here, Lexical search relies on full-text indexing and ranking functions like BM25. It searches across large bodies of free text and ranks results by how often the query terms appear in each document, how rare those terms are across the collection, and how long the document is. While Lexical search still performs well for explicit queries, it struggles when the user input doesn’t match document wording exactly. For instance, a query “how to fix lag” might miss a document titled “reducing performance delay” because the two share no keywords.
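The vocabulary mismatch can be reproduced with a small, self-contained BM25 scorer. This is a sketch under stated assumptions: it uses whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, and the smoothed IDF variant `log((N - df + 0.5) / (df + 0.5) + 1)`; production engines add stemming, stop-word handling, and an inverted index. The sample documents are made up.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; no stemming or stop-word removal."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a term absent from the document contributes nothing
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "reducing performance delay in online games",
    "how to fix lag spikes on a wireless network",
    "choosing a mechanical keyboard",
]
scores = bm25_scores("how to fix lag", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

The first document is about the same problem as the query but shares no tokens with it, so its score is exactly zero and it would never be retrieved on lexical evidence alone.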

Integrating Lexical search with Milvus improves retrieval quality on unstructured data by adding semantic matching through vector embeddings, so documents can be found even when they share no keywords with the query. Developers can use Lexical search to filter structured fields and use Milvus to find conceptually similar unstructured content. For example, an e-commerce search might first filter products by brand or category (structured Lexical filtering) and then use Milvus to surface products described in semantically related ways (unstructured understanding). This hybrid design allows Lexical search to maintain precision in structured dimensions while vector retrieval handles meaning-rich, unstructured data effectively.
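The filter-then-rank flow of the e-commerce example can be sketched without any external service. Everything below is an illustrative assumption: the product records are invented, the three-dimensional vectors are hand-made stand-ins for real embeddings, and the in-memory cosine ranking stands in for the similarity search that Milvus would perform over stored vectors.

```python
import math

# Hypothetical products; "vec" is a toy stand-in for a description embedding.
PRODUCTS = [
    {"name": "Acme Swift 14", "brand": "Acme", "category": "laptop",
     "description": "lightweight notebook for travel", "vec": [0.9, 0.1, 0.0]},
    {"name": "Acme Titan 17", "brand": "Acme", "category": "laptop",
     "description": "desktop replacement for gaming", "vec": [0.1, 0.9, 0.1]},
    {"name": "Bolt Air 13", "brand": "Bolt", "category": "laptop",
     "description": "thin and portable ultrabook", "vec": [0.95, 0.05, 0.0]},
    {"name": "Acme Dock", "brand": "Acme", "category": "accessory",
     "description": "USB-C docking station", "vec": [0.2, 0.2, 0.9]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hybrid_search(products, query_vec, brand, category, top_k=2):
    """Step 1: exact filter on structured fields. Step 2: rank by vector similarity."""
    candidates = [
        p for p in products
        if p["brand"] == brand and p["category"] == category
    ]
    ranked = sorted(candidates, key=lambda p: cosine(query_vec, p["vec"]), reverse=True)
    return ranked[:top_k]


# Toy embedding for a query such as "portable laptop".
query_vec = [1.0, 0.0, 0.0]
top = hybrid_search(PRODUCTS, query_vec, brand="Acme", category="laptop")
print([p["name"] for p in top])  # ['Acme Swift 14', 'Acme Titan 17']
```

The Bolt ultrabook is the closest vector to the query, yet it never appears because the structured brand filter removes it first. Within the remaining candidates, the "lightweight notebook for travel" ranks highest despite sharing no keyword with "portable laptop".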