voyage-2 is useful and straightforward, but developers should be aware of limitations that mostly come from embedding-based retrieval in general: input length constraints, loss of exact detail, and the fact that embeddings are not magic “truth engines.” First, you can’t embed arbitrarily long documents in one shot; you need to chunk content to stay within the model’s maximum input size. Second, embeddings compress text into a fixed-length vector, which means some details are inevitably lost—embeddings are great for “aboutness” (topic/meaning) but not guaranteed to preserve precise facts like exact numbers, version strings, or code-level specifics unless your chunking and retrieval are tuned to keep those details in the retrieved text. Third, semantic similarity can return plausible-but-wrong neighbors if your dataset has many near-duplicates or if your query is underspecified.
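To make the chunking point concrete, here is a minimal sketch of pre-embedding chunking. It splits on words for simplicity (a real pipeline would count tokens with the model's tokenizer), and the 500-word chunk size, 50-word overlap, and `handbook.txt` file name are illustrative placeholders rather than recommended values:

```python
# Minimal chunking sketch: split a document into overlapping chunks before
# embedding, so each piece stays under the model's input limit and precise
# details (numbers, version strings) remain in the retrieved text.
# Chunk size, overlap, and the input file are illustrative, not tuned values.

def chunk_text(text: str, max_words: int = 500, overlap: int = 50) -> list[str]:
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + max_words])
        if chunk:
            chunks.append(chunk)
    return chunks

# Example usage with a hypothetical source document.
chunks = chunk_text(open("handbook.txt").read())
print(f"{len(chunks)} chunks ready to embed")
```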
Another practical limitation is lifecycle and consistency management. If you change your chunking strategy (say, from 300-token chunks to 800-token chunks), your existing vectors are no longer directly comparable in terms of retrieval behavior, even though the dimension is the same. Likewise, if you switch model versions or embedding settings, you typically need to re-embed your entire corpus to keep the vector space consistent. That’s not a dealbreaker, but it’s a real operational consideration: plan for reindexing time, cost, and the need for a safe migration strategy (often done by writing to a new collection and swapping traffic). Also consider API and throughput constraints: if you’re embedding millions of chunks, you’ll need batching, concurrency control, retries, and backoff to avoid turning ingestion into a fragile pipeline.
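As a rough sketch of what a more resilient ingestion loop looks like, the snippet below assumes a hypothetical `embed_fn` callable that wraps whatever embedding call you actually use (the Voyage API client, an internal service, and so on) and returns one vector per input string; the batch size and retry counts are illustrative:

```python
import random
import time

def embed_corpus(chunks, embed_fn, batch_size=128, max_retries=5):
    """Embed chunks in batches with retries and exponential backoff.

    `embed_fn` is a placeholder for the real embedding call: it takes a
    list of strings and returns a list of vectors in the same order.
    """
    vectors = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        for attempt in range(max_retries):
            try:
                vectors.extend(embed_fn(batch))
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise
                # Exponential backoff with jitter so transient API errors
                # don't turn a large re-embedding job into a failure cascade.
                time.sleep((2 ** attempt) + random.random())
    return vectors
```

When re-embedding after a model or chunking change, the same loop typically writes into a new collection, and traffic is switched over only after the new index is validated.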
Finally, don’t ignore the limitations at query time: latency budgets and retrieval-quality tradeoffs live in both the embedding call and the vector search. This is where a vector database such as Milvus or Zilliz Cloud helps, but you still need to tune it. Approximate nearest-neighbor indexes trade perfect recall for speed; top-k results can change based on index parameters, filtering, and whether you normalize vectors or which distance metric you choose. You should build a small relevance test set (queries + expected passages) and rerun it whenever you change chunking, index type, or embedding model. In short: voyage-2's limitations are manageable, but you’ll get the best results when you treat retrieval as an engineered system (chunking + metadata + indexing + evaluation), not as a single API call that “solves search.”
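A relevance regression harness can be as simple as the sketch below, where `search(query, k)` is a placeholder for your actual retrieval call (for example, a Milvus vector search returning ranked chunk IDs) and the query/ID pairs are purely illustrative:

```python
# Tiny relevance test set: queries mapped to the chunk IDs expected in the
# top-k results. Rerun this whenever chunking, index type, or the embedding
# model changes. `search` is a placeholder for your retrieval function.

TEST_SET = {
    "how do I rotate API keys": {"doc_security_04"},
    "default retry backoff interval": {"doc_client_12"},
}

def recall_at_k(search, k: int = 5) -> float:
    hits = 0
    for query, expected_ids in TEST_SET.items():
        retrieved = set(search(query, k))
        if expected_ids & retrieved:
            hits += 1
    return hits / len(TEST_SET)
```

Tracking even a single recall@k number across chunking, index, and model changes is crude, but it catches the most common regressions before they reach users.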
For more information, see https://zilliz.com/ai-models/voyage-2
