voyage-2 accepts general natural-language text: sentences, paragraphs, documentation, support tickets, product descriptions, and the other kinds of “plain text” that developers typically index for semantic search or retrieval. The model produces embeddings that capture meaning, so it is most useful when the semantics of the input matter more than exact keywords. In practice, you can feed it most UTF-8 strings your application already handles, including punctuation, code-like fragments inside prose, and mixed formatting, as long as each input is sent as a string.
In implementation terms, you usually want to normalize the “shape” of your text before embedding: strip boilerplate headers, remove repeated navigation text, and chunk large documents into coherent pieces (a heading plus a few paragraphs). If you embed a long document as a single input, you risk blurring multiple topics into one vector; chunking improves retrieval accuracy and makes results easier to show to users (“here’s the exact paragraph that answers your question”). You can embed strings one at a time, but batching (sending a list of strings in a single request) is usually more efficient and easier to manage. You’ll also want to store the original chunk text alongside each vector, because an embedding on its own can’t be displayed meaningfully to users.
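The preparation steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the boilerplate pattern, the character-based chunk limit, and every function name are assumptions made for the example, and `fake_embed` is a placeholder standing in for a real call to voyage-2 through whatever client you use.

```python
import re
from typing import Callable, Dict, List, Sequence

# Hypothetical boilerplate pattern (a repeated nav bar); adapt to your corpus.
BOILERPLATE_PATTERNS = [
    re.compile(r"^(Home|Docs|Blog)( \| (Home|Docs|Blog))*$"),
]


def strip_boilerplate(text: str) -> str:
    """Drop lines that match a known navigation/header pattern."""
    kept = [
        line
        for line in text.splitlines()
        if not any(p.match(line.strip()) for p in BOILERPLATE_PATTERNS)
    ]
    return "\n".join(kept)


def chunk_paragraphs(text: str, max_chars: int = 1200) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of up to max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def embed_in_batches(
    chunks: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 8,
) -> List[Dict]:
    """Embed chunks batch by batch and keep each chunk's text with its vector."""
    records: List[Dict] = []
    for start in range(0, len(chunks), batch_size):
        batch = list(chunks[start:start + batch_size])
        vectors = embed_fn(batch)  # one request per batch, not per string
        if len(vectors) != len(batch):
            raise ValueError("embed_fn must return one vector per input string")
        for text, vector in zip(batch, vectors):
            records.append({"text": text, "vector": list(vector)})
    return records


def fake_embed(texts: List[str]) -> List[List[float]]:
    """Placeholder embedder (NOT voyage-2): returns toy 2-d vectors."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]
```

In a real pipeline you would pass a function that sends each batch to the embedding service in place of `fake_embed`. Splitting on characters is only a rough proxy for the model's token limit, so a token-aware splitter may be a better fit once the basic flow works.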
The “type” of text matters most downstream, in how you search and filter. If you’re embedding FAQs, you might store fields like category, product, and language. If you’re embedding internal docs, you might store repo, path, section_title, and updated_at. Then at query time, you can combine similarity search with metadata filtering in a vector database such as Milvus or Zilliz Cloud. That combination is what makes embedding models useful in production: embeddings give you semantic matching, while the database lets you constrain results to “only docs from repo X” or “only results in Japanese,” and still return the top-k nearest neighbors quickly.
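To make the idea of similarity search plus metadata filtering concrete, here is a small in-memory sketch. It is a conceptual stand-in, not the Milvus or Zilliz Cloud API: a vector database would run the equivalent query against an index rather than scanning a list. The record layout, the function names, and the toy 2-d vectors are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_top_k(
    query_vector: Sequence[float],
    records: List[Dict],
    filters: Dict[str, str],
    k: int = 3,
) -> List[Dict]:
    """Keep records whose metadata matches every filter, then rank by similarity."""
    candidates = [
        r for r in records
        if all(r["metadata"].get(field) == value for field, value in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(query_vector, r["vector"]),
        reverse=True,
    )
    return ranked[:k]


# Toy records: hypothetical texts, vectors, and metadata fields.
SAMPLE_RECORDS = [
    {"text": "Install guide", "vector": [1.0, 0.0],
     "metadata": {"repo": "docs", "language": "en"}},
    {"text": "インストール手順", "vector": [0.9, 0.1],
     "metadata": {"repo": "docs", "language": "ja"}},
    {"text": "Billing FAQ", "vector": [0.0, 1.0],
     "metadata": {"repo": "support", "language": "en"}},
    {"text": "API reference", "vector": [0.6, 0.8],
     "metadata": {"repo": "docs", "language": "en"}},
]

# "Only English docs from the docs repo", nearest neighbors first.
hits = filtered_top_k([1.0, 0.0], SAMPLE_RECORDS,
                      {"repo": "docs", "language": "en"}, k=2)
```

Because each record carries its original text next to the vector, `hits` can be shown to a user directly. The filter is applied before ranking here, so records outside the requested repo or language never compete for the top-k slots.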
For more information, see https://zilliz.com/ai-models/voyage-2
