Yes, this is one of the most common ways all-MiniLM-L12-v2 is used in production. The model outputs a fixed-length 384-dimensional embedding vector, and vector databases are designed to store exactly that kind of data and search it efficiently with approximate nearest neighbor (ANN) indexes. The basic pattern is simple: embed each document chunk with all-MiniLM-L12-v2, store the vector plus metadata, then embed the user query at runtime and retrieve the nearest vectors. If you can generate embeddings, you can use a vector database; nothing “special” is required beyond a consistent embedding dimension and a stable similarity metric (typically cosine similarity with normalized vectors).
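Here is a minimal sketch of that pattern using sentence-transformers and pymilvus. Assumptions to flag: the local Milvus Lite file path, the collection name, and the sample texts are all illustrative, and the MilvusClient quick-setup API (pymilvus 2.4+) is assumed.

```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# all-MiniLM-L12-v2 always emits 384-dimensional vectors.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# Illustrative local Milvus Lite file; a Milvus or Zilliz Cloud endpoint works the same way.
client = MilvusClient("./minilm_demo.db")

client.create_collection(
    collection_name="docs",  # hypothetical collection name
    dimension=384,           # must match the model's output dimension
    metric_type="COSINE",    # pairs naturally with normalized embeddings
)

chunks = [
    "Milvus builds ANN indexes over stored vectors.",
    "all-MiniLM-L12-v2 maps sentences into a 384-dimensional space.",
]

# Embed each chunk and insert the vector plus metadata.
vectors = model.encode(chunks, normalize_embeddings=True)
client.insert(
    collection_name="docs",
    data=[
        {"id": i, "vector": vectors[i].tolist(), "text": chunks[i]}
        for i in range(len(chunks))
    ],
)

# Embed the query with the SAME model, then retrieve the nearest vectors.
query = model.encode("how does vector search work?", normalize_embeddings=True).tolist()
hits = client.search(collection_name="docs", data=[query], limit=3, output_fields=["text"])
print(hits)
```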
Where vector databases matter is scalability and control. Brute-force vector search in memory degrades quickly as the corpus grows. A vector database such as Milvus or Zilliz Cloud gives you ANN indexing, partitioning, sharding, and filtering, so latency stays low as you scale from thousands to millions (or more) of chunks. Metadata filters are particularly important with this model because they compensate for its limitations, such as weak handling of mixed-language corpora (the model is trained primarily on English) and drift on out-of-domain text. For example, you can store lang, product, doc_type, and version fields and then search only within the relevant subset. This makes retrieval feel more precise without requiring a heavier embedding model.
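As a sketch of that filtering pattern, continuing the example above: the metadata field names and values (lang, doc_type, version) are hypothetical, and MilvusClient’s boolean filter expression syntax is assumed. The filter narrows the candidate set first; similarity ranking happens within the matching subset.

```python
# Continuing the sketch above: store metadata alongside each vector.
# These field names are illustrative, not required by Milvus.
client.insert(
    collection_name="docs",
    data=[{
        "id": 100,
        "vector": model.encode(
            "How to roll back a failed deployment.", normalize_embeddings=True
        ).tolist(),
        "text": "How to roll back a failed deployment.",
        "lang": "en",
        "doc_type": "howto",
        "version": "2.4",
    }],
)

# Search only within the relevant subset via a filter expression.
query = model.encode("undoing a bad release", normalize_embeddings=True).tolist()
hits = client.search(
    collection_name="docs",
    data=[query],
    limit=5,
    filter='lang == "en" and doc_type == "howto"',
    output_fields=["text", "version"],
)
```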
In practice, the “working well” part comes down to consistency. Use the same preprocessing and tokenization rules for documents and queries. Normalize embeddings if you’re using cosine similarity. Decide on chunking and stick to it, because the vector space reflects the chunk content. Then tune index parameters to your latency/recall goals. With Milvus, you typically choose an index type and configure build/search parameters to trade memory and speed for recall. With Zilliz Cloud, you get a managed experience but you still control schema, indexing, and query patterns. The model and the database complement each other cleanly: all-MiniLM-L12-v2 defines the semantic space, and the vector database makes that space searchable at scale.
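For the tuning step, here is one hedged example of how build and search parameters trade recall against latency with an HNSW index. The parameter values are starting points rather than recommendations, the collection name is again hypothetical, and a standard Milvus server deployment is assumed.

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # assumes a standard Milvus server

# Build-time knobs: M (graph connectivity) and efConstruction (build-time candidate
# pool) trade memory and indexing time for graph quality.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="HNSW",
    metric_type="COSINE",
    params={"M": 16, "efConstruction": 200},
)

client.create_collection(
    collection_name="docs_hnsw",  # hypothetical collection name
    dimension=384,
    index_params=index_params,
)

# Search-time knob: ef bounds the candidate pool per query.
# Raise it for recall, lower it for latency, and measure both on your own data.
hits = client.search(
    collection_name="docs_hnsw",
    data=[[0.0] * 384],  # placeholder; use a real all-MiniLM-L12-v2 embedding
    limit=5,
    search_params={"params": {"ef": 64}},
)
```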
For more information, see: https://zilliz.com/ai-models/all-minilm-l12-v2
