Yes—all-MiniLM-L12-v2 is commonly “good enough” for RAG pipelines, especially as a first-stage retriever where the goal is to quickly find relevant chunks rather than perfectly rank the single best passage. RAG quality is often dominated by retrieval: if the retriever doesn’t bring the right context into the top-k set, the generator will guess. all-MiniLM-L12-v2 is attractive here because it’s fast, cheap to run, and easy to deploy on CPUs, which makes it practical for both offline indexing and real-time query embedding. For many internal knowledge bases (docs, FAQs, runbooks), it can deliver strong baseline recall when paired with sensible chunking and metadata filtering.
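A minimal sketch of what that first stage can look like with the sentence-transformers library. The corpus, query, `top_k`, and the choice to normalize embeddings (so a dot product equals cosine similarity) are illustrative assumptions, not a prescribed setup:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# all-MiniLM-L12-v2 produces 384-dimensional embeddings and runs comfortably on CPU.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# Placeholder corpus; in practice these are your pre-chunked documents.
chunks = [
    "To rotate API keys, open the admin console and select Credentials.",
    "The runbook for failed deployments starts with checking the build logs.",
    "Our FAQ covers billing, quotas, and data retention policies.",
]

# Normalized embeddings let a plain dot product act as cosine similarity.
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

query = "How do I rotate an API key?"
query_vec = model.encode([query], normalize_embeddings=True)[0]

# First-stage retrieval: take the top-k most similar chunks.
top_k = 2
scores = chunk_vecs @ query_vec
for i in np.argsort(-scores)[:top_k]:
    print(f"{scores[i]:.3f}  {chunks[i]}")
```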
The key nuance is that "good for RAG" depends on how you build the rest of the pipeline. If you chunk too coarsely, the retriever returns passages that are topically related but not answer-specific. If you chunk too finely, you may lose necessary context and the generator may misinterpret short fragments. A pattern that typically works well: chunk at section/heading boundaries when possible, keep chunk sizes moderate, store source metadata (doc title, URL, section name, updated_at), and retrieve the top-k (e.g., 10–30) chunks. If you want better precision, add a second stage: rerank the retrieved set using a stricter scoring function or rules (for example, prefer chunks from the newest version of docs). all-MiniLM-L12-v2 fits nicely into this two-stage design because it keeps the first stage fast.
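A sketch of the chunking and rule-based rerank steps under stated assumptions: the heading regex, the metadata fields (doc_title, url, section, updated_at), and the "newest doc wins ties" rule are illustrative, not the only reasonable choices:

```python
import re
from datetime import date

def chunk_by_headings(doc_text, doc_title, url, updated_at):
    """Split a Markdown document at heading boundaries, keeping metadata per chunk."""
    chunks = []
    # With a capture group, re.split keeps the headings:
    # [preamble, heading, body, heading, body, ...]
    sections = re.split(r"(?m)^(#{1,3} .+)$", doc_text)
    for heading, body in zip(sections[1::2], sections[2::2]):
        text = body.strip()
        if not text:
            continue
        chunks.append({
            "text": text,
            "doc_title": doc_title,
            "url": url,
            "section": heading.lstrip("# ").strip(),
            "updated_at": updated_at,
        })
    return chunks

def rerank(candidates):
    """Second stage: keep the embedding score, but when scores are close,
    prefer chunks from more recently updated documents."""
    # candidates: list of (score, chunk) pairs from the first-stage retriever
    return sorted(
        candidates,
        key=lambda pair: (round(pair[0], 2), pair[1]["updated_at"]),
        reverse=True,
    )

chunks = chunk_by_headings(
    "## Rotating keys\nOpen the admin console...\n\n## Quotas\nEach project gets...",
    doc_title="Admin guide",
    url="https://example.internal/admin-guide",
    updated_at=date(2024, 5, 1),
)
print([c["section"] for c in chunks])  # ['Rotating keys', 'Quotas']
```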
Vector databases are the natural backbone for this. Store chunk embeddings in a vector database such as Milvus or Zilliz Cloud, and use metadata filters to keep retrieval grounded (only search documents the user is allowed to see; restrict by product/version). Then log what was retrieved and evaluate retrieval metrics over time. Many teams find that upgrading the embedding model yields smaller gains than improving chunking, adding structured metadata, and tightening filters. So yes: it’s good for RAG, but its success is mostly determined by your ingestion pipeline and how well you constrain retrieval to the correct slice of your corpus.
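A minimal sketch of that storage-plus-filter step using the pymilvus MilvusClient API, assuming a recent pymilvus/Milvus version with Milvus Lite and dynamic fields available. The collection name, the metadata fields (product, version), and the filter expression are placeholders; for Zilliz Cloud or a Milvus server you would pass that deployment's URI and token instead of a local file:

```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# Milvus Lite stores data in a local file; swap in your server/cloud URI as needed.
client = MilvusClient("./rag_demo.db")

# Quick-setup collection: 384 matches the MiniLM embedding dimension; extra
# metadata keys are stored in the dynamic field.
client.create_collection(collection_name="kb_chunks", dimension=384)

docs = [
    {"id": 1, "text": "Rotate API keys from the Credentials page.", "product": "console", "version": "2.1"},
    {"id": 2, "text": "Billing quotas reset on the first of each month.", "product": "billing", "version": "1.0"},
]
for d in docs:
    d["vector"] = model.encode(d["text"]).tolist()
client.insert(collection_name="kb_chunks", data=docs)

# Retrieval with a metadata filter: only search the product/version slice the
# user is allowed to (or asking to) see. Field names here are illustrative.
query_vec = model.encode("How do I rotate an API key?").tolist()
results = client.search(
    collection_name="kb_chunks",
    data=[query_vec],
    limit=10,
    filter='product == "console" and version == "2.1"',
    output_fields=["text", "product", "version"],
)
for hit in results[0]:
    print(hit["distance"], hit["entity"]["text"])
```

Logging the returned IDs, scores, and filter used per query gives you the raw material to track retrieval metrics (recall@k, hit rate) over time.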
For more information, see: https://zilliz.com/ai-models/all-minilm-l12-v2
