Often yes in retrieval quality, but not always in total system performance. all-mpnet-base-v2 generally produces stronger embeddings than the lightweight MiniLM models for English semantic similarity, which can translate into better recall on subtle queries and fewer “near miss” results. The tradeoff is cost: all-mpnet-base-v2 is a larger model and outputs 768-dimensional vectors versus MiniLM’s 384, so it is slower to encode and raises compute, memory, and index-storage requirements. If you serve high QPS, or if you re-embed a very large corpus frequently, the added cost can outweigh the quality gain. So “better” should be defined as “meets my quality target under my latency and budget constraints,” not “wins on a leaderboard.”
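As a rough sanity check on the cost side, you can time batch encoding with both models on your own hardware. This is a minimal sketch assuming the sentence-transformers package; the `docs` batch is a placeholder you would replace with a sample of your own text.

```python
# Rough throughput check, assuming the sentence-transformers package is installed.
# `docs` is a placeholder batch; substitute a sample of your own documents.
import time
from sentence_transformers import SentenceTransformer

docs = ["How do I reset my password after enabling SSO?"] * 256

for name in ["all-MiniLM-L6-v2", "all-mpnet-base-v2"]:
    model = SentenceTransformer(name)
    start = time.perf_counter()
    model.encode(docs, batch_size=32, normalize_embeddings=True)
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.2f}s for {len(docs)} docs ({len(docs) / elapsed:.1f} docs/s)")
```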
The difference tends to show up most in edge cases: many highly similar documents, subtle distinctions between intents, and corpora with heavy paraphrasing. If your content is clean, your chunking is sound, and your metadata filters are strong, MiniLM can be “good enough” and much cheaper. For example, if you always filter by product and version, you shrink the search space, the embedding model has an easier job, and the quality gap narrows. Conversely, if the corpus is messy and you rely heavily on embeddings alone, all-mpnet-base-v2 can provide a noticeable uplift.
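To illustrate the filtering point, here is a hedged sketch of a metadata-filtered search using pymilvus’s MilvusClient (2.4+ style API). The collection name `docs_minilm` and the `product`/`version` fields are hypothetical stand-ins for your own schema.

```python
# Sketch of a metadata-filtered vector search, assuming a Milvus collection
# named "docs_minilm" with scalar fields `product` and `version` (hypothetical).
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

client = MilvusClient(uri="http://localhost:19530")  # or a Zilliz Cloud URI
model = SentenceTransformer("all-MiniLM-L6-v2")

query_vec = model.encode("how do I rotate API keys?", normalize_embeddings=True)

hits = client.search(
    collection_name="docs_minilm",
    data=[query_vec.tolist()],
    filter='product == "gateway" and version == "2.1"',  # narrows the search space
    limit=5,
    output_fields=["title"],
)
print(hits[0])
```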
A practical approach is to A/B test both models in the same retrieval stack. Store the embeddings from each model in separate collections in a vector database such as Milvus or Zilliz Cloud, run the same evaluation queries against both, and compare recall@k, nDCG@k, and latency/cost. Many teams end up using MiniLM for fast first-stage retrieval and all-mpnet-base-v2 where they need higher accuracy, for example on a smaller candidate set or for premium queries. The right choice is the one you can defend with metrics and operational constraints, not a generic “better” label.
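Below is a minimal offline evaluation sketch along those lines. The corpus, queries, and relevance labels are placeholders for your own labeled set; in a real A/B test the two sets of vectors would live in separate Milvus or Zilliz Cloud collections and retrieval would go through the same search path you run in production, but the recall@k and nDCG@k math is the same.

```python
# Minimal A/B sketch: embed the same corpus and labeled queries with both models,
# then compare recall@k and nDCG@k. `corpus`, `queries`, and `relevant`
# (query index -> set of relevant doc indices) are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

corpus = ["doc one ...", "doc two ...", "doc three ..."]
queries = ["some evaluation query"]
relevant = {0: {1}}  # query 0 treats doc 1 as relevant
K = 10

def evaluate(model_name):
    model = SentenceTransformer(model_name)
    doc_vecs = model.encode(corpus, normalize_embeddings=True)
    q_vecs = model.encode(queries, normalize_embeddings=True)
    scores = q_vecs @ doc_vecs.T                 # cosine similarity (vectors are normalized)
    ranked = np.argsort(-scores, axis=1)[:, :K]  # top-K doc indices per query

    recalls, ndcgs = [], []
    for qi, top in enumerate(ranked):
        rel = relevant.get(qi, set())
        if not rel:
            continue
        hits = [1.0 if d in rel else 0.0 for d in top]
        recalls.append(sum(hits) / len(rel))
        dcg = sum(h / np.log2(rank + 2) for rank, h in enumerate(hits))
        idcg = sum(1.0 / np.log2(rank + 2) for rank in range(min(len(rel), K)))
        ndcgs.append(dcg / idcg)
    return np.mean(recalls), np.mean(ndcgs)

for name in ["all-MiniLM-L6-v2", "all-mpnet-base-v2"]:
    r, n = evaluate(name)
    print(f"{name}: recall@{K}={r:.3f}  nDCG@{K}={n:.3f}")
```

Pair these quality numbers with the latency and index-size figures from your own deployment before deciding which model wins for your workload.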
For more information, see: https://zilliz.com/ai-models/all-mpnet-base-v2
