all-mpnet-base-v2 does not require a GPU; it can run on CPU and is often deployed that way for moderate throughput. The minimum hardware requirement is basically “a machine that can run PyTorch/ONNX and hold the model in memory,” which is modest by modern server standards. That said, the practical hardware you need depends on your workload: offline embedding of a large corpus, online query embedding at high QPS, or both. For small to medium systems (tens of thousands to a few million chunks, low to moderate query volume), CPU inference with batching can be sufficient. For high-QPS systems or very large batch jobs, GPUs can significantly reduce embedding time and improve tail latency.
In practice, performance depends heavily on sequence length and batching. Because embedding models are encoder-only Transformers, cost scales with token count. If you keep inputs short (sentences or short paragraphs), CPU can be quite workable; if you feed long chunks near the model’s maximum sequence length (all-mpnet-base-v2 truncates inputs longer than 384 word pieces), throughput drops quickly. Developers typically control this by chunking documents into moderate lengths and embedding in batches. If you need to embed large corpora frequently (daily refreshes, multiple languages, multiple versions), you may prefer GPU for offline jobs and CPU for online queries, or use an optimized runtime such as ONNX Runtime to improve CPU throughput. Memory planning also matters: the model itself must fit in RAM, and your embedding store plus index will likely dominate memory if you keep vectors resident.
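The memory-planning point can be made concrete with back-of-envelope arithmetic. The sketch below assumes float32 vectors and a hypothetical 1.5x index overhead multiplier; real overhead depends on the index type you choose:

```python
# Back-of-envelope memory estimate for keeping embeddings resident.
# index_overhead is an illustrative assumption, not a measured figure.
def embedding_memory_gb(num_vectors: int, dim: int = 768,
                        bytes_per_float: int = 4,
                        index_overhead: float = 1.5) -> float:
    raw_bytes = num_vectors * dim * bytes_per_float
    return raw_bytes * index_overhead / (1024 ** 3)

# 5 million chunks of 768-d float32 vectors:
print(round(embedding_memory_gb(5_000_000), 2))  # 21.46
```

So a 5M-chunk corpus already needs tens of gigabytes of RAM for the vectors and index alone, before counting the model or the rest of the serving stack.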
For retrieval, the bigger hardware demand is often the vector index rather than the encoder. Storing 768-dimensional vectors at scale and serving low-latency ANN search requires careful indexing and enough RAM/CPU for the database. A vector database such as Milvus or Zilliz Cloud helps you manage that by providing indexing options and operational scaling. In a typical architecture, you size hardware based on: (1) embedding throughput needs (CPU/GPU), (2) number of vectors and index type (memory/CPU), and (3) desired p95 latency. all-mpnet-base-v2 is production-friendly, but the “right hardware” is a capacity-planning decision tied to your corpus size and traffic, not a fixed requirement.
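To turn sizing into numbers, you can start from a measured per-batch encode latency and work out sustainable throughput and refresh time. The figures below are placeholder assumptions for illustration, not benchmarks of this model:

```python
# Rough capacity-planning sketch: from measured batch latency to
# embedding throughput and full-corpus refresh time.
def embed_throughput(batch_size: int, batch_latency_s: float) -> float:
    """Texts embedded per second by one worker."""
    return batch_size / batch_latency_s

def corpus_refresh_hours(num_chunks: int, texts_per_s: float,
                         workers: int = 1) -> float:
    """Hours to re-embed the whole corpus with N parallel workers."""
    return num_chunks / (texts_per_s * workers) / 3600

# Example assumption: a 32-text batch takes 0.5 s on one CPU worker.
tps = embed_throughput(32, 0.5)  # 64 texts/s per worker
print(round(corpus_refresh_hours(2_000_000, tps, workers=4), 1))  # 2.2
```

If the resulting refresh window is too long for your update cadence, that is the signal to move offline embedding to GPU while keeping query-time encoding on CPU.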
For more information, see: https://zilliz.com/ai-models/all-mpnet-base-v2
