Sentence Transformer models typically produce embeddings with between 384 and 1024 dimensions, depending on the specific architecture and configuration. Widely used pretrained models such as all-MiniLM-L6-v2 generate 384-dimensional vectors, while larger models like all-mpnet-base-v2 output 768 dimensions. The choice of dimensionality is a trade-off between computational efficiency, memory usage, and the model's ability to capture semantic nuances.
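The memory side of that trade-off is easy to quantify. As a rough sketch (assuming uncompressed float32 storage; the corpus size of one million documents is a hypothetical example):

```python
# Rough memory-footprint comparison for float32 embedding indexes.
BYTES_PER_FLOAT32 = 4

def index_size_bytes(num_vectors: int, dim: int) -> int:
    """Raw storage for a flat float32 vector index (no compression)."""
    return num_vectors * dim * BYTES_PER_FLOAT32

corpus = 1_000_000  # hypothetical corpus size
for dim in (384, 768, 1024):
    gib = index_size_bytes(corpus, dim) / 2**30
    print(f"{dim:>4} dims -> {gib:.2f} GiB")
```

For a million documents this works out to roughly 1.43 GiB at 384 dimensions versus 2.86 GiB at 768, so halving the dimensionality halves index storage (and, for exact search, the per-query distance computation cost).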
The dimensionality is determined by the transformer model's hidden size. For example, BERT-base models have a hidden size of 768, so their pooled embeddings naturally have 768 dimensions. Sentence Transformers can also standardize or reduce dimensions with pooling and dense layers. The all-MiniLM-L12-v2 model, for instance, produces 384-dimensional embeddings because the distilled MiniLM architecture itself uses a hidden size of 384, while some other models append a dense projection layer after the transformer output to map embeddings to a smaller size. Smaller embeddings optimize inference speed and storage requirements, which is critical for applications like vector databases or real-time semantic search.
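What such a dense projection layer does can be sketched in plain PyTorch. This mirrors the library's `models.Dense` module conceptually rather than reproducing it; the 768-to-384 sizes and the Tanh activation below are illustrative choices:

```python
import torch
import torch.nn as nn

class EmbeddingProjection(nn.Module):
    """Sketch of a dense head that maps pooled transformer outputs
    (e.g., 768-dim) down to a smaller embedding size (e.g., 384-dim)."""

    def __init__(self, in_dim: int = 768, out_dim: int = 384):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.activation = nn.Tanh()  # a common activation for such heads

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.activation(self.linear(pooled))

# A batch of 2 pooled sentence vectors from a 768-dim transformer:
pooled = torch.randn(2, 768)
projected = EmbeddingProjection()(pooled)
print(projected.shape)  # torch.Size([2, 384])
```

In a real Sentence Transformers pipeline this layer would sit after the pooling module and be trained jointly with the rest of the model, so the projection learns to preserve task-relevant semantic structure.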
Developers should consider their use case when selecting a model. Lower-dimensional embeddings (e.g., 384) work well for tasks like clustering or retrieval where speed and memory are priorities, while higher dimensions (e.g., 768 or 1024) can better serve fine-grained semantic tasks like paraphrase detection; the stsb-roberta-large model, for example, uses 1024 dimensions for high-precision similarity scoring. Most pretrained models document their output dimensions, and the Sentence Transformers library allows easy inspection via model.get_sentence_embedding_dimension(), enabling informed choices based on project constraints.