Two runs of the same Sentence Transformer model can produce slightly different embeddings due to inherent randomness in certain operations, even during inference. This variability is not unique to Sentence Transformers but is common in neural networks. The primary sources of randomness include model architecture choices (like dropout layers), hardware-level computation differences (especially on GPUs), and framework-level nondeterministic operations. For example, matrix multiplications on GPUs may use parallelized algorithms that introduce tiny numerical variations due to floating-point precision limits.
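To see why this matters, note that floating-point addition is not associative: summing the same numbers in a different order can change the last few bits of the result. The short sketch below (plain NumPy, not tied to any particular model) illustrates this; it is the same kind of drift that parallel GPU reductions can introduce.
import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal(100_000).astype(np.float32)

# Floating-point addition is not associative, so the reduction order matters
forward_sum = np.sum(values)            # one summation order
reverse_sum = np.sum(values[::-1])      # same values, summed in reverse
print(forward_sum == reverse_sum)       # often False
print(abs(forward_sum - reverse_sum))   # tiny, but typically nonzero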
To control this randomness, you can enforce deterministic behavior. First, set random seeds for libraries like PyTorch (which Sentence Transformers is built on) and NumPy using torch.manual_seed(), numpy.random.seed(), and Python's random.seed(). Second, configure PyTorch to use deterministic algorithms with torch.backends.cudnn.deterministic = True and torch.backends.cudnn.benchmark = False. Third, ensure dropout layers are disabled by putting the model in evaluation mode (model.eval()). However, even with these steps, full determinism isn't guaranteed on GPUs due to hardware-level optimizations. Testing on a CPU (using model.to('cpu')) may yield more consistent results, but at the cost of speed.
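In practice, these settings are often bundled into a small helper that runs once at startup. Below is a minimal sketch assuming only PyTorch and NumPy; the function name set_deterministic is purely illustrative and not part of Sentence Transformers or PyTorch.
import random
import numpy as np
import torch

def set_deterministic(seed: int = 42) -> None:
    # Illustrative helper, not part of the Sentence Transformers API
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG
    torch.manual_seed(seed)           # PyTorch RNGs (CPU and CUDA)
    torch.cuda.manual_seed_all(seed)  # all CUDA devices, if present
    torch.backends.cudnn.deterministic = True  # prefer deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable kernel autotuning
    # Newer PyTorch versions can also enforce determinism globally, at the cost
    # of errors for ops that lack a deterministic implementation:
    # torch.use_deterministic_algorithms(True)
Calling set_deterministic() before loading the model covers the seeding and cuDNN configuration; putting the model in evaluation mode remains a separate step.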
Practical example: If you initialize the model and set all seeds and configurations, embeddings should match across runs. For instance:
import torch
from sentence_transformers import SentenceTransformer

# Set seeds and deterministic settings before encoding
torch.manual_seed(42)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

model = SentenceTransformer('all-MiniLM-L6-v2')
model.eval()  # Disables dropout

# encode() returns NumPy arrays by default; convert_to_tensor=True returns
# torch tensors so the embeddings can be compared with torch.allclose()
embedding1 = model.encode("test sentence", convert_to_tensor=True)
embedding2 = model.encode("test sentence", convert_to_tensor=True)
print(torch.allclose(embedding1, embedding2))  # Should print True if deterministic
Note that discrepancies might still occur across different hardware or library versions, so environment consistency is key.
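When bitwise-identical outputs are not achievable, for example across different GPUs or library versions, a common fallback is to compare embeddings with a small numerical tolerance instead of exact equality. A minimal sketch follows; the atol value is only an illustrative choice.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

emb_a = model.encode("test sentence")  # NumPy array by default
emb_b = model.encode("test sentence")

# A small absolute tolerance absorbs benign floating-point drift
# while still catching genuinely different embeddings
print(np.allclose(emb_a, emb_b, atol=1e-6))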