The simplest way to encode sentences into embeddings with a pre-trained Sentence Transformer model involves three steps: installing the library, loading the model, and calling its encode() method. First, ensure the sentence-transformers package is installed (pip install sentence-transformers). Next, import and initialize a model (e.g., all-MiniLM-L6-v2, a lightweight default). Finally, pass a list of sentences to the model's encode() method, which returns a NumPy array (or, optionally, a PyTorch tensor) containing the embeddings. This approach abstracts away tokenization, padding, and batching, making it straightforward for common use cases.
Here’s a concrete example using Python:
from sentence_transformers import SentenceTransformer
# Load the pre-trained model
model = SentenceTransformer('all-MiniLM-L6-v2')
# List of input sentences
sentences = ["This is a sample sentence.", "Another example text."]
# Generate embeddings
embeddings = model.encode(sentences)
The encode() method handles batch processing automatically and returns a 2D array where each row corresponds to a sentence's embedding. The all-MiniLM-L6-v2 model includes a normalization layer, so its embeddings come out with unit length, which is convenient for cosine similarity calculations. For models that do not normalize their output, you can pass normalize_embeddings=True to the encode() call (it defaults to False).
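As a quick check, you can inspect the shape of the result and compare the two example sentences with cosine similarity. The sketch below is a minimal illustration that reuses the embeddings variable from the example above and the util.cos_sim helper from sentence-transformers:
from sentence_transformers import util
# One row per input sentence; all-MiniLM-L6-v2 produces 384-dimensional vectors
print(embeddings.shape)  # (2, 384)
# Cosine similarity between the two example sentences
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(float(similarity))  # value in [-1, 1]; closer to 1 means more similar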
Key considerations include device selection (CPU/GPU) and output format. For GPU acceleration, add device="cuda" when loading the model. The output defaults to NumPy arrays, but you can return PyTorch tensors with convert_to_tensor=True. For large datasets, use show_progress_bar=True to track encoding progress. This method works for most cases, but for specialized needs (e.g., custom pooling or multilingual text), you may need to configure the model or preprocess text differently.
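Putting those options together, a minimal sketch like the following encodes the same sentences on a GPU, returns a PyTorch tensor, and shows a progress bar (assuming a CUDA-capable PyTorch installation is available):
from sentence_transformers import SentenceTransformer
# Load the model onto the GPU (requires a CUDA-capable PyTorch install)
model = SentenceTransformer('all-MiniLM-L6-v2', device="cuda")
sentences = ["This is a sample sentence.", "Another example text."]
# Return a PyTorch tensor and display a progress bar during encoding
embeddings = model.encode(
    sentences,
    convert_to_tensor=True,
    show_progress_bar=True,
)
print(embeddings.shape, embeddings.device)  # tensor shape and device, e.g. cuda:0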