Yes, you can use Sentence Transformer models without the Sentence Transformers library by leveraging the Hugging Face Transformers API directly. Sentence Transformer models are essentially pre-trained transformer models (like BERT or RoBERTa) fine-tuned with a pooling layer to generate sentence embeddings. The Sentence Transformers library simplifies this process by abstracting the pooling and normalization steps, but you can replicate these steps manually using the Transformers library.
To achieve this, you need to load the model and tokenizer using Hugging Face's AutoModel and AutoTokenizer classes. After tokenizing the input text, pass it through the model to get token-level embeddings. Next, apply mean pooling (or another pooling method) across the token embeddings to create a fixed-length sentence embedding. Finally, normalize the resulting vector to unit length, as many downstream tasks (like cosine similarity) assume normalized embeddings. For example, using a model like sentence-transformers/all-MiniLM-L6-v2, you'd load it with AutoModel.from_pretrained, compute mean pooling over the last_hidden_state output, and apply L2 normalization.
However, this approach requires careful implementation. The pooling logic must match how the original Sentence Transformer model was trained: some models use the CLS token or max pooling instead of mean pooling. Check the model's documentation (models published under sentence-transformers on the Hugging Face Hub typically include a 1_Pooling/config.json describing the pooling mode) or inspect its configuration in the Sentence Transformers library to ensure parity. Outputs can also differ slightly due to implementation details, such as whether padding tokens are excluded during pooling. Testing with example inputs and comparing the results of the two approaches, as in the check below, is recommended to validate correctness. While this method works, using the Sentence Transformers library remains simpler for most use cases, as it handles these nuances automatically.
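
If the Sentence Transformers library is installed, one way to sanity-check the manual pipeline is to encode the same sentences with both approaches and compare the vectors. This sketch assumes the variables from the previous example and relies on encode's normalize_embeddings option matching the manual L2 normalization:

```python
# Validation sketch: compare the manual embeddings against the library's output.
# Requires `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer

st_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
reference = st_model.encode(sentences, convert_to_tensor=True, normalize_embeddings=True)

# The two implementations should agree to within floating-point tolerance.
print(torch.allclose(sentence_embeddings, reference, atol=1e-5))
```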