To save a fine-tuned Sentence Transformer model, use the `save()` method provided by the library. After training, call `model.save("output_path")`, where `"output_path"` is the directory where the model and its components will be stored. This saves the entire model architecture, trained weights, tokenizer, and configuration files (e.g., `config.json`, `sentence_bert_config.json`, and `pytorch_model.bin`, or `model.safetensors` in newer library versions). For example, if your model uses a specific pooling layer or custom modules, these are serialized so the model is reconstructed consistently when reloaded. Always verify that the output directory contains these files to confirm the save succeeded.
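For instance, a minimal save flow might look like the following sketch (the base model name `all-MiniLM-L6-v2` and the `output_path` directory are illustrative placeholders; the exact files written depend on your library version):

```python
import os

from sentence_transformers import SentenceTransformer

# Stand-in for a fine-tuned model; in practice this is the model you trained.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Persist the full model: architecture, weights, tokenizer, and config files.
model.save("output_path")

# Sanity-check that the expected artifacts were written to disk.
print(sorted(os.listdir("output_path")))
```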
To load the model for inference, initialize a `SentenceTransformer` object with the saved directory path: `model = SentenceTransformer("output_path")`. This reconstructs the model from the saved configuration and weights. The process mirrors loading a pretrained model from the Hugging Face Hub, except that a local path is used instead of a model ID. If you added custom layers during fine-tuning (e.g., a classifier head), ensure those components are defined in your code before loading, or are already part of the saved model's architecture, to avoid errors.
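A minimal loading sketch, assuming the model was saved to `output_path` as above:

```python
from sentence_transformers import SentenceTransformer

# Reload the fine-tuned model from the local directory rather than the Hub.
model = SentenceTransformer("output_path")

# Encode a couple of sentences to confirm the model works end to end.
embeddings = model.encode(["A test sentence.", "Another test sentence."])
print(embeddings.shape)  # (2, embedding_dimension), e.g. (2, 384) for MiniLM
```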
For deployment, consider compatibility and efficiency. Use the same versions of `sentence-transformers`, PyTorch, and other dependencies in the target environment as in training. If disk space or latency matters, test converting the model to ONNX format using `torch.onnx`, or apply quantization techniques. In serverless environments, package the model directory with your inference code. Avoid altering the saved files manually, as changes to configurations or weights may break the model. Always validate the loaded model with a test inference to confirm it behaves as expected after loading.
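One way to validate the round trip is to compare embeddings produced before and after saving. Here is a sketch using the same placeholder names as above (the base model again stands in for your fine-tuned one):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Stand-in for the fine-tuned, in-memory model produced by training.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["A held-out sentence used to sanity-check the model."]
before = model.encode(sentences)

# Save, reload, and re-encode the same input.
model.save("output_path")
reloaded = SentenceTransformer("output_path")
after = reloaded.encode(sentences)

# If the round trip preserved behavior, embeddings match up to float noise.
cos = float(np.dot(before[0], after[0]) / (np.linalg.norm(before[0]) * np.linalg.norm(after[0])))
print(f"cosine similarity: {cos:.6f}")  # expect a value very close to 1.0
```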