To use a custom fine-tuned model from AWS Bedrock for inference, follow these steps. First, confirm the fine-tuning job has completed successfully via the Bedrock console, AWS CLI, or SDK. Once complete, the custom model appears in your account under Bedrock’s Custom models section. You never provision an endpoint yourself, since Bedrock manages the hosting infrastructure; note, however, that custom models typically require purchasing Provisioned Throughput before they can be invoked.
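For example, a quick status check with the Boto3 control-plane client might look like this (a minimal sketch; the job name is a placeholder):

import boto3

# Control-plane client (distinct from 'bedrock-runtime', which handles inference)
bedrock_ctl = boto3.client('bedrock')

# 'my-finetune-job' is a placeholder for your customization job's name or ARN
job = bedrock_ctl.get_model_customization_job(jobIdentifier='my-finetune-job')
print(job['status'])  # e.g. 'InProgress', 'Completed', 'Failed'

# Once the job completes, the resulting custom model is listed here
for model in bedrock_ctl.list_custom_models()['modelSummaries']:
    print(model['modelName'], model['modelArn'])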
To invoke the model, use the Bedrock InvokeModel or InvokeModelWithResponseStream API and specify your fine-tuned model’s ID in the request. For example, in Python using Boto3, you’d structure a request like this:
import json
import boto3

# Runtime client for inference requests
bedrock = boto3.client(service_name='bedrock-runtime')

response = bedrock.invoke_model(
    modelId='your-custom-model-id',
    contentType='application/json',
    body=json.dumps({
        # Field names depend on the base model's schema (see below)
        'prompt': 'Your input text here',
        'temperature': 0.5,
        'maxTokens': 200
    })
)
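Continuing the example above, you decode the streaming response body to get the result; the exact output fields depend on the base model, so this is only a minimal sketch:

# The response body is a botocore StreamingBody; read and decode it.
# Output field names vary by base model (e.g., 'results' for Titan,
# 'content' for Claude), so inspect the parsed dict for your model.
result = json.loads(response['body'].read())
print(result)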
Replace your-custom-model-id with the model ID from the Bedrock console (if you invoke through Provisioned Throughput, pass the provisioned model’s ARN as the modelId). The body format depends on the base model (e.g., Anthropic Claude, Amazon Titan) used for fine-tuning, so ensure your input matches its schema.
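For illustration, here is roughly how the request body differs if the custom model was fine-tuned from Amazon Titan Text versus Anthropic Claude; the field names follow each base model’s documented schema, and the prompts are placeholders:

import json

# Titan Text-based custom model: prompt goes in 'inputText',
# generation settings in 'textGenerationConfig'
titan_body = json.dumps({
    'inputText': 'Your input text here',
    'textGenerationConfig': {
        'maxTokenCount': 200,
        'temperature': 0.5,
        'topP': 0.9
    }
})

# Claude-based custom model (Messages API): requires 'anthropic_version'
# and a 'messages' list instead of a bare prompt
claude_body = json.dumps({
    'anthropic_version': 'bedrock-2023-05-31',
    'max_tokens': 200,
    'temperature': 0.5,
    'messages': [{'role': 'user', 'content': 'Your input text here'}]
})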
Key considerations include IAM permissions and cost. Verify your AWS role has bedrock:InvokeModel permission for the custom model. Monitor usage via CloudWatch, since Bedrock bills for inference (by token for on-demand models, hourly for Provisioned Throughput). For integration, test the model in the Bedrock playground in the AWS console first, then adapt the API call to your application code. If you need lower latency or higher throughput, tune parameters like maxTokens or switch to streaming responses, as in the sketch below.
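A minimal streaming sketch, assuming the same placeholder model ID and a prompt-style body (adjust both to your model and its base model’s schema):

import json
import boto3

bedrock = boto3.client('bedrock-runtime')

# Stream tokens as they are generated instead of waiting for the full response.
response = bedrock.invoke_model_with_response_stream(
    modelId='your-custom-model-id',
    contentType='application/json',
    accept='application/json',
    body=json.dumps({'prompt': 'Your input text here', 'temperature': 0.5, 'maxTokens': 200})
)

# The body is an event stream; each event's chunk carries a JSON payload
for event in response['body']:
    chunk = event.get('chunk')
    if chunk:
        print(json.loads(chunk['bytes']), flush=True)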
For advanced use cases that call for dedicated, persistent endpoints, you might look at hosting a model on SageMaker instead, but Bedrock’s serverless approach typically eliminates infrastructure management.