To set parameters like maximum tokens, temperature, or top-p when using a model via AWS Bedrock, you configure them in the API request body specific to the foundation model you’re using. Bedrock supports multiple model providers (e.g., Anthropic Claude, AI21 Labs, Cohere), and each may have slightly different parameter names or formatting requirements. Here’s how to approach it:
1. Structure Your API Request
When invoking a model through Bedrock's `InvokeModel` API, parameters are passed in the request body as a JSON object. For example, with Anthropic's Claude model, you'd include fields like `max_tokens_to_sample` (maximum tokens), `temperature`, and `top_p` directly in the body. For AI21 Labs' Jurassic-2 model, the equivalent parameters are `maxTokens`, `temperature`, and `topP`. Always check the model provider's documentation for exact parameter names. Here's a Claude example:
```json
{
  "prompt": "Your input text here",
  "max_tokens_to_sample": 200,
  "temperature": 0.5,
  "top_p": 0.9
}
```
2. Parameter Behavior and Best Practices
- Maximum tokens: Controls the length of the generated response. Set this to limit output size, but ensure it’s within the model’s maximum allowed value (e.g., Claude v2 supports up to 4,096 output tokens).
- Temperature: Adjusts randomness (0 = deterministic, 1 = more creative). Lower values produce focused outputs, while higher values encourage diversity.
- Top-p (nucleus sampling): Limits token selection to a probability mass (e.g., `top_p=0.9` samples from the smallest set of tokens whose probabilities sum to ≥90%). Use either temperature or top-p, not both, unless the model explicitly supports combining them.
3. Provider-Specific Considerations
For Cohere Command models in Bedrock, parameters like `max_tokens` and `temperature` follow similar logic but use snake_case (underscores) rather than camelCase. Always validate parameter ranges, since they differ by provider; Claude's `temperature`, for instance, must be between 0 and 1, while other providers accept wider ranges. Use Bedrock's model-specific API documentation or test with the Bedrock playground in the AWS Console to verify syntax before implementing.
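One way to manage these naming differences is a small per-provider lookup table. The mapping below is a sketch based on the names discussed above; verify each entry (especially Cohere's) against the current Bedrock request schemas before relying on it.

```python
import json

# Sketch: map a common vocabulary onto each provider's parameter names.
PARAM_NAMES = {
    "anthropic": {"max_tokens": "max_tokens_to_sample", "top_p": "top_p"},
    "ai21":      {"max_tokens": "maxTokens",            "top_p": "topP"},
    "cohere":    {"max_tokens": "max_tokens",           "top_p": "p"},  # Cohere names top-p "p"
}

def build_body(provider, prompt, max_tokens=200, temperature=0.5, top_p=0.9):
    """Build a JSON request body using the provider's parameter names."""
    names = PARAM_NAMES[provider]
    return json.dumps({
        "prompt": prompt,
        names["max_tokens"]: max_tokens,
        "temperature": temperature,
        names["top_p"]: top_p,
    })
```

Centralizing the mapping this way keeps the rest of your code working in one vocabulary while the serialization layer handles each provider's quirks.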
By tailoring these parameters to your use case and testing iteratively, you can balance response quality, creativity, and computational efficiency.