The input and output token limits for models in Amazon Bedrock vary by model family and provider; there’s no single default value. Each foundation model (e.g., Claude, Jurassic, Command, Titan) defines its own constraints. For example, Anthropic’s Claude 3 models accept up to 200,000 input tokens and can generate up to 4,096 output tokens, while Cohere’s Command R+ allows 128,000 input tokens and 4,096 output tokens. These limits are tied to computational costs and model architecture – larger inputs require more memory and processing. Token counts depend on the model’s tokenizer, so the same text might produce different token lengths across models.
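Since limits differ per model and tokenization differs per provider, a rough pre-flight check can catch obviously oversized inputs before you pay for an API call. The sketch below is illustrative only: the model IDs are shortened placeholders (real Bedrock model IDs carry version suffixes), the limit values are the examples quoted above, and the ~4-characters-per-token heuristic is a crude approximation for English prose, not any model’s real tokenizer.

```python
# Illustrative values from the text above; model IDs are shortened
# placeholders, and real limits should be confirmed against the current
# AWS Bedrock model documentation, since they change over time.
MODEL_LIMITS = {
    "anthropic.claude-3-sonnet": {"max_input_tokens": 200_000, "max_output_tokens": 4_096},
    "cohere.command-r-plus":     {"max_input_tokens": 128_000, "max_output_tokens": 4_096},
    "amazon.titan-embed-text":   {"max_input_tokens": 8_192,   "max_output_tokens": 0},
}

def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Crude estimate (~4 chars/token for English); real counts vary by tokenizer."""
    return max(1, round(len(text) / chars_per_token))

def fits_input_limit(text: str, model_id: str, safety_margin: float = 0.9) -> bool:
    """Check an input against a model's documented limit, with headroom
    because the heuristic can undercount versus the real tokenizer."""
    limit = MODEL_LIMITS[model_id]["max_input_tokens"]
    return rough_token_estimate(text) <= limit * safety_margin

print(fits_input_limit("hello " * 10_000, "amazon.titan-embed-text"))  # → False
```

The safety margin matters: because the heuristic is approximate, treating 90% of the documented limit as the ceiling leaves room for tokenizer variance.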
You can find these limits in two places. First, the AWS Bedrock documentation provides model-specific details under the “Inference parameters” section for each model family; the Claude 3 documentation, for example, explicitly lists its 200k input token limit. Second, when using the Bedrock API, validation errors such as ValidationException will occur if you exceed the limits, though the API itself doesn’t return the limits programmatically. AWS also publishes service quotas in the AWS Management Console under “Service Quotas” for Bedrock, but these cover rate limits, not per-request token constraints.
To work within these limits, developers should truncate or chunk long inputs and use the max_tokens parameter to control output length. For example, when calling Claude through the Bedrock SDK, you might set max_tokens=500 to restrict responses. Note that some models, such as Titan Embeddings, have much lower input limits (8,192 tokens) because they are optimized for specific tasks like embeddings. Before deployment, always verify tokenization with tools such as Anthropic’s tokenizer or a provider token-counting utility where one is offered. Limits are subject to change, so check AWS’s documentation updates or model cards regularly.
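The chunking step above can be sketched in a few lines. This is a minimal, assumption-laden version: it reuses a rough ~4-characters-per-token heuristic (a production pipeline should count with the model’s actual tokenizer) and splits only on whitespace, ignoring sentence or paragraph boundaries that a real retrieval or summarization pipeline would respect.

```python
def chunk_text(text: str, max_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Split text into pieces that each fit a rough token budget.

    Splits on whitespace so words stay intact; uses a crude
    chars-per-token heuristic, so keep a safety margin versus the
    model's documented input limit.
    """
    budget_chars = max_tokens * chars_per_token
    chunks, current, current_len = [], [], 0
    for word in text.split():
        # +1 accounts for the joining space
        if current and current_len + len(word) + 1 > budget_chars:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(word)
        current_len += len(word) + (1 if current_len else 0)
    if current:
        chunks.append(" ".join(current))
    return chunks

pieces = chunk_text("lorem ipsum " * 5_000, max_tokens=1_000)
print(len(pieces))
```

Each chunk can then be sent as a separate request (with the per-call output capped via max_tokens) and the responses merged afterwards.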