Yes, Amazon Bedrock provides mechanisms to track token usage and other metrics, though the approach depends on the specific model and integration method. Here’s how it works:
1. Built-in Metrics via CloudWatch
Bedrock automatically publishes usage metrics to Amazon CloudWatch under the AWS/Bedrock namespace, including input/output token counts and inference latency. For example, the InputTokenCount and OutputTokenCount metrics are available for supported models (like Anthropic Claude). You can view these metrics in the CloudWatch console or query them via the AWS CLI/SDK. If you also need full request/response details, enable model invocation logging in the Bedrock settings, which writes logs to CloudWatch Logs and/or S3. Note that the CloudWatch metrics are aggregated and don't provide per-request granularity, but they are sufficient for cost estimation and high-level performance monitoring.
2. Per-Request Token Counts in API Responses
When invoking models directly via Bedrock's InvokeModel or InvokeModelWithResponseStream APIs, some model providers return token counts in the response headers. For example, Anthropic Claude responses include the x-amzn-bedrock-input-token-count and x-amzn-bedrock-output-token-count headers, which lets you log token usage for individual requests programmatically. However, this varies by model provider: AI21 Labs and Cohere models, for instance, do not currently expose token counts in headers. Always check the specific model's documentation for details.
3. Cost Tracking and Custom Logging
For cost tracking, AWS Cost Explorer breaks down Bedrock usage by model and operation (e.g., InvokeModel). You can also build custom logging by wrapping Bedrock API calls in Lambda functions or middleware that extracts token counts (from the response headers or CloudWatch metrics) and writes them to a database like DynamoDB or an observability tool like Datadog. If you use Bedrock's Knowledge Bases for RAG, the console's analytics tab provides query-level metrics like retrieved passages and query latency.
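A minimal sketch of such a wrapper follows; the bedrock-usage table name and its request_id/timestamp attributes are assumptions for illustration, not an existing schema:

```python
import json
import time
import uuid
import boto3

bedrock = boto3.client("bedrock-runtime")
# Hypothetical DynamoDB table with a "request_id" partition key.
usage_table = boto3.resource("dynamodb").Table("bedrock-usage")

def invoke_and_log(model_id: str, body: dict) -> dict:
    """Invoke a Bedrock model and record per-request token usage."""
    response = bedrock.invoke_model(modelId=model_id, body=json.dumps(body))
    headers = response["ResponseMetadata"]["HTTPHeaders"]

    usage_table.put_item(Item={
        "request_id": str(uuid.uuid4()),
        "timestamp": int(time.time()),
        "model_id": model_id,
        "input_tokens": int(headers.get("x-amzn-bedrock-input-token-count", 0)),
        "output_tokens": int(headers.get("x-amzn-bedrock-output-token-count", 0)),
    })
    return json.loads(response["body"].read())
```

The same extraction logic can live in a Lambda function or API middleware layer; only the destination (DynamoDB, Datadog, etc.) changes.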
Limitations: Token counts are not available for all models (e.g., Stability AI), and real-time per-request tracking requires parsing response headers. For unsupported models, approximate token-counting libraries (like those from Hugging Face) can supplement this data, though accuracy may vary.
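For example, a rough approximation with a Hugging Face tokenizer (gpt2 here is only a stand-in, so counts will not exactly match any provider's own tokenization):

```python
from transformers import AutoTokenizer

# Stand-in tokenizer; pick one closer to your target model if available.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def approx_token_count(text: str) -> int:
    """Return an approximate token count for cost estimation."""
    return len(tokenizer.encode(text))

print(approx_token_count("A prompt whose token count we want to estimate."))
```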