Yes, Amazon Bedrock provides mechanisms to track token usage and other metrics, though the approach depends on the specific model and integration method. Here’s how it works:
1. Built-in Metrics via CloudWatch
Bedrock automatically publishes usage metrics to Amazon CloudWatch under the AWS/Bedrock namespace, including input/output token counts and inference latency. For example, the InputTokenCount and OutputTokenCount metrics are available for supported models (like Anthropic Claude). You can view these metrics in the CloudWatch console or query them via the AWS CLI/SDK. If you also need full request/response details, enable model invocation logging in the Bedrock settings, which writes logs to CloudWatch Logs and/or S3. Note that the CloudWatch metrics are aggregated and don't provide per-request granularity, but they are sufficient for cost estimation and high-level performance monitoring.
2. Per-Request Token Counts in API Responses
When invoking models directly via Bedrock's InvokeModel or InvokeModelWithResponseStream APIs, some model providers return token counts in the response headers. For example, Anthropic Claude responses include the x-amzn-bedrock-input-token-count and x-amzn-bedrock-output-token-count headers, which lets you log token usage for individual requests programmatically. However, this varies by model provider: AI21 Labs and Cohere models, for instance, do not currently expose token counts in headers. Always check the specific model's documentation for details.
3. Cost Tracking and Custom Logging
For cost tracking, AWS Cost Explorer breaks down Bedrock usage by model and operation (e.g., InvokeModel). You can also build custom logging by wrapping Bedrock API calls in Lambda functions or middleware that extracts token counts (from the response headers or CloudWatch metrics) and writes them to a database like DynamoDB or an observability tool like Datadog. If you use Bedrock's Knowledge Bases for RAG, the console's analytics tab provides query-level metrics like retrieved passages and query latency.
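A minimal sketch of such a wrapper follows; the bedrock-usage table name and its request_id/timestamp attributes are assumptions for illustration, not an existing schema:

```python
import json
import time
import uuid
import boto3

bedrock = boto3.client("bedrock-runtime")
# Hypothetical DynamoDB table with a "request_id" partition key.
usage_table = boto3.resource("dynamodb").Table("bedrock-usage")

def invoke_and_log(model_id: str, body: dict) -> dict:
    """Invoke a Bedrock model and record per-request token usage."""
    response = bedrock.invoke_model(modelId=model_id, body=json.dumps(body))
    headers = response["ResponseMetadata"]["HTTPHeaders"]

    usage_table.put_item(Item={
        "request_id": str(uuid.uuid4()),
        "timestamp": int(time.time()),
        "model_id": model_id,
        "input_tokens": int(headers.get("x-amzn-bedrock-input-token-count", 0)),
        "output_tokens": int(headers.get("x-amzn-bedrock-output-token-count", 0)),
    })
    return json.loads(response["body"].read())
```

The same extraction logic can live in a Lambda function or API middleware layer; only the destination (DynamoDB, Datadog, etc.) changes.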
Limitations: Token counts are not available for all models (e.g., Stability AI), and real-time per-request tracking requires parsing response headers. For unsupported models, approximate token-counting libraries (like those from Hugging Face) can supplement this data, though accuracy may vary.
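For example, a rough approximation with a Hugging Face tokenizer (gpt2 here is only a stand-in, so counts will not exactly match any provider's own tokenization):

```python
from transformers import AutoTokenizer

# Stand-in tokenizer; pick one closer to your target model if available.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def approx_token_count(text: str) -> int:
    """Return an approximate token count for cost estimation."""
    return len(tokenizer.encode(text))

print(approx_token_count("A prompt whose token count we want to estimate."))
```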