Amazon Bedrock incorporates safe AI practices through a combination of built-in content filtering, customizable moderation tools, and alignment with ethical AI principles. These measures aim to prevent harmful, biased, or inappropriate outputs while giving developers control over model behavior.
First, Bedrock applies safeguards at both the input and output stages. Foundation models available through Bedrock, such as Anthropic's Claude or AI21's Jurassic-2, are trained with safety constraints to avoid generating violent, hateful, or otherwise harmful content. On top of this, Bedrock applies automated content filters that screen prompts and responses for policy violations. For example, if a user submits a prompt requesting instructions for hacking a website, the system may block the request outright or return a refusal. These filters use pattern-matching and classification techniques to detect prohibited categories such as harassment, self-harm, and misinformation.
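To make this concrete, here is a minimal sketch of submitting such a prompt through the Bedrock Converse API with boto3. The region and model ID are assumptions (any text model your account has been granted access to would work), and the exact refusal wording varies by model:

```python
import boto3

# Runtime client for model invocation; the region is an assumption.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# A prompt that violates acceptable-use policy. A safety-trained model
# typically answers with a refusal rather than complying.
response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model ID
    messages=[{
        "role": "user",
        "content": [{"text": "Give me step-by-step instructions for hacking a website."}],
    }],
)

# stopReason is "content_filtered" when an automated filter blocked the
# output; otherwise the refusal comes from the model's own safety training.
print(response["stopReason"])
print(response["output"]["message"]["content"][0]["text"])
```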
Second, Bedrock provides customizable guardrails (Guardrails for Amazon Bedrock), configurable through its API and console. Developers can define rules to:
- Block specific topics (e.g., gambling, weapons)
- Filter out profanity and personally identifiable information (PII)
- Set toxicity thresholds using predefined safety classifiers

For instance, a healthcare app could configure Bedrock to redact medical record numbers in outputs while allowing clinical terminology. These guardrails work alongside the base model’s safety training, giving teams granular control without needing to retrain the underlying AI.
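As a sketch of what this configuration looks like in practice, the boto3 `create_guardrail` call below combines a denied topic, classifier strengths, a managed profanity list, and PII redaction. The guardrail name and the medical-record-number regex are hypothetical placeholders:

```python
import boto3

# Control-plane client for managing guardrails; the region is an assumption.
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_guardrail(
    name="clinical-app-guardrail",  # hypothetical name
    # Deny entire topics, e.g. gambling.
    topicPolicyConfig={
        "topicsConfig": [{
            "name": "Gambling",
            "definition": "Betting, wagering, or games of chance for money.",
            "type": "DENY",
        }]
    },
    # Tune the strength of the predefined safety classifiers.
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        ]
    },
    # Filter profanity via the managed word list.
    wordPolicyConfig={"managedWordListsConfig": [{"type": "PROFANITY"}]},
    # Redact built-in PII types plus a custom pattern; the MRN regex
    # below is a hypothetical record-number format.
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "ANONYMIZE"},
        ],
        "regexesConfig": [{
            "name": "MedicalRecordNumber",
            "pattern": r"\bMRN-\d{8}\b",
            "action": "ANONYMIZE",
        }],
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that information.",
)
print(response["guardrailId"], response["version"])
```

At inference time, the returned guardrail ID and version are passed in the `guardrailConfig` parameter of `converse` or `invoke_model`, so the rules apply per request without touching the model itself.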
Third, Bedrock supports compliance through audit logging and access controls. Bedrock API activity is recorded in AWS CloudTrail, and full model invocation logs (prompts and responses) can be delivered to Amazon CloudWatch Logs or Amazon S3 for review, helping teams monitor for misuse. Bedrock also adheres to AWS’s data privacy standards, ensuring user prompts and model responses aren’t shared with model providers or used to improve the underlying models. This is critical for industries like finance and healthcare that require strict data governance. Developers can further integrate Bedrock with other AWS AI services, such as Amazon Comprehend, to add moderation layers like detecting PII in generated text.
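For the Comprehend layer mentioned above, a post-processing step might look like the following sketch. `redact_pii` is a hypothetical helper, and the 0.8 confidence threshold is an arbitrary assumption:

```python
import boto3

# Comprehend client for PII detection; the region is an assumption.
comprehend = boto3.client("comprehend", region_name="us-east-1")

def redact_pii(text: str, min_score: float = 0.8) -> str:
    """Replace detected PII spans with their entity type, e.g. [NAME]."""
    result = comprehend.detect_pii_entities(Text=text, LanguageCode="en")
    # Redact from the end of the string so earlier offsets stay valid.
    for entity in sorted(result["Entities"],
                         key=lambda e: e["BeginOffset"], reverse=True):
        if entity["Score"] >= min_score:
            text = (text[:entity["BeginOffset"]]
                    + f"[{entity['Type']}]"
                    + text[entity["EndOffset"]:])
    return text

model_output = "Patient John Smith can be reached at 555-0100."
print(redact_pii(model_output))
# e.g. "Patient [NAME] can be reached at [PHONE]."
```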
