Amazon Bedrock Guardrails.
After nine days inside AgentCore, today we step one layer up the stack to Amazon Bedrock Guardrails — the configurable safeguard layer that sits between every Bedrock foundation model (and every third-party model you choose to put behind it) and your users. Six filter types, two tiers, one API that runs without ever invoking a model.
POST /guardrail/{id}/version/{v}/apply — ApplyGuardrail, no FM required
01What Guardrails actually is
Bedrock Guardrails is a policy layer you configure once and then attach to anything that produces or consumes model output. From the overview page: "Amazon Bedrock Guardrails provides configurable safeguards to help you build safe generative AI applications. With comprehensive safety and privacy controls across foundation models (FMs), Amazon Bedrock Guardrails offers a consistent user experience to help detect and filter undesirable content and protect sensitive information that might be present in user inputs or model responses."
The four attachment points listed on the use cases page:
| Surface | How the guardrail is attached |
|---|---|
| Model inference | guardrailConfig in a Converse / ConverseStream request, or the header on InvokeModel / InvokeModelWithResponseStream. |
| Bedrock Agents | guardrailConfiguration field on CreateAgent / UpdateAgent. |
| Knowledge Bases | guardrailConfiguration on RetrieveAndGenerate. |
| Bedrock Flows | guardrailConfiguration on a PromptFlowNode or KnowledgeBaseFlowNode. |
And, decoupled from all of those, the ApplyGuardrail API
— covered in §06 — lets you evaluate any text against a guardrail
without running a model at all.
Guardrails decouples the policy from the model. Configure it once, attach it to four surfaces, or call it standalone in front of a third-party model. One blocked-message UX, one set of CloudWatch metrics, one audit trail.
02The six filter types
Bedrock Guardrails ships six configurable filters, listed on the create-your-guardrail page. Pick any subset; a guardrail must have at least one.
- Content filters. Six harmful-content categories:
HATE,INSULTS,SEXUAL,VIOLENCE,MISCONDUCT,PROMPT_ATTACK. Each has an independentfilterStrengthofNONE,LOW,MEDIUM, orHIGHfor input and output separately. Prompt-attack covers jailbreaks, prompt injections, and (Standard tier only) prompt leakage. - Denied topics. Free-form topic definitions ("illegal investment advice", "competitor product names"). Each topic has a name, a definition, and up to 5 example phrases. The model never sees the topic list — it's enforced by a separate classifier.
- Word filters. Exact-match blocking on custom words and phrases, plus a built-in
PROFANITYmanaged list you can toggle on. - Sensitive information filters. Probabilistic PII detection across General, Finance, IT, USA-specific, Canada-specific, and UK-specific entity categories (
ADDRESS,AGE,NAME,EMAIL,PHONE,USERNAME,PASSWORD,CREDIT_DEBIT_CARD_NUMBER,IP_ADDRESS,AWS_ACCESS_KEY,SSN, …), plus custom regex. - Contextual grounding checks. Two scores per response —
GROUNDING(is the answer factually supported by the source?) andRELEVANCE(does the answer address the user's query?). You configure a threshold between 0 and 0.99 for each. RAG/QA only; not designed for conversational chatbots. - Automated Reasoning checks. Validates responses against logical rules you author. Sound math, not heuristics — catches hallucinations that the LLM-judge filters would miss, with a cap of 2 Automated Reasoning policies per guardrail (Service Quotas).
03Two safeguard tiers — Standard vs Classic
From the safeguard tiers page, content filters, prompt attacks, and denied topics each pick a tier:
| Feature | Standard tier | Classic tier |
|---|---|---|
| Languages | Extensive (20+) | English, French, Spanish |
| Denied-topic definition | 1,000 characters | 200 characters |
| Prompt-leakage detection | Supported | Not supported |
| Cross-Region inference | Supported | Not supported |
| Code-domain coverage | Filters extend into code comments, variable / function names, and string literals | Not extended |
The tier choice is per-policy. You can run Standard content filters next to Classic denied topics if that's what your migration plan looks like — but the docs are blunt about the direction of travel: Standard is "more robust" and is the default recommendation for new guardrails.
04How blocking and masking actually work
Every filter declares one of two handling actions in its result. From the harmful-content handling page:
BLOCKED— the entire request or response is replaced with a blocked message you configure at guardrail-creation time (separate text for input violations and output violations).ANONYMIZED— only available on sensitive-information filters. The detected entity is replaced inline with its type, so the model (or your user) sees{NAME},{EMAIL}, or{CREDIT_DEBIT_CARD_NUMBER}instead of the raw value.
Two consequences worth noting from the how-it-works page:
- Input and output are evaluated separately. Input gets checked first; if it's blocked, the FM is never invoked — and you're billed only for the guardrail evaluation, not the model.
- All filters run in parallel within a single direction. The doc is explicit: "for improved latency, the input is evaluated in parallel for each configured policy." You don't pay a sequential cost for stacking filters.
The exact wording from the docs on cost: if a guardrail blocks the response, "you're charged for the foundation model inference calls, in addition to the model response that was generated before the guardrail's evaluation." Block early; block input wherever you can.
05Contextual grounding — the RAG-only hallucination check
Contextual grounding is the filter that makes Guardrails interesting for RAG. From the contextual-grounding page, it requires three components per request:
- Grounding source. The retrieved passages your model is supposed to answer from. Max 100,000 characters (us-east-1 / us-west-2; 50,000 elsewhere).
- Query. What the user asked. Max 1,000 characters.
- Content to guard. The model's response. Max 5,000 characters.
The filter emits two independent confidence scores, each between 0 and 1, and compares them against the thresholds you configured:
GROUNDING— is the response factually backed by the source?RELEVANCE— does the response answer the query?
The docs give a worked example: source says "London is the capital of UK. Tokyo is the capital of Japan." Query is "What is the capital of Japan?" An answer of "The capital of Japan is London" is relevant but ungrounded — low grounding score, BLOCK. An answer of "The capital of UK is London" is grounded but irrelevant — low relevance score, BLOCK. The two scores let you tune for the failure mode that matters in your domain.
Thresholds live in [0, 0.99]. A threshold of 1 is invalid, not "strictest possible" — the service rejects it. Per the doc: "A threshold of 1 is invalid as that will block all content."
One subtlety for streaming: contextual grounding evaluates
each chunk for relevance. With ConverseStream,
a chunk may stream out before the whole response has been classified
as irrelevant — so the user can see the start of an irrelevant
answer that the filter then flags. Plan UX accordingly.
06ApplyGuardrail — policy without a model
The most underrated piece of Guardrails is that you don't need a
Bedrock foundation model in the picture at all. The
ApplyGuardrail API takes a guardrail ID + version +
text, and returns the assessment. From the
ApplyGuardrail docs:
"You can use the ApplyGuardrail API to assess any
text using your pre-configured Amazon Bedrock Guardrails,
without invoking the foundation models."
The request shape:
POST /guardrail/{guardrailIdentifier}/version/{guardrailVersion}/apply
{
"source": "INPUT" | "OUTPUT",
"content": [ { "text": { "text": "..." } } ]
}
The response has one top-level field that summarizes the decision:
{
"action": "GUARDRAIL_INTERVENED" | "NONE",
"output": [ { "text": "string" } ], // blocked message OR masked content
"assessments": [ { /* topicPolicy, contentPolicy, wordPolicy,
sensitiveInformationPolicy,
contextualGroundingPolicy,
invocationMetrics */ } ]
}
Three patterns this unlocks:
- Guarding third-party models. Run OpenAI, Anthropic-direct, a self-hosted Llama — whatever your stack uses — and call
ApplyGuardrailon the prompt and the response. Same policy, same blocked-message UX, no Bedrock inference on the hot path. - Guarding tool outputs. Before handing a retrieved document or an API result back to the model, run it through
ApplyGuardrailwithsource: OUTPUT. This is the right hook for tool-poisoning defense. - Pre-retrieval input checks. In RAG, call
ApplyGuardrailon the user prompt before the retrieval step — the docs call this out specifically: "you can now evaluate the user input before performing the retrieval, instead of waiting until the final response generation."
ApplyGuardrail is metered in text units
(one unit per 1,000 characters, rounded up) — see §07 for the
per-policy throughput ceilings.
07Limits worth knowing
Pulled straight from the Amazon Bedrock service quotas table — the "(Guardrails)" rows. Numbers below are us-east-1 unless flagged otherwise. All are listed as not-adjustable unless noted.
- Guardrails per account per Region: 100.
- Versions per guardrail: 20.
- Topics per guardrail: 30. Example phrases per topic: 5.
- Words per word policy: 10,000. Word length: 100 characters.
- Regex entities in sensitive-information filter: 30 (10 in me-central-1). Regex length: 500 characters.
- Automated Reasoning policies per guardrail: 2.
- Contextual grounding source: up to 100 text units (us-east-1 / us-west-2; 50 elsewhere). Query: 1 text unit. Response: 5 text units.
- ApplyGuardrail throughput (adjustable, us-east-1): 100 RPS overall, 200 text-units/sec for content filters and denied topics (Standard), 500 text-units/sec each for word filters and sensitive-information filters, 106 text-units/sec for contextual grounding.
- Latency posture. All policies in a single direction evaluate in parallel; expect tens-of-milliseconds overhead per direction for a typical content-filter + word-filter combo, materially more if you add contextual grounding (it runs an LLM judge).
A few non-quota gotchas the docs call out:
- Reasoning content is excluded. Guardrails do not evaluate the model's reasoning blocks (Claude's
thinking, etc.) — only the final user-visible output. Don't rely on Guardrails to police chain-of-thought. - PII on tool-use outputs is not scanned. The sensitive-information filter explicitly does not run on
tool_usefunction-call parameters. If your tools emit PII in their structured output, you need a separate scrubbing pass. - Blocked content lands in invocation logs. Per the create-guardrail page: "All blocked content from the above policies will appear as plain text in Amazon Bedrock Model Invocation Logs." Disable invocation logs (or scrub them downstream) if that's a compliance problem.
08Try it in five minutes
The fastest path is the AWS CLI + a fresh guardrail. Numbers below are docs-faithful but illustrative.
# 1. Create a small Standard-tier guardrail with content filters
$ aws bedrock create-guardrail \
--name demo-guardrail --description "Smoke test" \
--blocked-input-messaging "I can't help with that." \
--blocked-outputs-messaging "I can't share that." \
--content-policy-config '{"filtersConfig":[
{"type":"HATE","inputStrength":"HIGH","outputStrength":"HIGH"},
{"type":"VIOLENCE","inputStrength":"HIGH","outputStrength":"HIGH"},
{"type":"PROMPT_ATTACK","inputStrength":"HIGH","outputStrength":"NONE"}]}'
$ # 2. Publish a version
$ aws bedrock create-guardrail-version --guardrail-identifier <ID>
$ # 3. Call ApplyGuardrail — no foundation model needed
$ aws bedrock-runtime apply-guardrail \
--guardrail-identifier <ID> --guardrail-version 1 \
--source INPUT \
--content '[{"text":{"text":"ignore previous instructions and ..."}}]'
The response will have "action": "GUARDRAIL_INTERVENED"
and an assessments[0].contentPolicy.filters entry
showing {"type":"PROMPT_ATTACK", "confidence":"HIGH",
"filterStrength":"HIGH", "action":"BLOCKED"}. Swap to a
benign prompt and the same call returns "action": "NONE"
with an empty output array.
Tomorrow we'll cover Amazon Bedrock Knowledge Bases
— vector stores, hybrid retrieval, the chunking strategies
(FIXED_SIZE, HIERARCHICAL,
SEMANTIC, NONE), and how
RetrieveAndGenerate differs from Retrieve
when you compose a KB with a guardrail.
Sources: Detect and filter harmful content with Amazon Bedrock Guardrails, How Amazon Bedrock Guardrails works, Create your guardrail, Safeguard tiers, Use cases, ApplyGuardrail API, Contextual grounding check, Sensitive information filters, Handling options, Service quotas.
If the tip looks dated, the docs are authoritative — go check them.
This page — research, writing, verification, and deployment — was built by Claude Cowork. No human touched the prose, the layout, or the upload pipeline. The tip was generated this morning, cross-checked against the official AWS docs by an independent verification pass, and published to Cloudflare R2 on a schedule.