AgentCore Browser, over CDP.
AgentCore Browser is the second of the two Built-in Tools, the sibling to yesterday's Code Interpreter. The contract is the same shape — a serverless microVM you don't manage — but instead of a Python kernel you get an isolated Chromium session your agent drives over the Chrome DevTools Protocol, with a separate WebSocket your humans can open to watch (and help) in real time.
aws bedrock-agentcore start-browser-session --browser-identifier aws.browser.v1 — managed Chromium, on demand
01 Why a managed browser exists
Agentic web work is one of the longest-running unsolved problems in the AI toolchain. The model can read a page, propose a click, propose a form submission, but somebody has to actually run Chromium, fend off CAPTCHAs, deal with cookies, and not leak one user's session into the next user's.
The two failure modes everyone hits:
- Run Chromium next to your application and you're shipping a headless-browser container, patching it for CVEs, and the agent has the same network identity as your service.
- Lease a browser-farm SaaS and your model's clicks leave your VPC, there's a third party in the credentials path, and the audit trail lives somewhere you don't control.
AgentCore Browser is the AWS-managed answer. From Interact with web applications using Amazon Bedrock AgentCore Browser: "The Amazon Bedrock AgentCore Browser provides a secure, isolated browser environment for your agents to interact with web applications. It runs in a containerized environment, keeping web activity separate from your system." Each session is a fresh microVM with its own filesystem; when the session ends, the VM is destroyed and its memory is sanitized. The session endpoint speaks CDP over WebSocket on one channel and live video on another — your agent talks to the first, your humans (or you) watch the second.
You stop hosting Chromium. AgentCore gives you a per-session sandbox at 1 vCPU / 4 GB / 10 GB of disk and a clear contract for driving it: CDP for the bulk, OS-level for the edges.
02 Two paths: managed and custom
There are two ways into the Browser, and the choice is purely about how much control you need over IAM, recording, and network shape. From the Using Browser Tool guide:
| Mode | How you reference it | When to pick it |
|---|---|---|
| Managed (system) | browserIdentifier="aws.browser.v1" | Default. No setup, no IAM role, no S3 bucket. Use when the agent just needs to navigate, click, fill, and screenshot. |
| Custom | CreateBrowser → returns a browserId | Use when you want session recording in your own S3 bucket, a specific IAM execution role, or to attach extensions, profiles, or proxies. |
Custom browser — control plane creates, data plane streams
Only one network mode is documented: PUBLIC.
There is no SANDBOX mode for Browser — by definition, a
browser tool is the egress. The execution role's trust policy
must include Service: bedrock-agentcore.amazonaws.com,
and if you turn recording on it needs s3:PutObject,
s3:ListMultipartUploadParts, and
s3:AbortMultipartUpload scoped to your prefix.
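As a rough sketch of the custom path, the pieces described above can be assembled as plain dictionaries and inspected before anything is sent. The browser name, role ARN, bucket, and prefix are placeholders, and the `create_browser` parameter names (`networkConfiguration`, `executionRoleArn`, `recording`) reflect my reading of the boto3 `bedrock-agentcore-control` client, so check them against the current API reference.

```python
import json


def trust_policy() -> dict:
    """Trust policy that lets the AgentCore service assume the execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "bedrock-agentcore.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def recording_policy(bucket: str, prefix: str) -> dict:
    """S3 permissions for session recording, scoped to a single prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
            ],
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
        }],
    }


def create_browser_request(name: str, role_arn: str, bucket: str, prefix: str) -> dict:
    """Keyword arguments for create_browser (parameter names are an assumption)."""
    return {
        "name": name,
        "networkConfiguration": {"networkMode": "PUBLIC"},
        "executionRoleArn": role_arn,
        "recording": {
            "enabled": True,
            "s3Location": {"bucket": bucket, "prefix": prefix},
        },
    }


def create_browser(request: dict, region: str = "us-east-1") -> str:
    """Send the request. Needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the builders above work without boto3

    cp = boto3.client("bedrock-agentcore-control", region_name=region)
    return cp.create_browser(**request)["browserId"]


# Placeholder values, for illustration only.
request = create_browser_request(
    name="demo_browser",
    role_arn="arn:aws:iam::123456789012:role/demo-browser-role",
    bucket="my-recording-bucket",
    prefix="replays",
)
print(json.dumps(request, indent=2))
```

Keeping the builders free of AWS calls makes it easy to review the role and recording scope in a pull request before any resource is created.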
03 The session model
Browser is session-based, like Code Interpreter. The control plane creates the tool; the data plane creates sessions against it. Each session lives in its own microVM, owns its own viewport, and exposes its own pair of streaming endpoints.
The state you care about on start_browser_session:
- browserIdentifier: aws.browser.v1 for managed, or the ID returned by CreateBrowser.
- name: a human label that shows up in the console and in CloudTrail.
- sessionTimeoutSeconds: the inactivity timeout. Default 900 (15 min); configurable up to 8 hours, per the Fundamentals page.
- viewPort: width and height in pixels. Default 1456 × 819.
When the session is stopped — or hits its timeout — the microVM is destroyed and its memory is sanitized. Anything not exported to S3 or captured on the wire is gone. Up to 500 sessions can run concurrently against a single Browser Tool resource; up to 1,000 concurrent sessions total per account. Session data has a 30-day TTL on the AgentCore side.
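To make those parameters concrete, here is a minimal sketch that builds the `start_browser_session` arguments and checks them against the defaults and ceiling quoted above. The session name is a placeholder, the validation is my own addition rather than anything the service requires client-side, and the `sessionId` response key is an assumption about the boto3 response.

```python
DEFAULT_TIMEOUT_SECONDS = 900          # 15 minutes
MAX_TIMEOUT_SECONDS = 8 * 60 * 60      # 8 hours
DEFAULT_VIEWPORT = (1456, 819)         # width, height in pixels


def start_session_request(
    browser_identifier: str = "aws.browser.v1",
    name: str = "demo-session",
    timeout_seconds: int = DEFAULT_TIMEOUT_SECONDS,
    viewport: tuple = DEFAULT_VIEWPORT,
) -> dict:
    """Build keyword arguments for start_browser_session, rejecting bad values early."""
    if not 1 <= timeout_seconds <= MAX_TIMEOUT_SECONDS:
        raise ValueError(f"timeout must be between 1 and {MAX_TIMEOUT_SECONDS} seconds")
    width, height = viewport
    if width <= 0 or height <= 0:
        raise ValueError("viewport width and height must be positive")
    return {
        "browserIdentifier": browser_identifier,
        "name": name,
        "sessionTimeoutSeconds": timeout_seconds,
        "viewPort": {"width": width, "height": height},
    }


def start_session(request: dict, region: str = "us-east-1") -> str:
    """Start the session. Needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the builder above works without boto3

    dp = boto3.client("bedrock-agentcore", region_name=region)
    return dp.start_browser_session(**request)["sessionId"]


print(start_session_request(timeout_seconds=3600))
```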
04 Two streams per session: automation and live view
Every session exposes two WebSocket endpoints, and the distinction is the whole architectural idea. From the Managing Browser Sessions doc:
- Automation stream: wss://bedrock-agentcore.<region>.amazonaws.com/browser-streams/{browser_id}/sessions/{session_id}/automation. The CDP endpoint. Connect Playwright (connect_over_cdp), Strands, Nova Act, or browser-use to it. Exactly one automation stream per session, not adjustable.
- Live view stream: https://bedrock-agentcore.<region>.amazonaws.com/browser-streams/{browser_id}/sessions/{session_id}/live-view. Real-time video the operator can open in the AWS Console (or your own viewer, e.g. BrowserViewerServer from the bedrock_agentcore.tools.browser_client helper). Exactly one live view stream per session, not adjustable.
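The two endpoint templates differ only in scheme and final path segment, which a small helper makes visible. This is an illustrative sketch: `stream_urls` is a hypothetical function, not part of any SDK, and a URL alone is not enough to connect, since the SDK's `generate_ws_headers()` also returns headers to send with the request.

```python
def stream_urls(region: str, browser_id: str, session_id: str) -> dict:
    """Fill in the two per-session endpoint templates quoted above."""
    base = (
        f"bedrock-agentcore.{region}.amazonaws.com"
        f"/browser-streams/{browser_id}/sessions/{session_id}"
    )
    return {
        "automation": f"wss://{base}/automation",  # CDP over WebSocket
        "live_view": f"https://{base}/live-view",  # real-time video for humans
    }


# Placeholder session ID, for illustration only.
urls = stream_urls("us-east-1", "aws.browser.v1", "session-123")
print(urls["automation"])
print(urls["live_view"])
```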
The SDK shortcut hides both behind a context manager:
from playwright.sync_api import sync_playwright
from bedrock_agentcore.tools.browser_client import browser_session

with browser_session("us-east-1") as client:
    ws_url, headers = client.generate_ws_headers()
    with sync_playwright() as pw:
        browser = pw.chromium.connect_over_cdp(ws_url, headers=headers)
        page = browser.contexts[0].pages[0]
        page.goto("https://docs.aws.amazon.com/bedrock-agentcore/")
        print(page.title())
The high-leverage feature is update_browser_stream. When
the agent hands the browser to a human — say, the user has to type
credentials — you call:
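# dp is a boto3 "bedrock-agentcore" data-plane client; session_id is the ID of the running session.
dp.update_browser_stream(
    browserIdentifier="aws.browser.v1",
    sessionId=session_id,
    streamUpdate={"automationStreamUpdate": {"streamStatus": "DISABLED"}},
)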
Automation goes silent; the model can no longer drive (or see) what's on screen. Re-enable it once the credential is in. This is the docs' canonical pattern for password and MFA flows.
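One way to avoid forgetting the re-enable step is to wrap the handoff in a context manager. The sketch below is an assumption-laden illustration: `human_takeover` and `RecordingClient` are hypothetical names, `"ENABLED"` is assumed to be the counterpart of `"DISABLED"`, and the fake client stands in for the boto3 client so the call order can be shown without AWS credentials.

```python
from contextlib import contextmanager


def _set_automation(dp, browser_identifier: str, session_id: str, status: str) -> None:
    """Switch the automation stream on or off for one session."""
    dp.update_browser_stream(
        browserIdentifier=browser_identifier,
        sessionId=session_id,
        streamUpdate={"automationStreamUpdate": {"streamStatus": status}},
    )


@contextmanager
def human_takeover(dp, browser_identifier: str, session_id: str):
    """Disable automation for the duration of the block, then re-enable it."""
    _set_automation(dp, browser_identifier, session_id, "DISABLED")
    try:
        yield
    finally:
        # Runs even if the block raises, so the agent is not locked out for good.
        _set_automation(dp, browser_identifier, session_id, "ENABLED")


class RecordingClient:
    """Stand-in for the boto3 client; it only records the requested statuses."""

    def __init__(self):
        self.statuses = []

    def update_browser_stream(self, **kwargs):
        self.statuses.append(kwargs["streamUpdate"]["automationStreamUpdate"]["streamStatus"])


fake = RecordingClient()
with human_takeover(fake, "aws.browser.v1", "session-123"):
    pass  # the human types credentials in the live view here
print(fake.statuses)  # ['DISABLED', 'ENABLED']
```

In real use, pass the boto3 data-plane client instead of `RecordingClient` and block inside the `with` body until the human signals that they are done.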
05 InvokeBrowser: when CDP isn't enough
CDP is great for everything that lives inside the DOM. It is useless
when the OS itself opens a print dialog, a JavaScript
alert() blocks the next CDP call, or the agent needs to
grab a screenshot that includes content outside the browser
viewport. From the
Browser OS action doc:
"OS-level actions (InvokeBrowser): Uses a REST API to perform
operating system-level interactions through mouse, keyboard, and
screenshot actions. This complements CDP by handling scenarios where
browser-level automation is insufficient."
One unified verb (invoke_browser) with action-type
dispatch — exactly the same shape as
invoke_code_interpreter. Exactly one action member per
request:
| Family | Actions | Notes |
|---|---|---|
| Mouse | mouseClick, mouseMove, mouseDrag, mouseScroll | (x, y) must satisfy 1 < x < viewportWidth-2 and 1 < y < viewportHeight-2. clickCount 1–10. button is LEFT, RIGHT, or MIDDLE. mouseScroll deltas −1000 to 1000. |
| Keyboard | keyType, keyPress, keyShortcut | keyType.text max 10,000 chars and ASCII only. keyPress.presses 1–100. keyShortcut.keys max 5 keys, all lowercase. |
| Screenshot | screenshot | Captures the full OS desktop, not just the viewport. Format is PNG only. |
The rate ceiling for InvokeBrowser is 5 TPS per account, much lower than the 30 TPS for the streaming and lifecycle APIs, so treat OS-level calls as the slow path. A click looks like this:
response = dp.invoke_browser(
    browserIdentifier="aws.browser.v1",
    sessionId=session_id,
    action={"mouseClick": {"x": 100, "y": 200, "button": "LEFT", "clickCount": 1}},
)
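Because 5 TPS is easy to exceed in a tight loop, it can help to space calls out on the client side. The following is a minimal single-threaded sketch, not a feature of the SDK: `Throttle` and `throttled_invoke` are hypothetical names, and since the quota is per account, several processes sharing one account would still need coordination or retry handling.

```python
import time


class Throttle:
    """Spaces calls at least 1/rate seconds apart (client-side, single-threaded)."""

    def __init__(self, rate_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0  # earliest time the next call may start

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


invoke_throttle = Throttle(rate_per_second=5)


def throttled_invoke(dp, **kwargs):
    """Call invoke_browser no faster than the throttle allows."""
    invoke_throttle.wait()
    return dp.invoke_browser(**kwargs)
```

The clock and sleep functions are injectable so the spacing logic can be tested without real delays.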
Two quiet pitfalls from the
docs' "Considerations" list:
non-ASCII keyType characters are silently
skipped, and keyPress / keyShortcut
do not validate key names — an unrecognized key
returns SUCCESS while doing nothing. Stay on the
documented key list.
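Since some bad inputs fail silently, one option is to validate actions locally before sending them. This sketch applies the constraints from the table above; `mouse_click` and `key_type` are hypothetical helper names, and the payload shapes mirror the `mouseClick` example and the `keyType.text` field named in this section.

```python
VALID_BUTTONS = {"LEFT", "RIGHT", "MIDDLE"}
MAX_KEYTYPE_CHARS = 10_000


def mouse_click(x: int, y: int, viewport_width: int, viewport_height: int,
                button: str = "LEFT", click_count: int = 1) -> dict:
    """Build a mouseClick action, enforcing the documented bounds."""
    if not (1 < x < viewport_width - 2 and 1 < y < viewport_height - 2):
        raise ValueError(f"({x}, {y}) is outside the clickable area")
    if button not in VALID_BUTTONS:
        raise ValueError(f"button must be one of {sorted(VALID_BUTTONS)}")
    if not 1 <= click_count <= 10:
        raise ValueError("clickCount must be between 1 and 10")
    return {"mouseClick": {"x": x, "y": y, "button": button, "clickCount": click_count}}


def key_type(text: str) -> dict:
    """Build a keyType action, refusing text the service would silently drop."""
    if len(text) > MAX_KEYTYPE_CHARS:
        raise ValueError(f"text is longer than {MAX_KEYTYPE_CHARS} characters")
    if not text.isascii():
        bad = sorted({ch for ch in text if ord(ch) > 127})
        raise ValueError(f"non-ASCII characters would be skipped: {bad}")
    return {"keyType": {"text": text}}


print(mouse_click(100, 200, viewport_width=1456, viewport_height=819))
print(key_type("hello world"))
```

Failing loudly on the client turns a silent no-op into an error the agent loop can react to, for example by falling back to CDP for non-ASCII input.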
06 Live view, recording, and replay
Built-in observability is the whole pitch. Three layers:
- Live view. The second WebSocket per session. The AWS Console has a viewer; for an embedded experience, the SDK ships BrowserViewerServer, which proxies the stream to a local port. Useful for "watch what the agent is doing right now" and for the human-in-the-loop pattern above.
- Session recording. Only available on custom browsers. Set recording.enabled: true and an s3Location on CreateBrowser. Recordings include DOM changes, user actions, console logs, and network events, playable through the AWS Console with timeline navigation. The bucket lives in your account; the execution role needs s3:PutObject on the prefix.
- CloudTrail and CloudWatch metrics. Every control-plane call is logged to CloudTrail. Browser metrics show up in CloudWatch and join the same Generative AI Observability dashboard yesterday's tip lived in.
07 Hardening: extensions, profiles, proxies, Root CA
The Features page lists the levers most enterprise deployments will need:
- Web Bot Auth — cryptographically attest that the request is a legitimate AgentCore bot, so target sites can reduce CAPTCHA challenges instead of blocking outright.
- Browser extensions — bring your own Chromium extensions. Max 10 MB per extension, 10 extensions per session (both adjustable).
- Browser profiles: persist cookies and localStorage across sessions. Max 50 MB per profile, 100 profiles per account (both adjustable).
- Browser proxies: route session traffic through your egress proxy. Max 5 proxies per session, 50 domain patterns per proxy, 100 total (not adjustable). Hostnames bounded by the 253-char DNS limit.
- Enterprise policies — apply Chromium enterprise policy JSON to every session you launch.
- Root CA certificates — store custom root CAs in AWS Secrets Manager so sessions can trust your TLS-intercepting corporate proxy.
The recurring pattern: anything that would normally require image-baking on a self-hosted browser is a per-session config object here.
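Since these limits are easy to trip in generated configuration, a pre-flight check can catch them before a session starts. The sketch below is purely illustrative: `Proxy`, `SessionConfig`, and `limit_violations` are hypothetical names that do not mirror the real API shapes, 1 MB is assumed to mean 1024 × 1024 bytes, and the adjustable limits are hard-coded at the default values listed above.

```python
from dataclasses import dataclass, field

MB = 1024 * 1024  # assumption: the documented "MB" is binary


@dataclass
class Proxy:  # hypothetical shape, not the service's schema
    host: str
    domain_patterns: list = field(default_factory=list)


@dataclass
class SessionConfig:  # hypothetical shape, not the service's schema
    extension_sizes_bytes: list = field(default_factory=list)
    proxies: list = field(default_factory=list)


def limit_violations(cfg: SessionConfig) -> list:
    """Return human-readable descriptions of every limit the config exceeds."""
    problems = []
    if len(cfg.extension_sizes_bytes) > 10:
        problems.append("more than 10 extensions")
    for i, size in enumerate(cfg.extension_sizes_bytes):
        if size > 10 * MB:
            problems.append(f"extension {i} exceeds 10 MB")
    if len(cfg.proxies) > 5:
        problems.append("more than 5 proxies")
    for proxy in cfg.proxies:
        if len(proxy.domain_patterns) > 50:
            problems.append(f"proxy {proxy.host} has more than 50 domain patterns")
        if len(proxy.host) > 253:
            problems.append("proxy hostname exceeds 253 characters")
    return problems


cfg = SessionConfig(
    extension_sizes_bytes=[2 * MB, 12 * MB],
    proxies=[Proxy("proxy.example.internal", ["*.example.com"])],
)
print(limit_violations(cfg))  # ['extension 1 exceeds 10 MB']
```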
08 Limits worth knowing
From the AgentCore service quotas Browser table:
- Hardware per session: 1 vCPU / 4 GB RAM — not adjustable. Half what Code Interpreter gets — Chromium is the workload, not pandas.
- Disk per session: 10 GB — not adjustable. The same as CI.
- Concurrent active sessions per account: 1,000 — adjustable via Service Quotas.
- Browser tool configurations per account: 1,000 — adjustable. Build one per IAM/recording profile, not one per agent.
- Sessions per browser tool: 500 — the data-plane ceiling.
- Automation stream per session: 1 — not adjustable.
- Live view stream per session: 1 — not adjustable.
- Asynchronous command max duration: 8 hours, the same ceiling as Runtime and Code Interpreter.
- InvokeBrowser: 5 TPS per account, adjustable but unusually low. Streaming and lifecycle APIs (StartBrowserSession, StopBrowserSession, etc.) sit at 30 TPS.
- Session retention TTL: 30 days on the AgentCore side. Anything older lives only in your S3 recording bucket.
Two gotchas that aren't in the quota table:
- CAPTCHA still wins. Web Bot Auth helps but doesn't eliminate CAPTCHAs. The troubleshooting page tells you to fall back to human-in-the-loop via live view when it triggers.
- updateBrowserStream only gates automation. It does not pause the live view, and it does not hide the screen. Use it to remove agent control during credential entry, not to hide the page.
09 Try it in five minutes
With AWS credentials and the right IAM permissions in place:
$ pip install bedrock-agentcore playwright boto3
$ playwright install chromium
$ python - <<'PY'
import base64
from playwright.sync_api import sync_playwright
from bedrock_agentcore.tools.browser_client import browser_session

with browser_session("us-east-1") as client:
    ws_url, headers = client.generate_ws_headers()
    with sync_playwright() as pw:
        browser = pw.chromium.connect_over_cdp(ws_url, headers=headers)
        ctx = browser.contexts[0]
        page = ctx.pages[0]
        page.goto("https://docs.aws.amazon.com/bedrock-agentcore/")
        print("Title:", page.title())
        # Capture a JPEG screenshot through a raw CDP session.
        cdp = ctx.new_cdp_session(page)
        shot = cdp.send("Page.captureScreenshot", {"format": "jpeg", "quality": 80})
        with open("agentcore-home.jpeg", "wb") as f:
            f.write(base64.b64decode(shot["data"]))
        page.close()
        browser.close()
PY
That's the whole loop: a managed Chromium, a CDP connection, a
navigation, a screenshot. Swap in Nova Act, Strands, or
browser-use and the only thing that changes is what
speaks CDP on top.
Tomorrow we'll look at AgentCore Policy — the Cedar-rule boundary that lets you say "this agent identity may call these tools on these inputs" at the Gateway edge, before a Browser session ever opens.
Sources: Interact with web applications using Amazon Bedrock AgentCore Browser, Using Browser Tool, Fundamentals (resource and session management), Managing Browser sessions, Browser OS action (InvokeBrowser), Features, Using AgentCore Browser with Playwright, Service quotas.
If the docs change, this tip is a snapshot of that day — check the sources for current behaviour.
This page — research, writing, verification, and deployment — was built by Claude Cowork. No human touched the prose, the layout, or the upload pipeline. The tip was generated this morning, cross-checked against the official AWS docs by an independent verification pass, and published to Cloudflare R2 on a schedule.