Tip of the Day 2026 · 06 · 02 ≈ 9 min read Bedrock AgentCore · Browser

AgentCore Browser, over CDP.

AgentCore Browser is the second of the two Built-in Tools, the sibling to yesterday's Code Interpreter. The contract is the same shape — a serverless microVM you don't manage — but instead of a Python kernel you get an isolated Chromium session your agent drives over the Chrome DevTools Protocol, with a separate WebSocket your humans can open to watch (and help) in real time.

$ aws bedrock-agentcore start-browser-session --browser-identifier aws.browser.v1 — managed Chromium, on demand

01 Why a managed browser exists

Agentic web work is one of the longest-running unsolved problems in the AI toolchain. The model can read a page, propose a click, or propose a form submission, but somebody has to actually run Chromium, fend off CAPTCHAs, deal with cookies, and keep one user's session from leaking into the next.

The two failure modes everyone hits:

Run Chromium next to your application and you're shipping a headless-browser container, patching it for CVEs, and the agent has the same network identity as your service.
Lease a browser-farm SaaS and your model's clicks leave your VPC, there's a third party in the credentials path, and the audit trail lives somewhere you don't control.

AgentCore Browser is the AWS-managed answer. From Interact with web applications using Amazon Bedrock AgentCore Browser: "The Amazon Bedrock AgentCore Browser provides a secure, isolated browser environment for your agents to interact with web applications. It runs in a containerized environment, keeping web activity separate from your system." Each session is a fresh microVM with its own filesystem; when the session ends, the VM is destroyed and its memory is sanitized. The session endpoint speaks CDP over WebSocket on one channel and live video on another — your agent talks to the first, your humans (or you) watch the second.

The shift

You stop hosting Chromium. AgentCore gives you a per-session sandbox at 1 vCPU / 4 GB / 10 GB of disk and a clear contract for driving it: CDP for the bulk, OS-level for the edges.

02 Two paths: managed and custom

There are two ways into the Browser, and the choice is purely about how much control you need over IAM, recording, and network shape. From the Using Browser Tool guide:

Mode	How you reference it	When to pick it
Managed (system)	`browserIdentifier="aws.browser.v1"`	Default. No setup, no IAM role, no S3 bucket. Use when the agent just needs to navigate, click, fill, and screenshot.
Custom	`CreateBrowser` → returns a `browserId`	Use when you want session recording in your own S3 bucket, a specific IAM execution role, or to attach extensions, profiles, or proxies.

~/agent — create_browser.py

import uuid

import boto3

# bedrock-agentcore-control = control plane (create/delete browsers)
cp = boto3.client("bedrock-agentcore-control", region_name="us-east-1")

resp = cp.create_browser(
    name="my_custom_browser",
    description="Browser for the research agent",
    networkConfiguration={"networkMode": "PUBLIC"},
    executionRoleArn="arn:aws:iam::111122223333:role/BrowserExecRole",
    recording={
        "enabled": True,
        "s3Location": {
            "bucket": "session-record-111122223333",
            "prefix": "replay-data",
        },
    },
    clientToken=str(uuid.uuid4()),  # idempotency token for safe retries
)
browser_id = resp["browserId"]

Custom browser — control plane creates, data plane streams

Only one network mode is documented: PUBLIC. There is no SANDBOX mode for Browser — by definition, a browser tool is the egress. The execution role's trust policy must include Service: bedrock-agentcore.amazonaws.com, and if you turn recording on it needs s3:PutObject, s3:ListMultipartUploadParts, and s3:AbortMultipartUpload scoped to your prefix.
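The two policy documents described above can be sketched as plain Python dictionaries. This is a minimal illustration, not a hardened setup: the bucket and prefix are the ones from the create_browser example, the `sts:AssumeRole` action is the standard trust-policy action and an assumption here, and a production policy would likely add condition keys that this tip does not cover.

```python
import json

BUCKET = "session-record-111122223333"  # bucket from the create_browser example
PREFIX = "replay-data"                  # prefix from the same example


def trust_policy() -> dict:
    """Trust policy letting the AgentCore service assume the execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock-agentcore.amazonaws.com"},
                "Action": "sts:AssumeRole",  # assumed: the usual trust action
            }
        ],
    }


def recording_policy(bucket: str, prefix: str) -> dict:
    """Permissions for session recording, scoped to a single S3 prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:ListMultipartUploadParts",
                    "s3:AbortMultipartUpload",
                ],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(trust_policy(), indent=2))
    print(json.dumps(recording_policy(BUCKET, PREFIX), indent=2))
```

Keeping the resource ARN down to the prefix, rather than the whole bucket, is what "scoped to your prefix" means in practice.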

03 The session model

Browser is session-based, like Code Interpreter. The control plane creates the tool; the data plane creates sessions against it. Each session lives in its own microVM, owns its own viewport, and exposes its own pair of streaming endpoints.

The state you care about on start_browser_session:

browserIdentifier — aws.browser.v1 for managed, or the ID returned by CreateBrowser.
name — a human label that shows up in the console and in CloudTrail.
sessionTimeoutSeconds — the inactivity timeout. Default 900 (15 min); configurable up to 8 hours, per the Fundamentals page.
viewPort — width and height in pixels. Default 1456 × 819.

When the session is stopped — or hits its timeout — the microVM is destroyed and its memory is sanitized. Anything not exported to S3 or captured on the wire is gone. Up to 500 sessions can run concurrently against a single Browser Tool resource; up to 1,000 concurrent sessions total per account. Session data has a 30-day TTL on the AgentCore side.
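To make the session parameters concrete, here is a small sketch that assembles the start_browser_session arguments and checks them locally before anything goes over the wire. The helper names (`build_session_request`, `start_session`) are hypothetical. The defaults and the 8-hour ceiling are the values quoted above, and the local checks are a convenience, not a mirror of the service's own validation.

```python
MANAGED_BROWSER = "aws.browser.v1"
MAX_TIMEOUT_SECONDS = 8 * 60 * 60  # 8-hour ceiling quoted in this tip


def build_session_request(name, timeout_seconds=900, width=1456, height=819,
                          browser_identifier=MANAGED_BROWSER):
    """Build keyword arguments for start_browser_session, with local checks."""
    if not 0 < timeout_seconds <= MAX_TIMEOUT_SECONDS:
        raise ValueError(f"timeout must be in 1..{MAX_TIMEOUT_SECONDS} seconds")
    if width <= 0 or height <= 0:
        raise ValueError("viewport dimensions must be positive")
    return {
        "browserIdentifier": browser_identifier,
        "name": name,
        "sessionTimeoutSeconds": timeout_seconds,
        "viewPort": {"width": width, "height": height},
    }


def start_session(region, **kwargs):
    """Start a session; needs boto3, AWS credentials, and IAM permissions."""
    import boto3  # imported lazily so the builder above works without it

    dp = boto3.client("bedrock-agentcore", region_name=region)
    return dp.start_browser_session(**build_session_request(**kwargs))


if __name__ == "__main__":
    # Only builds the request; no AWS call is made here.
    print(build_session_request("research-session", timeout_seconds=1800))
```

Separating the builder from the call keeps the parameter logic easy to unit-test without credentials.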

04 Two streams per session: automation and live view

Every session exposes two WebSocket endpoints, and the distinction is the whole architectural idea. From the Managing Browser Sessions doc:

Automation stream — wss://bedrock-agentcore.<region>.amazonaws.com/browser-streams/{browser_id}/sessions/{session_id}/automation. The CDP endpoint. Connect Playwright (connect_over_cdp), Strands, Nova Act, or browser-use to it. Exactly one automation stream per session, not adjustable.
Live view stream — https://bedrock-agentcore.<region>.amazonaws.com/browser-streams/{browser_id}/sessions/{session_id}/live-view. Real-time video the operator can open in the AWS Console (or your own viewer, e.g. BrowserViewerServer from the bedrock_agentcore.tools.browser_client helper). Exactly one live view stream per session, not adjustable.
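The two URL templates above differ only in scheme and final path segment, which a few lines of Python make explicit. The function name `stream_urls` is hypothetical. Note that the URL alone is not enough to connect: the SDK helper returns the request headers alongside it.

```python
def stream_urls(region: str, browser_id: str, session_id: str) -> dict:
    """Fill in the automation and live-view URL templates for one session."""
    base = (
        f"bedrock-agentcore.{region}.amazonaws.com"
        f"/browser-streams/{browser_id}/sessions/{session_id}"
    )
    return {
        "automation": f"wss://{base}/automation",  # CDP endpoint for the agent
        "live_view": f"https://{base}/live-view",  # video endpoint for humans
    }


if __name__ == "__main__":
    for kind, url in stream_urls("us-east-1", "aws.browser.v1", "SESSION_ID").items():
        print(kind, url)
```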

The SDK shortcut hides both behind a context manager:

from playwright.sync_api import sync_playwright
from bedrock_agentcore.tools.browser_client import browser_session

with browser_session("us-east-1") as client:
    ws_url, headers = client.generate_ws_headers()
    with sync_playwright() as pw:
        browser = pw.chromium.connect_over_cdp(ws_url, headers=headers)
        page = browser.contexts[0].pages[0]
        page.goto("https://docs.aws.amazon.com/bedrock-agentcore/")
        print(page.title())

The high-leverage feature is update_browser_stream. When the agent hands the browser to a human — say, the user has to type credentials — you call:

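# dp is the data-plane client, e.g. boto3.client("bedrock-agentcore", region_name="us-east-1");
# session_id is the ID of the running session.
dp.update_browser_stream(
    browserIdentifier="aws.browser.v1",
    sessionId=session_id,
    streamUpdate={"automationStreamUpdate": {"streamStatus": "DISABLED"}},
)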

Automation goes silent; the model can no longer drive (or see) what's on screen. Re-enable it once the credential is in. This is the docs' canonical pattern for password and MFA flows.
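One way to make sure automation always comes back on, even if the human step raises, is to wrap the handoff in a context manager. This is a sketch under assumptions: `human_control` is a hypothetical helper, `dp` is any object exposing update_browser_stream (the boto3 data-plane client in real use), and `"ENABLED"` is assumed to be the counterpart of the `"DISABLED"` status shown above.

```python
from contextlib import contextmanager


@contextmanager
def human_control(dp, session_id, browser_identifier="aws.browser.v1"):
    """Disable the automation stream for the duration of the with-block."""

    def set_status(status):
        dp.update_browser_stream(
            browserIdentifier=browser_identifier,
            sessionId=session_id,
            streamUpdate={"automationStreamUpdate": {"streamStatus": status}},
        )

    set_status("DISABLED")
    try:
        yield  # the human types credentials via the live view here
    finally:
        # Assumed status value; runs even if the block above raises.
        set_status("ENABLED")
```

Usage would look like `with human_control(dp, session_id): wait_for_operator()`, where `wait_for_operator` is whatever signal your application uses to learn that the human is done.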

05 InvokeBrowser: when CDP isn't enough

CDP is great for everything that lives inside the DOM. It is useless when the OS itself opens a print dialog, a JavaScript alert() blocks the next CDP call, or the agent needs to grab a screenshot that includes content outside the browser viewport. From the Browser OS action doc: "OS-level actions (InvokeBrowser): Uses a REST API to perform operating system-level interactions through mouse, keyboard, and screenshot actions. This complements CDP by handling scenarios where browser-level automation is insufficient."

One unified verb (invoke_browser) with action-type dispatch — exactly the same shape as invoke_code_interpreter. Exactly one action member per request:

Family	Actions	Notes
Mouse	`mouseClick`, `mouseMove`, `mouseDrag`, `mouseScroll`	`(x, y)` must satisfy `1 < x < viewportWidth-2` and `1 < y < viewportHeight-2`. `clickCount` 1–10. `button` is `LEFT`, `RIGHT`, or `MIDDLE`. `mouseScroll` deltas −1000 to 1000.
Keyboard	`keyType`, `keyPress`, `keyShortcut`	`keyType.text` max 10,000 chars and ASCII only. `keyPress.presses` 1–100. `keyShortcut.keys` max 5 keys, all lowercase.
Screenshot	`screenshot`	Captures the full OS desktop, not just the viewport. Format is PNG only.

The rate ceiling is InvokeBrowser: 5 TPS per account, which is much lower than the 30 TPS for the streaming and lifecycle APIs — so treat OS-level calls as the slow path. A click looks like this:

response = dp.invoke_browser(
    browserIdentifier="aws.browser.v1",
    sessionId=session_id,
    action={"mouseClick": {"x": 100, "y": 200, "button": "LEFT", "clickCount": 1}},
)

Two quiet pitfalls from the docs' "Considerations" list: non-ASCII keyType characters are silently skipped, and keyPress / keyShortcut do not validate key names — an unrecognized key returns SUCCESS while doing nothing. Stay on the documented key list.
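Because some bad inputs are skipped silently rather than rejected, a client-side check before each call can catch them early. The sketch below encodes only the constraints listed in the table above. `validate_action` is a hypothetical helper, the default viewport is the session default quoted earlier, and coordinate checks apply only to actions that carry plain `x` and `y` fields, since the field names for drags are not covered in this tip.

```python
def validate_action(action, viewport_width=1456, viewport_height=819):
    """Return a list of problems found in one InvokeBrowser action dict."""
    if len(action) != 1:
        return ["exactly one action member is required"]
    (kind, body), = action.items()
    problems = []

    # Coordinate bounds, checked only when plain x/y fields are present.
    if kind.startswith("mouse") and "x" in body and "y" in body:
        if not 1 < body["x"] < viewport_width - 2:
            problems.append("x must satisfy 1 < x < viewportWidth-2")
        if not 1 < body["y"] < viewport_height - 2:
            problems.append("y must satisfy 1 < y < viewportHeight-2")

    if kind == "mouseClick":
        if "clickCount" in body and not 1 <= body["clickCount"] <= 10:
            problems.append("clickCount must be 1-10")
        if "button" in body and body["button"] not in {"LEFT", "RIGHT", "MIDDLE"}:
            problems.append("button must be LEFT, RIGHT, or MIDDLE")
    elif kind == "keyType":
        text = body.get("text", "")
        if len(text) > 10_000:
            problems.append("text exceeds 10,000 characters")
        if not text.isascii():
            problems.append("text contains non-ASCII characters (silently skipped)")
    elif kind == "keyPress":
        if "presses" in body and not 1 <= body["presses"] <= 100:
            problems.append("presses must be 1-100")
    elif kind == "keyShortcut":
        keys = body.get("keys", [])
        if len(keys) > 5:
            problems.append("at most 5 keys per shortcut")
        if any(k != k.lower() for k in keys):
            problems.append("shortcut keys must be lowercase")
    return problems


if __name__ == "__main__":
    print(validate_action({"keyType": {"text": "café"}}))
```

This does not check key names against the documented key list, so an unrecognized key would still pass here and do nothing on the service side.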

06 Live view, recording, and replay

Built-in observability is the whole pitch. Three layers:

Live view. The second WebSocket per session. The AWS Console has a viewer; for an embedded experience, the SDK ships BrowserViewerServer that proxies the stream to a local port. Useful for "watch what the agent is doing right now" and for the human-in-the-loop pattern above.
Session recording. Only available on custom browsers. Set recording.enabled: true and an s3Location on CreateBrowser. Recordings include DOM changes, user actions, console logs, and network events — playable through the AWS Console with timeline navigation. The bucket lives in your account; the execution role needs s3:PutObject on the prefix.
CloudTrail and CloudWatch metrics. Every control-plane call is logged to CloudTrail. Browser metrics show up in CloudWatch and join the same Generative AI Observability dashboard yesterday's tip lived in.

07 Hardening: extensions, profiles, proxies, Root CA

The Features page lists the levers most enterprise deployments will need:

Web Bot Auth — cryptographically attest that the request is a legitimate AgentCore bot, so target sites can reduce CAPTCHA challenges instead of blocking outright.
Browser extensions — bring your own Chromium extensions. Max 10 MB per extension, 10 extensions per session (both adjustable).
Browser profiles — persist cookies and localStorage across sessions. Max 50 MB per profile, 100 profiles per account (both adjustable).
Browser proxies — route session traffic through your egress proxy. Max 5 proxies per session, 50 domain patterns per proxy, 100 total (not adjustable). Hostnames bounded by the 253-char DNS limit.
Enterprise policies — apply Chromium enterprise policy JSON to every session you launch.
Root CA certificates — store custom root CAs in AWS Secrets Manager so sessions can trust your TLS-intercepting corporate proxy.

The recurring pattern: anything that would normally require image-baking on a self-hosted browser is a per-session config object here.

08 Limits worth knowing

From the AgentCore service quotas Browser table:

Hardware per session: 1 vCPU / 4 GB RAM — not adjustable. Half what Code Interpreter gets — Chromium is the workload, not pandas.
Disk per session: 10 GB — not adjustable. The same as CI.
Concurrent active sessions per account: 1,000 — adjustable via Service Quotas.
Browser tool configurations per account: 1,000 — adjustable. Build one per IAM/recording profile, not one per agent.
Sessions per browser tool: 500 — the data-plane ceiling.
Automation stream per session: 1 — not adjustable.
Live view stream per session: 1 — not adjustable.
Asynchronous command max duration: 8 hours, the same ceiling as Runtime and Code Interpreter.
InvokeBrowser: 5 TPS per account; adjustable, but unusually low. Streaming and lifecycle APIs (StartBrowserSession, StopBrowserSession, etc.) sit at 30 TPS.
Session retention TTL: 30 days on the AgentCore side. Anything older lives only in your S3 recording bucket.
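Given the 5 TPS ceiling on InvokeBrowser, a simple client-side pacer keeps a burst of OS-level actions from tripping throttling. This is a minimal sketch: `Pacer` is a hypothetical helper, and because the quota is per account it only protects a single process. Several agents sharing an account would need shared coordination or retry with backoff.

```python
import time


class Pacer:
    """Space calls at least 1/rate seconds apart (client-side pacing)."""

    def __init__(self, rate_per_second=5.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the slot after it."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


if __name__ == "__main__":
    pacer = Pacer(rate_per_second=5.0)
    start = time.monotonic()
    for i in range(3):
        pacer.wait()  # call dp.invoke_browser(...) right after this in real use
        print(f"call {i} at +{time.monotonic() - start:.2f}s")
```

The clock and sleep functions are injectable so the pacing logic can be tested without real delays.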

Two gotchas that aren't in the quota table:

CAPTCHA still wins. Web Bot Auth helps but doesn't eliminate CAPTCHAs. The troubleshooting page tells you to fall back to human-in-the-loop via live view when it triggers.
update_browser_stream only gates automation. It does not pause the live view or blank the screen, so anyone watching the live view still sees the page. Use it to remove agent control during credential entry, not to hide the page.

09 Try it in five minutes

With AWS credentials and the right IAM permissions in place:

$ pip install bedrock-agentcore playwright boto3
$ playwright install chromium
$ python - <<'PY'
import base64

from playwright.sync_api import sync_playwright
from bedrock_agentcore.tools.browser_client import browser_session

with browser_session("us-east-1") as client:
    ws_url, headers = client.generate_ws_headers()
    with sync_playwright() as pw:
        browser = pw.chromium.connect_over_cdp(ws_url, headers=headers)
        ctx = browser.contexts[0]
        page = ctx.pages[0]
        page.goto("https://docs.aws.amazon.com/bedrock-agentcore/")
        print("Title:", page.title())

        # Capture a JPEG screenshot through a raw CDP session.
        cdp = ctx.new_cdp_session(page)
        shot = cdp.send("Page.captureScreenshot", {"format": "jpeg", "quality": 80})
        with open("agentcore-home.jpeg", "wb") as f:
            f.write(base64.b64decode(shot["data"]))

        page.close()
        browser.close()
PY

That's the whole loop: a managed Chromium, a CDP connection, a navigation, a screenshot. Swap in Nova Act, Strands, or browser-use and the only thing that changes is what speaks CDP on top.

Tomorrow we'll look at AgentCore Policy — the Cedar-rule boundary that lets you say "this agent identity may call these tools on these inputs" at the Gateway edge, before a Browser session ever opens.

✓Verified against the official AWS docs on 2026-06-02.
Sources: Interact with web applications using Amazon Bedrock AgentCore Browser, Using Browser Tool, Fundamentals (resource and session management), Managing Browser sessions, Browser OS action (InvokeBrowser), Features, Using AgentCore Browser with Playwright, Service quotas.
If the docs change, this tip is a snapshot of that day — check the sources for current behaviour.

This page — research, writing, verification, and deployment — was built by Claude Cowork. No human touched the prose, the layout, or the upload pipeline. The tip was generated this morning, cross-checked against the official AWS docs by an independent verification pass, and published to Cloudflare R2 on a schedule.

A daily experiment by Monty van Emmerik · vanemmerik.ai