Tip of the Day · 2026-06-02 · ≈ 9 min read · Bedrock AgentCore · Browser

AgentCore Browser, over CDP.

AgentCore Browser is the second of the two Built-in Tools, the sibling to yesterday's Code Interpreter. The contract is the same shape — a serverless microVM you don't manage — but instead of a Python kernel you get an isolated Chromium session your agent drives over the Chrome DevTools Protocol, with a separate WebSocket your humans can open to watch (and help) in real time.

$ aws bedrock-agentcore start-browser-session --browser-identifier aws.browser.v1  — managed Chromium, on demand

01 Why a managed browser exists

Agentic web work is the longest-running unsolved problem in the AI toolchain. The model can read a page, propose a click, propose a form submission — but somebody has to actually run Chromium, fend off CAPTCHAs, deal with cookies, and not leak the user's session into the next user's.

The two failure modes everyone hits:

AgentCore Browser is the AWS-managed answer. From Interact with web applications using Amazon Bedrock AgentCore Browser: "The Amazon Bedrock AgentCore Browser provides a secure, isolated browser environment for your agents to interact with web applications. It runs in a containerized environment, keeping web activity separate from your system." Each session is a fresh microVM with its own filesystem; when the session ends, the VM is destroyed and its memory is sanitized. The session endpoint speaks CDP over WebSocket on one channel and live video on another — your agent talks to the first, your humans (or you) watch the second.

The shift

You stop hosting Chromium. AgentCore gives you a per-session sandbox at 1 vCPU / 4 GB / 10 GB of disk and a clear contract for driving it: CDP for the bulk, OS-level for the edges.

02 Two paths: managed and custom

There are two ways into the Browser, and the choice is purely about how much control you need over IAM, recording, and network shape. From the Using Browser Tool guide:

| Mode | How you reference it | When to pick it |
| --- | --- | --- |
| Managed (system) browser | Identifier="aws.browser.v1" | Default. No setup, no IAM role, no S3 bucket. Use when the agent just needs to navigate, click, fill, and screenshot. |
| Custom | CreateBrowser → returns a browserId | Use when you want session recording in your own S3 bucket, a specific IAM execution role, or to attach extensions, profiles, or proxies. |
~/agent — create_browser.py
```python
import uuid

import boto3

# bedrock-agentcore-control = control plane (create/delete browsers)
cp = boto3.client("bedrock-agentcore-control", region_name="us-east-1")

resp = cp.create_browser(
    name="my_custom_browser",
    description="Browser for the research agent",
    networkConfiguration={"networkMode": "PUBLIC"},
    executionRoleArn="arn:aws:iam::111122223333:role/BrowserExecRole",
    # Session recordings land in your own bucket under this prefix.
    recording={
        "enabled": True,
        "s3Location": {
            "bucket": "session-record-111122223333",
            "prefix": "replay-data",
        },
    },
    clientToken=str(uuid.uuid4()),
)
browser_id = resp["browserId"]
```

Custom browser — control plane creates, data plane streams

Only one network mode is documented: PUBLIC. There is no SANDBOX mode for Browser — by definition, a browser tool is the egress. The execution role's trust policy must include Service: bedrock-agentcore.amazonaws.com, and if you turn recording on it needs s3:PutObject, s3:ListMultipartUploadParts, and s3:AbortMultipartUpload scoped to your prefix.
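
As a rough sketch of the two policy documents that paragraph describes, the snippet below builds them as plain Python dicts. The service principal and the three S3 actions come from the text above, and the bucket and prefix reuse the example values from create_browser.py. The function names are hypothetical, and the `sts:AssumeRole` action and the object-ARN resource form are the usual IAM conventions, assumed here and not quoted from the AgentCore docs.

```python
import json


def browser_trust_policy() -> dict:
    """Trust policy letting the AgentCore service assume the execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock-agentcore.amazonaws.com"},
                "Action": "sts:AssumeRole",  # assumed: standard trust-policy action
            }
        ],
    }


def recording_policy(bucket: str, prefix: str) -> dict:
    """Permissions policy for session recording, scoped to one S3 prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:ListMultipartUploadParts",
                    "s3:AbortMultipartUpload",
                ],
                # Object-level ARN covering everything under the prefix.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(browser_trust_policy(), indent=2))
    print(json.dumps(recording_policy("session-record-111122223333", "replay-data"), indent=2))
```

Printing the documents first makes it easy to review the scope before attaching them to the role named in `executionRoleArn`.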

03 The session model

Browser is session-based, like Code Interpreter. The control plane creates the tool; the data plane creates sessions against it. Each session lives in its own microVM, owns its own viewport, and exposes its own pair of streaming endpoints.

The state you care about on start_browser_session:

When the session is stopped — or hits its timeout — the microVM is destroyed and its memory is sanitized. Anything not exported to S3 or captured on the wire is gone. Up to 500 sessions can run concurrently against a single Browser Tool resource; up to 1,000 concurrent sessions total per account. Session data has a 30-day TTL on the AgentCore side.
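
Because an unstopped session keeps its microVM alive until the timeout, it can help to tie the session lifetime to a `with` block. The following is a minimal sketch, not the SDK's own helper: `managed_browser_session` is a hypothetical name, `dp` is assumed to be a boto3 `bedrock-agentcore` client, and the `sessionTimeoutSeconds` parameter and `sessionId` response key reflect my reading of the API, so check them against the current reference.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def managed_browser_session(
    dp: Any,
    browser_identifier: str = "aws.browser.v1",
    timeout_seconds: int = 900,
) -> Iterator[str]:
    """Start a browser session, yield its ID, and always stop it on exit."""
    resp = dp.start_browser_session(
        browserIdentifier=browser_identifier,
        sessionTimeoutSeconds=timeout_seconds,  # assumed parameter name
    )
    session_id = resp["sessionId"]  # assumed response key
    try:
        yield session_id
    finally:
        # Stop explicitly so the microVM is torn down without waiting
        # for the session timeout, even if the body raised.
        dp.stop_browser_session(
            browserIdentifier=browser_identifier,
            sessionId=session_id,
        )
```

With a real client this would be used as `with managed_browser_session(boto3.client("bedrock-agentcore")) as session_id: ...`. Passing the client in keeps the helper easy to exercise with a stub.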

04 Two streams per session: automation and live view

Every session exposes two WebSocket endpoints, and the distinction is the whole architectural idea. From the Managing Browser Sessions doc:

The SDK shortcut hides both behind a context manager:

```python
from playwright.sync_api import sync_playwright
from bedrock_agentcore.tools.browser_client import browser_session

# The context manager starts a session and stops it on exit.
with browser_session("us-east-1") as client:
    # Signed WebSocket URL and headers for the automation (CDP) stream.
    ws_url, headers = client.generate_ws_headers()
    with sync_playwright() as pw:
        browser = pw.chromium.connect_over_cdp(ws_url, headers=headers)
        page = browser.contexts[0].pages[0]
        page.goto("https://docs.aws.amazon.com/bedrock-agentcore/")
        print(page.title())
```

The high-leverage feature is update_browser_stream. When the agent hands the browser to a human — say, the user has to type credentials — you call:

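```python
import boto3

# bedrock-agentcore = data plane (sessions, streams, InvokeBrowser)
dp = boto3.client("bedrock-agentcore", region_name="us-east-1")

# session_id is the ID of the session you started earlier.
dp.update_browser_stream(
    browserIdentifier="aws.browser.v1",
    sessionId=session_id,
    streamUpdate={"automationStreamUpdate": {"streamStatus": "DISABLED"}},
)
```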

Automation goes silent; the model can no longer drive (or see) what's on screen. Re-enable it once the credential is in. This is the docs' canonical pattern for password and MFA flows.
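
One way to make sure automation is always switched back on is to wrap the handoff in a context manager. This is a sketch under stated assumptions: `human_in_control` and `_set_automation` are hypothetical helpers, `dp` is assumed to be a boto3 `bedrock-agentcore` client, and `"ENABLED"` is assumed to be the counterpart of the `"DISABLED"` status shown above.

```python
from contextlib import contextmanager
from typing import Any, Iterator


def _set_automation(dp: Any, browser_identifier: str, session_id: str, status: str) -> None:
    """Flip the automation stream on or off for one session."""
    dp.update_browser_stream(
        browserIdentifier=browser_identifier,
        sessionId=session_id,
        streamUpdate={"automationStreamUpdate": {"streamStatus": status}},
    )


@contextmanager
def human_in_control(
    dp: Any,
    session_id: str,
    browser_identifier: str = "aws.browser.v1",
) -> Iterator[None]:
    """Disable automation while a human types, then re-enable it."""
    _set_automation(dp, browser_identifier, session_id, "DISABLED")
    try:
        yield
    finally:
        # Runs even if the handoff step raises, so the agent is never
        # left locked out. "ENABLED" is an assumed status value.
        _set_automation(dp, browser_identifier, session_id, "ENABLED")
```

Inside the `with human_in_control(dp, session_id):` block you would wait for the user to finish in the live view. How you detect "finished" is up to your application.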

05 InvokeBrowser: when CDP isn't enough

CDP is great for everything that lives inside the DOM. It is useless when the OS itself opens a print dialog, a JavaScript alert() blocks the next CDP call, or the agent needs to grab a screenshot that includes content outside the browser viewport. From the Browser OS action doc: "OS-level actions (InvokeBrowser): Uses a REST API to perform operating system-level interactions through mouse, keyboard, and screenshot actions. This complements CDP by handling scenarios where browser-level automation is insufficient."

One unified verb (invoke_browser) with action-type dispatch — exactly the same shape as invoke_code_interpreter. Exactly one action member per request:

| Family | Actions | Notes |
| --- | --- | --- |
| Mouse | mouseClick, mouseMove, mouseDrag, mouseScroll | (x, y) must satisfy 1 < x < viewportWidth-2 and 1 < y < viewportHeight-2. clickCount 1–10. button is LEFT, RIGHT, or MIDDLE. mouseScroll deltas −1000 to 1000. |
| Keyboard | keyType, keyPress, keyShortcut | keyType.text max 10,000 chars and ASCII only. keyPress.presses 1–100. keyShortcut.keys max 5 keys, all lowercase. |
| Screenshot | screenshot | Captures the full OS desktop, not just the viewport. Format is PNG only. |
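
Given the 5 TPS budget, it may be worth checking the mouse constraints from the table locally before spending a request. The sketch below does that for `mouseClick` only. `mouse_click_action` is a hypothetical helper, and the viewport size is something the caller is assumed to know from how the session was started.

```python
VALID_BUTTONS = {"LEFT", "RIGHT", "MIDDLE"}


def mouse_click_action(
    x: int,
    y: int,
    viewport_width: int,
    viewport_height: int,
    button: str = "LEFT",
    click_count: int = 1,
) -> dict:
    """Build a mouseClick action, enforcing the documented bounds locally."""
    # Strict inequalities: 1 < x < viewportWidth-2 and 1 < y < viewportHeight-2.
    if not 1 < x < viewport_width - 2:
        raise ValueError(f"x={x} is outside (1, {viewport_width - 2})")
    if not 1 < y < viewport_height - 2:
        raise ValueError(f"y={y} is outside (1, {viewport_height - 2})")
    if button not in VALID_BUTTONS:
        raise ValueError(f"button must be one of {sorted(VALID_BUTTONS)}")
    if not 1 <= click_count <= 10:
        raise ValueError("clickCount must be between 1 and 10")
    return {
        "mouseClick": {"x": x, "y": y, "button": button, "clickCount": click_count}
    }
```

The returned dict is meant to be passed as the `action` argument of `invoke_browser`.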

InvokeBrowser is capped at 5 TPS per account, well below the 30 TPS allowed for the streaming and lifecycle APIs, so treat OS-level calls as the slow path. A click looks like this:

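```python
# dp is the boto3 "bedrock-agentcore" (data plane) client;
# session_id identifies the running browser session.
response = dp.invoke_browser(
    browserIdentifier="aws.browser.v1",
    sessionId=session_id,
    action={"mouseClick": {"x": 100, "y": 200, "button": "LEFT", "clickCount": 1}},
)
```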
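
To stay under the 5 TPS ceiling on InvokeBrowser, a small client-side throttle is one option. The class below is a generic minimum-interval limiter, not anything AgentCore ships: `Throttle` is a hypothetical name, it is not thread-safe, and because the quota is per account it only helps if this process is the sole caller.

```python
import time
from typing import Callable


class Throttle:
    """Spaces calls at least 1/max_per_second seconds apart."""

    def __init__(
        self,
        max_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_ok = 0.0  # earliest time the next call may proceed

    def wait(self) -> float:
        """Block until a call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_ok - now)
        if delay:
            self._sleep(delay)
        # Reserve the next slot one interval after this call's slot.
        self._next_ok = max(now, self._next_ok) + self._interval
        return delay
```

Typical use would be `throttle = Throttle(5)` once per process, then `throttle.wait()` immediately before each `dp.invoke_browser(...)` call. The injectable `clock` and `sleep` exist so the pacing logic can be checked without real delays.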

Two quiet pitfalls from the docs' "Considerations" list: non-ASCII keyType characters are silently skipped, and keyPress / keyShortcut do not validate key names — an unrecognized key returns SUCCESS while doing nothing. Stay on the documented key list.
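
Since non-ASCII characters are dropped without an error, a local guard can turn the silent failure into a loud one. The sketch below rejects non-ASCII input and splits long text into chunks that respect the 10,000-character limit on `keyType.text`. `key_type_actions` is a hypothetical helper, and each returned dict is meant to be sent as one `invoke_browser` action.

```python
MAX_KEYTYPE_CHARS = 10_000  # documented limit on keyType.text


def key_type_actions(text: str) -> list[dict]:
    """Split text into keyType actions, rejecting non-ASCII input up front."""
    if not text.isascii():
        # Report the offending characters instead of letting them vanish.
        bad = sorted({ch for ch in text if ord(ch) > 127})
        raise ValueError(f"non-ASCII characters would be skipped: {bad!r}")
    return [
        {"keyType": {"text": text[i : i + MAX_KEYTYPE_CHARS]}}
        for i in range(0, len(text), MAX_KEYTYPE_CHARS)
    ]
```

Raising is a design choice. If dropping characters is acceptable for your use case, you could strip them and log a warning instead.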

06 Live view, recording, and replay

Built-in observability is the whole pitch. Three layers:

07 Hardening: extensions, profiles, proxies, Root CA

The Features page lists the levers most enterprise deployments will need:

The recurring pattern: anything that would normally require image-baking on a self-hosted browser is a per-session config object here.

08 Limits worth knowing

From the AgentCore service quotas Browser table:

Two gotchas that aren't in the quota table:

09 Try it in five minutes

With AWS credentials and the right IAM permissions in place:

```shell
$ pip install bedrock-agentcore playwright boto3
$ playwright install chromium

$ python - <<'PY'
from playwright.sync_api import sync_playwright
from bedrock_agentcore.tools.browser_client import browser_session
import base64

with browser_session("us-east-1") as client:
    ws_url, headers = client.generate_ws_headers()
    with sync_playwright() as pw:
        browser = pw.chromium.connect_over_cdp(ws_url, headers=headers)
        ctx = browser.contexts[0]
        page = ctx.pages[0]

        page.goto("https://docs.aws.amazon.com/bedrock-agentcore/")
        print("Title:", page.title())

        # Raw CDP call: returns the screenshot as base64-encoded JPEG data.
        cdp = ctx.new_cdp_session(page)
        shot = cdp.send("Page.captureScreenshot", {"format": "jpeg", "quality": 80})
        with open("agentcore-home.jpeg", "wb") as f:
            f.write(base64.b64decode(shot["data"]))
        page.close()
        browser.close()
PY
```

That's the whole loop: a managed Chromium, a CDP connection, a navigation, a screenshot. Swap in Nova Act, Strands, or browser-use and the only thing that changes is what speaks CDP on top.

Tomorrow we'll look at AgentCore Policy — the Cedar-rule boundary that lets you say "this agent identity may call these tools on these inputs" at the Gateway edge, before a Browser session ever opens.

Verified against the official AWS docs on 2026-06-02.
Sources: Interact with web applications using Amazon Bedrock AgentCore Browser, Using Browser Tool, Fundamentals (resource and session management), Managing Browser sessions, Browser OS action (InvokeBrowser), Features, Using AgentCore Browser with Playwright, Service quotas.
If the docs change, this tip is a snapshot of that day — check the sources for current behaviour.

This page — research, writing, verification, and deployment — was built by Claude Cowork. No human touched the prose, the layout, or the upload pipeline. The tip was generated this morning, cross-checked against the official AWS docs by an independent verification pass, and published to Cloudflare R2 on a schedule.

A daily experiment by Monty van Emmerik · vanemmerik.ai