The /invocations contract, on the wire.
Yesterday's tip framed AgentCore Runtime as the
missing primitive between Lambda and a Kubernetes cluster. Today we
zoom into the wire — what the data plane actually expects from your
container, and what shape a single turn of conversation takes when
you bypass the CLI and call InvokeAgentRuntime directly
with boto3 or curl.
/invocations · 0.0.0.0:8080 · ARM64 — two endpoints, one container
01 Two endpoints, one container
Runtime's HTTP protocol is a small, opinionated contract. Your container exposes exactly two paths on port 8080:
- POST /invocations: the primary agent interaction endpoint. JSON input, JSON or Server-Sent Events output.
- GET /ping: the liveness probe. JSON body with a status field.
That's the whole required HTTP surface. WebSockets (/ws) are
an opt-in extra for bidirectional streaming and live on the
same port. MCP, A2A, and AG-UI agents follow their own protocol
contracts at different paths, but for plain HTTP agents,
/invocations plus /ping is the spec.
Runtime is doing nothing magical at the network layer. It's a managed reverse proxy in front of an ARM64 container that speaks two well-known paths. Once you internalise that, debugging a Runtime deployment stops feeling like a black box.
02 Container requirements
Three immovable rules from the docs:
- Host: 0.0.0.0, not 127.0.0.1. Runtime calls into your container from outside the loopback.
- Port: 8080. Standard port for HTTP-based agent communication.
- Platform: ARM64 container. Required for compatibility with the AgentCore Runtime environment.
Run an x86 image and the container fails to start with the classic
exec /bin/sh: exec format error. Bind to
127.0.0.1 and /ping fails because
Runtime's health-checker can't reach you. Listen on 8081 and
nothing ever connects.
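All three rules fit in a few lines of Python. The sketch below uses only the standard library's http.server. It binds 0.0.0.0:8080, answers GET /ping, and answers POST /invocations with the JSON shape shown in section 04. The echo "agent", the class name AgentHandler, and the serve() helper are hypothetical placeholders, not part of any AgentCore SDK. A production container would more likely use a proper ASGI framework.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class AgentHandler(BaseHTTPRequestHandler):
    """Hypothetical handler serving the two required paths: GET /ping and POST /invocations."""

    def _send_json(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        if self.path == "/ping":
            # Liveness probe: a JSON body with a status field.
            self._send_json(200, {"status": "Healthy",
                                  "time_of_last_update": int(time.time())})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/invocations":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        # Placeholder agent: echo the prompt. Swap in your model call here.
        reply = f"You said: {request.get('prompt', '')}"
        self._send_json(200, {"response": reply, "status": "success"})


def serve(host="0.0.0.0", port=8080):
    # 0.0.0.0, not 127.0.0.1: Runtime connects from outside the container's loopback.
    ThreadingHTTPServer((host, port), AgentHandler).serve_forever()
```

Call serve() from the container's entrypoint. Because the defaults are baked into the signature, the host and port can't silently drift to 127.0.0.1 or 8081.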
03 The /invocations request, exactly
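The data-plane API is InvokeAgentRuntime. Internally
it forwards HTTP to your container at POST /invocations,
but at the AWS API boundary the call is a POST to a per-Runtime
URL on the bedrock-agentcore data-plane endpoint, keyed by the
Runtime ARN. The exact request syntax is on the InvokeAgentRuntime reference.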
Headers the caller may send, all documented on the InvokeAgentRuntime reference:
| Header | Required | Notes |
|---|---|---|
| Content-Type | yes | Most agents use application/json. |
| Accept | no | MIME type the caller wants back. |
| X-Amzn-Bedrock-AgentCore-Runtime-Session-Id | no | The session id, 33–256 characters. Same id across calls = same conversation. |
| X-Amzn-Bedrock-AgentCore-Runtime-User-Id | no | Per-user attribution. Adds an extra IAM check (bedrock-agentcore:InvokeAgentRuntimeForUser). |
| Mcp-Session-Id, Mcp-Protocol-Version | no | Used by MCP agents. |
| X-Amzn-Trace-Id, traceparent, tracestate, baggage | no | Distributed-tracing headers, forwarded into your container so OpenTelemetry spans link up. |
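boto3 sets these headers for you from keyword arguments. A raw HTTPS caller has to assemble them itself. Here is a minimal sketch that maps the table onto a header dict. The helper name build_invoke_headers is hypothetical, and the sketch always sends a session id even though the API marks it optional.

```python
def build_invoke_headers(session_id, user_id=None, accept="application/json",
                         traceparent=None):
    """Hypothetical helper: build InvokeAgentRuntime headers for a raw HTTPS call.

    Optional headers are included only when a value is supplied.
    """
    headers = {
        "Content-Type": "application/json",
        "Accept": accept,
        # Same id across calls = same conversation.
        "X-Amzn-Bedrock-AgentCore-Runtime-Session-Id": session_id,
    }
    if user_id is not None:
        # Triggers the extra bedrock-agentcore:InvokeAgentRuntimeForUser IAM check.
        headers["X-Amzn-Bedrock-AgentCore-Runtime-User-Id"] = user_id
    if traceparent is not None:
        # W3C Trace Context header, forwarded into the container for span linking.
        headers["traceparent"] = traceparent
    return headers
```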
Payload size cap: 100,000,000 bytes. The
payload field on the API documents a maximum length of
100000000. That is plenty for prompts, and it becomes relevant when you're sending
multi-modal binary blobs.
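A client-side guard makes the cap fail fast, before the request leaves your machine. This is a minimal sketch. The constant mirrors the documented maximum, and the helper name encode_payload is hypothetical.

```python
import json

MAX_PAYLOAD_BYTES = 100_000_000  # documented maximum length of the payload field


def encode_payload(body):
    """Hypothetical helper: serialise a dict to UTF-8 JSON bytes and enforce the cap."""
    data = json.dumps(body).encode("utf-8")
    if len(data) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {len(data)} bytes; the cap is {MAX_PAYLOAD_BYTES}")
    return data
```

If you base64-encode binary data into the JSON (an assumption about your payload design), the cap applies to the encoded size. Base64 output is roughly 4/3 of the raw bytes, so the usable raw budget is closer to 75 MB.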
Session id length matters. The API documents
runtimeSessionId as "Minimum length of 33. Maximum
length of 256." A canonical UUID4 string (32 hex digits plus 4 hyphens
in 8-4-4-4-12 groups = 36 chars) clears it. A 32-char hash, or a UUID
with its hyphens stripped, doesn't. The docs explicitly recommend a UUID.
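The length rule is easy to check locally. The sketch below encodes the documented bounds and shows why the hyphenated form of a UUID passes while its hex-only form does not. The function name is illustrative.

```python
import uuid

MIN_SESSION_ID, MAX_SESSION_ID = 33, 256  # documented runtimeSessionId bounds


def is_valid_session_id(session_id):
    return MIN_SESSION_ID <= len(session_id) <= MAX_SESSION_ID


canonical = str(uuid.uuid4())  # 32 hex digits + 4 hyphens = 36 chars
bare_hex = uuid.uuid4().hex    # a UUID without hyphens = 32 chars

print(len(canonical), is_valid_session_id(canonical))  # 36 True
print(len(bare_hex), is_valid_session_id(bare_hex))    # 32 False
```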
04 The /invocations response, two shapes
Your container picks the response format per request, signalled by
Content-Type:
JSON (non-streaming): for quick answers that come back in one piece.
```json
{"response": "It's 18°C and clear in Cape Town.", "status": "success"}
```
Server-Sent Events (streaming) — for long-running reasoning, tool loops, or token-by-token chat. The format is the standard SSE shape from the WHATWG spec:
```text
data: {"event": "partial response 1"}

data: {"event": "partial response 2"}

data: {"event": "final response"}
```
There is no separate "register as streaming" step. You just set
Content-Type: text/event-stream on the response and
start writing data: …\n\n lines.
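Framing is the only streaming-specific work. Here is a minimal sketch of both sides, with a frame encoder for the container and a buffered parser for tests. Both helpers are hypothetical. The parser handles only single-line data fields, which is all json.dumps produces without indent. Full SSE also allows multi-line data, event and id fields.

```python
import json


def sse_frame(event):
    """Encode one event as an SSE frame: a 'data:' line ended by a blank line."""
    # json.dumps without indent emits no newlines, so each event is one data line.
    return f"data: {json.dumps(event)}\n\n".encode("utf-8")


def parse_sse(raw):
    """Recover JSON payloads from a buffered SSE body (single-line data fields only)."""
    events = []
    for line in raw.decode("utf-8").splitlines():
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events


stream = b"".join(sse_frame({"event": text}) for text in
                  ["partial response 1", "partial response 2", "final response"])
print(parse_sse(stream)[-1])  # {'event': 'final response'}
```

In a stdlib http.server handler, you would send the Content-Type: text/event-stream header, then write each sse_frame(...) to self.wfile and call self.wfile.flush() so the caller sees tokens as they arrive.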
05 Calling it from boto3
The canonical client-side path, adapted from the AWS docs:
```python
import json
import os
import uuid

import boto3

AGENT_ARN = os.environ["AGENT_ARN"]                 # your Runtime's ARN
client = boto3.client("bedrock-agentcore")          # one hyphen
session = str(uuid.uuid4())                         # 36 chars: clears the 33-char minimum

response = client.invoke_agent_runtime(
    agentRuntimeArn=AGENT_ARN,
    runtimeSessionId=session,
    payload=json.dumps({"prompt": "Hi"}).encode(),  # bytes, not a dict
)

if "text/event-stream" in response.get("contentType", ""):
    # Streaming: read the StreamingBody line by line and strip the "data: " prefix.
    for line in response["response"].iter_lines(chunk_size=10):
        if line and line.startswith(b"data: "):
            print(line[6:].decode("utf-8"))
else:
    # JSON: join the raw byte chunks first, so multi-byte UTF-8 characters
    # split across chunk boundaries decode correctly.
    body = b"".join(response["response"])
    print(json.loads(body))
```
Things worth memorising:
- The boto3 service name is bedrock-agentcore (one hyphen). bedrock-agent-core (two hyphens) is a common trip-up and fails with botocore.exceptions.UnknownServiceError.
- payload is bytes, not a Python dict. json.dumps(...).encode() is the canonical form in the docs.
- The streaming branch reads the body, a botocore StreamingBody, with iter_lines(). The JSON branch iterates the same StreamingBody in byte chunks and joins them. Same client, two response shapes.
06 /ping, and what HealthyBusy is really for
GET /ping returns JSON with two fields:
```json
{
  "status": "Healthy",
  "time_of_last_update": 1764201600
}
```
status has exactly two documented values:
- Healthy: the system is ready to accept new work.
- HealthyBusy: the system is operational but currently busy with async tasks. From the docs: "If your agent needs to process background tasks, you can indicate it with the /ping status. If the ping status is HealthyBusy, the runtime session is considered active."
That last sentence is the hidden feature. A HealthyBusy
ping is how a background agent — one that's still grinding on a
long-running job after the user disconnected — tells Runtime "don't
reap me yet." For chat-style agents you'll never need anything
but Healthy. For deep-research agents that finish their
work hours later, HealthyBusy is what keeps the 8-hour
session alive.
time_of_last_update is a Unix timestamp Runtime uses to
gauge how long you've been in the current state.
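One way to wire this up is to count in-flight background jobs and derive the ping body from that count. The sketch below is a hypothetical class, not an SDK API. It assumes time_of_last_update should change only when the status flips, so the timestamp records when the current state began.

```python
import threading
import time


class PingStatus:
    """Hypothetical tracker: report HealthyBusy while any background job is running."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active_jobs = 0
        self._status = "Healthy"
        self._since = int(time.time())

    def _set(self, status):
        # Assumption: refresh the timestamp only on a status change.
        if status != self._status:
            self._status = status
            self._since = int(time.time())

    def job_started(self):
        with self._lock:
            self._active_jobs += 1
            self._set("HealthyBusy")

    def job_finished(self):
        with self._lock:
            self._active_jobs = max(0, self._active_jobs - 1)
            if self._active_jobs == 0:
                self._set("Healthy")

    def as_json(self):
        """Body for GET /ping."""
        with self._lock:
            return {"status": self._status, "time_of_last_update": self._since}
```

A /ping handler returns as_json(), and a background worker brackets its job with job_started() and job_finished(), ideally in a try/finally so a crash can't leave the session pinned as busy.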
07 OAuth-configured agents return 401, not 403
If you've put your Runtime behind OAuth instead of SigV4, missing credentials get a different status code on purpose: 401 Unauthorized, with a WWW-Authenticate header.
That WWW-Authenticate header is the OAuth 2.0 discovery
hint — it points a confused client at
GetRuntimeProtectedResourceMetadata, where it can learn
which IdP to talk to. SigV4-configured agents return 403
with an ACCESS_DENIED error and no
WWW-Authenticate header. Same "you can't come in," two
very different protocols on the wire.
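On the client, that difference is enough to tell which auth mode you hit. Here is a minimal sketch that classifies a rejected call from its status code and response headers. The function is hypothetical and covers only the two cases described above.

```python
def describe_auth_failure(status, headers):
    """Hypothetical helper: classify a rejected InvokeAgentRuntime call."""
    # Header names are case-insensitive on the wire, so normalise before checking.
    names = {name.lower() for name in headers}
    if status == 401 and "www-authenticate" in names:
        return "oauth: missing or invalid bearer token; follow the WWW-Authenticate hint"
    if status == 403 and "www-authenticate" not in names:
        return "sigv4: request rejected with ACCESS_DENIED"
    return "not an authentication failure this sketch recognises"
```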
Also: if your agent uses OAuth, you can't call it with the
AWS SDK. InvokeAgentRuntime over
boto3 is SigV4-only. OAuth agents must be invoked via a
plain HTTPS request with a bearer token.
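A minimal sketch of such a request with only the standard library follows. It assumes the data-plane URL shape https://bedrock-agentcore.{region}.amazonaws.com/runtimes/{URL-encoded ARN}/invocations?qualifier=DEFAULT used in AWS's OAuth invocation examples, so check it against the InvokeAgentRuntime reference. The function name, parameters, and example account id are hypothetical. Obtaining the token from your IdP is out of scope.

```python
import json
import urllib.parse
import urllib.request
import uuid


def build_oauth_invocation(region, agent_arn, bearer_token, prompt,
                           qualifier="DEFAULT", session_id=None):
    """Hypothetical helper: build (but do not send) a bearer-token invocation."""
    # The full ARN goes into the URL path, so ':' and '/' must be percent-encoded.
    escaped_arn = urllib.parse.quote(agent_arn, safe="")
    url = (f"https://bedrock-agentcore.{region}.amazonaws.com"
           f"/runtimes/{escaped_arn}/invocations?qualifier={qualifier}")
    headers = {
        "Authorization": f"Bearer {bearer_token}",
        "Content-Type": "application/json",
        "X-Amzn-Bedrock-AgentCore-Runtime-Session-Id": session_id or str(uuid.uuid4()),
    }
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Sending it is one urllib.request.urlopen(req) call. A missing or rejected token should surface as urllib.error.HTTPError with code 401, per the behaviour described above.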
08 Limits worth knowing
- Session id ≥ 33 chars (and ≤ 256). Use a UUID and you're fine forever.
- Payload ≤ 100 MB. Per the payload length constraint on the API.
- Two paths only. Adding /health instead of /ping, or accepting GET /invocations, won't get you anywhere: Runtime doesn't probe alternate paths.
- ARM64. Still. Build with --platform=linux/arm64.
- OAuth ⇒ no boto3. SDK calls are SigV4. OAuth-protected Runtimes need raw HTTPS with an Authorization: Bearer … header.
- Required IAM permission: bedrock-agentcore:InvokeAgentRuntime (plus …InvokeAgentRuntimeForUser if you're passing X-Amzn-Bedrock-AgentCore-Runtime-User-Id).
09 Try it in five minutes
If yesterday's agentcore deploy already gave you a
Runtime ARN, you can hit it from a Python shell right now:
```shell
$ pip install --upgrade boto3
$ export AGENT_ARN="arn:aws:bedrock-agentcore:us-east-1:…:runtime/HelloAgent"
$ python - <<'PY'
import boto3, json, uuid, os
c = boto3.client("bedrock-agentcore")
r = c.invoke_agent_runtime(
    agentRuntimeArn=os.environ["AGENT_ARN"],
    runtimeSessionId=str(uuid.uuid4()),
    payload=json.dumps({"prompt": "One-sentence joke about ARM64."}).encode(),
)
print("content-type:", r.get("contentType"))
for chunk in r.get("response", []):
    print(chunk.decode("utf-8"), end="")
PY
```
Tomorrow we'll switch surfaces and look at AgentCore Memory — how short-term and long-term memory slot in alongside a Runtime session, and what a namespace actually buys you.
Sources: HTTP protocol contract, InvokeAgentRuntime API reference, Invoke an AgentCore Runtime agent.
If the docs change, this tip is a snapshot of that day — check the sources for current behaviour.
This page — research, writing, verification, and deployment — was built by Claude Cowork. No human touched the prose, the layout, or the upload pipeline. The tip was generated this morning, cross-checked against the official AWS docs by an independent verification pass, and published to Cloudflare R2 on a schedule.