Prefill Claude's response to skip the preamble and lock the output format

The one prompting move that lives in the assistant turn, not the user turn. Seed the first characters of Claude's reply and it continues from there — no "Here's the JSON:", no markdown fence, no apology.

Prefilling is the technique of writing the opening characters of Claude's own reply for it. Every other prompting move you make goes in the user turn; prefilling puts text in the assistant turn, and Claude continues generating from exactly where your text stops. It is the most direct lever you have over output shape, because you are not asking Claude to start a certain way — you are starting it that way and letting it run.

Why it matters for anyone calling the API. The single most annoying failure mode in programmatic use is the chatty preamble: you ask for JSON and get “Sure! Here's the JSON you requested:” wrapped in a ```json fence, and now your parser chokes on the prose. Prompt instructions like “return only JSON” help but don't fully close the gap, because the model still wants to acknowledge you first. Prefilling closes it structurally: if the assistant turn already begins with {, the next token Claude emits is the second character of a JSON object, not a greeting. There is nowhere for the preamble to go.

Force a machine-readable format. Prefill the assistant turn with { for JSON or <result> for XML. Claude picks up mid-structure and never emits the conversational wrapper. Pair it with a stop sequence (} for a flat object) and you get a response you can hand straight to json.loads with no regex pre-clean. For an SRE wiring Claude into a pipeline — classifying alerts, extracting fields from a log line, scoring a diff — this is the difference between a parser you trust and one you defensively wrap in try/except.

Skip the preamble in plain prose, too. The same move works outside structured output. Prefill with the first word of the answer you actually want — The root cause is — and Claude continues the sentence instead of spending three lines restating the question. It is also a steering tool: a short prefill can nudge Claude past a reflexive hedge and into the substance, which is useful when you know the shape of the answer and just want it delivered.

Hold a role or a tag open. In a longer generation, prefilling the assistant turn with a tag like <analysis> or an in-character first line keeps Claude inside the structure or persona you set up, rather than drifting back to a neutral assistant voice on turn two. The prefill acts as a rail the rest of the response runs along.

The trap to avoid: prefilling lives in the API, not the chat box. You can't do it in the Claude app — it requires control of the messages array, so it's an SDK / raw-API technique. Two more sharp edges. First, the prefill string cannot end in trailing whitespace; the API rejects "{ " with a space on the end, so stop at the last non-space character. Second, prefilling is incompatible with extended thinking — if you've turned on a thinking budget, you can't also seed the assistant turn, so pick one. Within those bounds it costs nothing and ships today on every current Claude model.

The try-it block is the canonical case: force clean JSON out of a classification call with a single {.

Try it in 60 seconds

A classification call that returns parseable JSON with no preamble and no markdown fence. The whole trick is the last message in the array: an assistant turn prefilled with {. Run it against any current model.

from anthropic import Anthropic
client = Anthropic()

msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[
        {"role": "user", "content":
            "Classify this alert by severity (critical/high/low) "
            "and name the likely subsystem. Alert: "
            "'p99 latency on checkout-api up 8x, error rate 2%'."},
        # The prefill: Claude continues the JSON from here.
        {"role": "assistant", "content": "{"},
    ],
    stop_sequences=["}"],
)

# Re-add the brace the stop sequence ate, then parse.
print("{" + msg.content[0].text + "}")

Without the prefill, the same call often returns "Here's the classification:" followed by a fenced block. With it, the first character Claude emits is already inside the object. Drop the { prefill and re-run to see the preamble come back — that is the whole argument for the technique in one diff. Note: omit the prefill if you have extended thinking enabled, since the two can't be combined.

Anthropic: Prefill Claude's response · Prompting best practices (current) · Interactive prompt-engineering tutorial (GitHub)