Put long inputs at the top of the prompt and your question at the bottom
A one-rule structural change to long-context prompts that Anthropic measures at up to ~30% better response quality on complex multi-document tasks. Costs nothing.
Anthropic's prompt-engineering guide has a small structural rule for long-context prompts: put the long inputs first, the instruction last, and the user's actual question dead last. The model reads top to bottom and attends most sharply to whatever sits closest to where it starts generating. Putting a 50-page PDF after the question is the long-context equivalent of asking a colleague a question, walking them into a library, and expecting them to remember what you asked.
Why this matters more now than it used to. Claude's working context is routinely 100K-plus tokens — a few thousand lines of source, a full incident timeline, all of a service's runbooks at once. With contexts that big, placement stops being cosmetic. Anthropic's long-context tips page puts a number on it: up to roughly 30% better response quality on complex multi-document tasks when the query sits at the bottom. That is the rare prompt-engineering change where the cost is zero and the measured upside is large.
The shape: documents first, instruction last. The recipe is: all reference inputs first (each in its own <document> block with <source> and <document_content> subtags), then any few-shot examples, then the instruction, then the question. The order is “stable, reference-like material first; volatile, task-specific material last.” For an SRE asking Claude to root-cause across three log files and a runbook, that means all four artifacts paste in at the top, the question goes in last, and the model is not second-guessing which paragraph was the ask.
Name each input so Claude can cite it. When there are multiple documents, give them names. <document index="1"><source>postmortem-2025-11.md</source><document_content>…</document_content></document> isn't ceremony — it gives Claude a stable handle to cite (“from postmortem-2025-11.md, the failure mode was…”) and gives you a clean grep target when you go reading the response. Claude will reference sources by the name you gave them, which is the cheapest possible step toward auditable answers.
Ask for quotes before the answer. The second move from the same docs page: tell Claude to quote the relevant passages first, then answer. “Find and quote the log lines that mention timeouts, then explain what they show.” The quotes function as a working set the model attends to without re-scanning the whole context, and they double as a citation trail for the reader. This single sentence in your instructions is the largest hallucination-reducer Anthropic recommends for long-context prompts — cheaper than RAG, cheaper than chunking, and you get the receipts inline.
The trap to avoid is the inverted shape that feels natural in a chat UI: “Here's my question: … [pastes 80KB of YAML].” It reads top-down the way you'd ask a person, but it puts the instruction far from the generation point and buries it under the noise of the input. If you find yourself typing “given the following:” before pasting, stop, move the input up, and end with the question. Same content, completely different signal-to-noise from Claude's vantage point.
One nuance worth knowing: this is a long-context rule, not a universal one. For a 200-token prompt the placement effect is invisible. The 30% number lives in the world of multi-document RAG, long log triage, and reasoning over whole codebases — the prompts where structure starts paying for itself. See the try-it block for the minimal before/after.
The same prompt twice, with the input and the question swapped. Run both in the Claude app or any SDK call against a real long document — a runbook, a log file, anything in the tens of KB — and compare the answers.
Inverted (don't do this):
Find and quote the log lines that mention timeouts, then
explain the failure mode and which runbook step applies.
Here are the inputs:
runbook-payments.md:
...full runbook text...
error.log:
...50KB of log lines...
Right shape:
<documents>
<document index="1">
<source>runbook-payments.md</source>
<document_content>
...full runbook text...
</document_content>
</document>
<document index="2">
<source>error.log</source>
<document_content>
...50KB of log lines...
</document_content>
</document>
</documents>
Find and quote the log lines that mention timeouts, then
explain the failure mode and which runbook step applies.
Cite each document by its source name.
The second shape gives you named citations, an extraction step before the synthesis step, and the instruction adjacent to where Claude starts generating. The difference is usually visible in the first three sentences of the response.