Make Claude reason before it answers with chain-of-thought prompting

Ask for the answer and you get a guess; ask for the reasoning first and the answer comes out the other side of a worked-through argument. Chain-of-thought is the move that buys accuracy on anything a person would need a scratchpad for.

Chain-of-thought (CoT) prompting is the technique of asking Claude to lay out its reasoning before it commits to an answer, rather than emitting the conclusion first and rationalising it after. The order is the whole point: tokens Claude has already written condition the tokens that follow, so reasoning that comes first actually shapes the final answer, while reasoning tacked on afterward is just decoration. For any task a human would reach for a scratchpad on — multi-step math, weighing several constraints, untangling a logic puzzle, drafting something with interdependent parts — making the work visible measurably reduces errors.

Why it earns a place in the rotation now. On current models the API-native version of this idea is adaptive thinking, where Claude decides on its own how much to reason based on query complexity and an effort setting. But the manual technique has not gone away: the moment thinking is off — in the chat app, or in any API call where you omit the thinking parameter — CoT is how you get the same uplift. It is the most portable reasoning lever you have, because it lives entirely in the words of the prompt and works on every model.

The bare version. Append “Think step by step” (or, since the word “think” can read as a thinking-mode cue, “reason through this” or “work through it carefully”) to the end of your request. This is the lowest-effort form and it helps, but it leaves the structure of the reasoning entirely up to Claude, which is fine for one-off questions and weak when you need consistency.

The guided version. Spell out the steps you want Claude to walk: “First identify the constraints, then compute each subtotal, then sum them.” You trade some of Claude's latitude for a repeatable shape, which matters when the same prompt runs across many inputs and you want every run to reason the same way. A caution from the current docs: prescriptive steps can underperform a general instruction to reason thoroughly, because Claude's own decomposition often beats a hand-written plan — so reach for guided steps when you need consistency, not when you are chasing peak quality on a single hard problem.

The structured version. Tell Claude to put its working inside <thinking> tags and its conclusion inside <answer> tags. Now the reasoning and the result are mechanically separable: you keep the reasoning for debugging and audit, and your code pulls just the <answer> block. This is the version to use anywhere downstream of the call — a pipeline, a report, an alert classifier — because you get the accuracy of visible reasoning without the prose leaking into the field you actually consume.

The trap to avoid: CoT is not free and it is not universal. It adds latency and tokens, so applying it to lookups or trivial formatting just makes responses slower and more expensive for no gain. The other failure mode is asking for the answer and the reasoning in the wrong order — if the conclusion comes first, the explanation that follows is post-hoc and the accuracy benefit evaporates. Reasoning must precede the answer to do any work. And if you have adaptive or extended thinking enabled, you usually do not need manual CoT on top; pick one rather than stacking them.

The try-it block is a single chat-box prompt: the structured form, with a multi-step problem that rewards showing the work.

Try it in 60 seconds

Paste this into the chat box. It asks for the structured form — reasoning fenced off from the result — on a small problem with enough moving parts that a one-shot guess tends to slip.

Solve this. Put your full step-by-step working inside <thinking>
tags, then give only the final number inside <answer> tags.

A warehouse ships 3 pallets/hour starting at 6am. At 9am a second
line opens shipping 5 pallets/hour. Both run until 2pm. How many
pallets are shipped in total by 2pm?

Run it, then run the same problem again with the scaffold removed — just “...how many pallets by 2pm? Answer with the number only.” Compare the two. The structured run exposes each subtotal, so when something looks off you can see where; the bare run gives you a number and no way to check it. In an API call the <answer> tag is what you would parse, discarding the <thinking> block. With thinking enabled on a current model, drop the manual scaffold and let adaptive thinking do the reasoning instead.

Anthropic: Prompting best practices (thinking & reasoning) · Adaptive thinking (the API-native successor) · Interactive prompt-engineering tutorial (GitHub)