Technology · Claude API
Intermediate
Claude Thinking and Effort
A quick reference for enabling adaptive thinking and controlling effort levels in Claude API requests.
- 01 Enable adaptive thinking with thinking: {type: 'adaptive'} on Opus 4.8.
- 02 Control depth on Fable 5 via output_config.effort instead of thinking.
- 03 Never send {type: 'enabled', budget_tokens: N}; it returns a 400 error.
Adaptive Thinking
Adaptive thinking lets Claude reason internally before answering — improving accuracy on hard tasks.
Enable it with thinking: {"type": "adaptive"} on Opus 4.8 and Opus 4.7.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    message = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=16000,
        thinking={"type": "adaptive"},
        messages=[{"role": "user", "content": "Solve this logic puzzle step by step: ..."}],
    )

"adaptive" is the only valid on-mode for Opus 4.8 and newer; "enabled" returns a 400 error.

To turn off thinking on Opus 4.8, use thinking: {"type": "disabled"} or omit the parameter.

    # Fast mode: no thinking
    message = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=1024,
        thinking={"type": "disabled"},
        messages=[{"role": "user", "content": "What is 2 + 2?"}],
    )

Set max_tokens high enough to cover both the thinking process and the final answer.

Use adaptive thinking for: complex reasoning, multi-step maths, coding problems, strategic planning.
Effort Levels on Fable 5
On Claude Fable 5 (claude-fable-5), thinking is always on, so omit the thinking parameter.

Control reasoning depth with output_config.effort instead.

    message = client.messages.create(
        model="claude-fable-5",
        max_tokens=32000,
        output_config={"effort": "high"},
        messages=[{"role": "user", "content": "Design a database schema for a multi-tenant SaaS app."}],
    )

Effort levels and when to use them:

| Effort level | Best for |
|---|---|
| low | Fast responses, simple Q&A, routing tasks |
| medium | General-purpose tasks, summarisation |
| high | Complex reasoning, detailed analysis |
| xhigh | Research-grade tasks, hard maths, legal review |
| max | Hardest tasks where quality trumps speed and cost |

Sending thinking: {"type": "enabled", budget_tokens: N} to Fable 5 returns a 400 error.

Sending thinking: {"type": "disabled"} to Fable 5 also returns a 400 error.

The default effort when the parameter is omitted is model-defined; specify it explicitly for predictable quality.
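The per-model rules above can be gathered into one small helper so that invalid combinations are caught before a request is sent. This is a minimal sketch and not part of the SDK: `build_request_params`, `ALWAYS_THINKING_MODELS`, and `EFFORT_LEVELS` are hypothetical names, and the helper encodes only the rules stated in this reference. Rejecting `effort` for other models is a choice made for the sketch, not documented API behaviour.

```python
# Hypothetical helper: encodes only the rules stated in this reference.
ALWAYS_THINKING_MODELS = {"claude-fable-5"}  # thinking always on; effort controls depth
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def build_request_params(model, *, think=True, effort=None):
    """Return the thinking/effort keyword arguments for messages.create()."""
    if model in ALWAYS_THINKING_MODELS:
        if not think:
            raise ValueError(f"{model} rejects thinking={{'type': 'disabled'}}")
        if effort is None:
            raise ValueError("set effort explicitly for predictable quality")
        if effort not in EFFORT_LEVELS:
            raise ValueError(f"unknown effort level: {effort!r}")
        # No `thinking` key at all: the parameter is omitted on this model.
        return {"output_config": {"effort": effort}}
    if effort is not None:
        # Assumption of this sketch: effort is only mapped for Fable 5.
        raise ValueError("this sketch only maps effort for Fable 5")
    return {"thinking": {"type": "adaptive" if think else "disabled"}}
```

The result can be splatted into a call, for example `client.messages.create(model="claude-fable-5", max_tokens=32000, messages=msgs, **build_request_params("claude-fable-5", effort="high"))`, where `msgs` is your own message list.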
Reading Thinking Blocks
When thinking is active, message.content contains thinking blocks before the text block.

    for block in message.content:
        if block.type == "thinking":
            print("Thinking:", block.thinking)  # the internal reasoning
        elif block.type == "text":
            print("Answer:", block.text)

On Opus 4.8, thinking blocks contain the full chain of thought.
On Fable 5, thinking blocks contain a summarised version; the raw chain is never returned.
Control thinking display on Fable 5 via output_config.thinking_display.

    output_config={
        "effort": "high",
        "thinking_display": "summarized",  # or "omitted" (default)
    }

Pass thinking blocks back unchanged when continuing a conversation; the API validates them.

    # Replay: include thinking blocks from the previous assistant turn
    messages.append({"role": "assistant", "content": previous_message.content})

Do not modify thinking block content; the API rejects tampered blocks.
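Reading the blocks and replaying them can be wrapped in two small functions. The sketch below is illustrative only: `split_blocks` and `append_assistant_turn` are hypothetical names, and the `SimpleNamespace` objects are stand-ins for the blocks a real response carries in `message.content`.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Separate thinking text from answer text in a list of content blocks."""
    thinking, answer = [], []
    for block in content:
        if block.type == "thinking":
            thinking.append(block.thinking)
        elif block.type == "text":
            answer.append(block.text)
    return "\n".join(thinking), "".join(answer)


def append_assistant_turn(messages, content):
    """Replay the previous assistant turn, passing its blocks back unchanged."""
    messages.append({"role": "assistant", "content": content})
    return messages


# Stand-in blocks for illustration only; real blocks come from message.content.
demo_content = [
    SimpleNamespace(type="thinking", thinking="Compare both options."),
    SimpleNamespace(type="text", text="Option B is cheaper."),
]
reasoning, answer = split_blocks(demo_content)
history = append_assistant_turn([{"role": "user", "content": "A or B?"}], demo_content)
```

Note that `append_assistant_turn` stores the same list object it was given rather than a rebuilt copy, which keeps the thinking blocks untouched between turns.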
When to Use Thinking
Use adaptive thinking when the task involves multiple logical steps or competing considerations.
Skip thinking for fast, simple tasks — classification, extraction, short answers — to save cost and latency.
Thinking increases output token count significantly; budget at least 8,000 to 16,000 for max_tokens.

Streaming with thinking: thinking blocks arrive as thinking_delta events before text_delta events.

    with client.messages.stream(
        model="claude-opus-4-8",
        max_tokens=8000,
        thinking={"type": "adaptive"},
        messages=[{"role": "user", "content": "Refactor this function..."}],
    ) as stream:
        for event in stream:
            # Only content_block_delta events carry a typed delta
            if event.type != "content_block_delta":
                continue
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="")
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="")

For Fable 5, run effort sweeps: test low, medium, and high to find the minimum effort that meets quality targets.
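An effort sweep can be sketched as a loop that tries levels from cheapest to most expensive and stops at the first one that clears the quality bar. Everything here is an assumption for illustration: `minimum_effort` is a hypothetical name, `run_task` stands for your own function that sends the request with `output_config={"effort": effort}` and returns the output, and `score` stands for your own evaluation function.

```python
def minimum_effort(run_task, score, target, levels=("low", "medium", "high")):
    """Return the first (cheapest) effort level whose score meets the target.

    run_task(effort) -> output   (hypothetical: your call into the API)
    score(output)    -> float    (hypothetical: your quality metric)
    Returns None when no level in `levels` reaches the target.
    """
    for effort in levels:
        if score(run_task(effort)) >= target:
            return effort
    return None
```

In practice the score would be averaged over a representative set of prompts rather than taken from a single run, since one response is a noisy signal of quality.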
Tip: On hard coding or maths tasks, adaptive thinking often halves the error rate compared to non-thinking mode.
Warning: Thinking tokens are billed as output tokens — at $25/M for Opus 4.8. Enable it selectively.
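To make the billing note concrete, the arithmetic can be sketched as below. `output_cost_usd` is a hypothetical helper, and the default rate is simply the $25/M figure quoted in the warning above; check current pricing before relying on it.

```python
def output_cost_usd(output_tokens, usd_per_million=25.0):
    """Estimate output cost; thinking tokens count as output tokens."""
    return output_tokens * usd_per_million / 1_000_000


# A response that uses a full 16,000-token budget at $25/M:
worst_case = output_cost_usd(16_000)  # 16,000 * 25 / 1,000,000 = 0.40 USD
```

For a completed request, the actual count is available as `message.usage.output_tokens`, which can be passed straight into the helper.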