Claude Thinking and Effort

Adaptive Thinking

Adaptive thinking lets Claude reason internally before answering — improving accuracy on hard tasks.

Enable it with thinking: {"type": "adaptive"} on Opus 4.8 and Opus 4.7.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "Solve this logic puzzle step by step: ..."}]
)

On Opus 4.8 and newer, "adaptive" is the only valid value for turning thinking on; sending "enabled" returns a 400 error.

To turn off thinking on Opus 4.8, use thinking: {"type": "disabled"} or omit the parameter.

# Thinking disabled: a simple query that needs no reasoning step
message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    thinking={"type": "disabled"},
    messages=[{"role": "user", "content": "What is 2 + 2?"}]
)
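If your code toggles thinking on and off, it can help to produce the `thinking` value in one place so that "enabled" never slips into an Opus 4.8 request. A minimal sketch, assuming requests are assembled as keyword-argument dicts; `opus_thinking_param` and `request_kwargs` are hypothetical helpers, not SDK functions:

```python
def opus_thinking_param(think: bool) -> dict:
    """Return the `thinking` argument for an Opus 4.8 request.

    Hypothetical helper: it encodes only the rules stated above, i.e.
    "adaptive" turns thinking on and "disabled" turns it off.
    "enabled" is never produced, because Opus 4.8 rejects it with a 400 error.
    """
    return {"type": "adaptive"} if think else {"type": "disabled"}


def request_kwargs(prompt: str, think: bool, max_tokens: int) -> dict:
    """Assemble keyword arguments for client.messages.create (sketch)."""
    return {
        "model": "claude-opus-4-8",
        "max_tokens": max_tokens,
        "thinking": opus_thinking_param(think),
        "messages": [{"role": "user", "content": prompt}],
    }


kwargs = request_kwargs("What is 2 + 2?", think=False, max_tokens=1024)
print(kwargs["thinking"])  # {'type': 'disabled'}
```

The resulting dict can be passed as `client.messages.create(**kwargs)`.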

Set max_tokens high enough to cover both the thinking process and the final answer.
Use adaptive thinking for: complex reasoning, multi-step maths, coding problems, strategic planning.
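When max_tokens is too small, the response stops with stop_reason "max_tokens", and a long thinking phase can leave little or no room for the answer. A minimal check, assuming the response has been converted to a plain dict in the Messages API JSON shape; `check_budget` is a hypothetical name:

```python
def check_budget(response: dict) -> str:
    """Report whether max_tokens left room for both thinking and the answer.

    Assumes `response` is a Messages API response as a plain dict, with a
    "stop_reason" string and a "content" list of typed blocks.
    """
    has_text = any(block.get("type") == "text" for block in response["content"])
    if response.get("stop_reason") != "max_tokens":
        return "ok"
    # The token limit was hit: the answer was either cut short or never started.
    if has_text:
        return "answer truncated"
    return "no answer: thinking used the whole budget"


# Example: a response that ran out of tokens while still thinking
cut_off = {
    "stop_reason": "max_tokens",
    "content": [{"type": "thinking", "thinking": "Let me enumerate the cases..."}],
}
print(check_budget(cut_off))  # no answer: thinking used the whole budget
```

If either failure shows up regularly, raise max_tokens for that kind of task.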

Effort Levels on Fable 5

On Claude Fable 5 (claude-fable-5), thinking is always on — omit the thinking parameter.

Control reasoning depth with output_config.effort instead.

message = client.messages.create(
    model="claude-fable-5",
    max_tokens=32000,
    output_config={"effort": "high"},
    messages=[{"role": "user", "content": "Design a database schema for a multi-tenant SaaS app."}]
)

Effort levels and when to use them:

Effort level	Best for
`low`	Fast responses, simple Q&A, routing tasks
`medium`	General-purpose tasks, summarisation
`high`	Complex reasoning, detailed analysis
`xhigh`	Research-grade tasks, hard maths, legal review
`max`	Hardest tasks where quality trumps speed and cost
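One way to keep the table's guidance in code is a lookup from task category to effort level. The category names below are hypothetical; only the five level strings come from the table:

```python
# The five effort levels from the table, ordered from cheapest to most thorough.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Hypothetical task categories mapped to the table's recommendations.
EFFORT_FOR_TASK = {
    "routing": "low",
    "simple_qa": "low",
    "summarisation": "medium",
    "analysis": "high",
    "hard_maths": "xhigh",
    "legal_review": "xhigh",
}


def effort_for(task: str, default: str = "medium") -> str:
    """Return the effort level for a task category, falling back to `default`."""
    effort = EFFORT_FOR_TASK.get(task, default)
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return effort


print(effort_for("routing"))       # low
print(effort_for("legal_review"))  # xhigh
```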

Sending thinking: {"type": "enabled", "budget_tokens": N} to Fable 5 returns a 400 error.
Sending thinking: {"type": "disabled"} to Fable 5 also returns a 400 error.
When output_config.effort is omitted, the default effort is model-defined, so set it explicitly for predictable quality.
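The rules above amount to: never send `thinking`, always send `output_config.effort`. A minimal request builder that enforces both, assuming the argument shape from the Fable 5 example earlier; `fable_request` is a hypothetical name:

```python
EFFORT_LEVELS = {"low", "medium", "high", "xhigh", "max"}


def fable_request(prompt: str, effort: str, max_tokens: int = 32000) -> dict:
    """Build keyword arguments for a claude-fable-5 request (sketch).

    - No "thinking" key: thinking is always on, and sending either
      "enabled" or "disabled" returns a 400 error.
    - Effort is always set explicitly, because the default is model-defined.
    """
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {sorted(EFFORT_LEVELS)}, got {effort!r}")
    return {
        "model": "claude-fable-5",
        "max_tokens": max_tokens,
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }


kwargs = fable_request("Design a database schema for a multi-tenant SaaS app.", "high")
print(sorted(kwargs))  # ['max_tokens', 'messages', 'model', 'output_config']
```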

Reading Thinking Blocks

When thinking is active, message.content contains thinking blocks before the text block.

for block in message.content:
    if block.type == "thinking":
        print("Thinking:", block.thinking)  # the internal reasoning
    elif block.type == "text":
        print("Answer:", block.text)

On Opus 4.8, thinking blocks contain the full chain of thought.
On Fable 5, the raw chain of thought is never returned; thinking blocks contain at most a summarised version.

Control whether that summary is returned on Fable 5 via output_config.thinking_display: the default, "omitted", leaves it out, while "summarized" includes it.

output_config={
    "effort": "high",
    "thinking_display": "summarized"  # or "omitted" (default)
}
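Setting thinking_display explicitly makes it clear at the call site whether a summary is expected. A sketch of both halves, building the config and reading whatever summary comes back, with response content represented as plain dicts; both function names are hypothetical:

```python
from typing import Dict, List


def fable_output_config(effort: str, show_summary: bool = False) -> Dict[str, str]:
    """Build output_config for claude-fable-5 (hypothetical helper).

    thinking_display is set explicitly either way: "summarized" returns the
    thinking summary, "omitted" (the default) leaves it out.
    """
    return {
        "effort": effort,
        "thinking_display": "summarized" if show_summary else "omitted",
    }


def thinking_summary(content: List[Dict]) -> str:
    """Join the text of any non-empty thinking blocks in a response's content.

    Returns "" when no summary text came back, without assuming whether the
    API drops thinking blocks entirely or returns them empty.
    """
    return "\n".join(
        block["thinking"]
        for block in content
        if block.get("type") == "thinking" and block.get("thinking")
    )


print(fable_output_config("high", show_summary=True))
# {'effort': 'high', 'thinking_display': 'summarized'}
```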

Pass thinking blocks back unchanged when continuing a conversation — the API validates them.

# Replay: include thinking blocks from the previous assistant turn
messages.append({"role": "assistant", "content": previous_message.content})

Do not modify thinking block content — the API rejects tampered blocks.
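In a multi-turn loop, the simplest way to satisfy this is to append the previous response's content list as-is rather than rebuilding it. A minimal sketch with messages as plain dicts; `continue_conversation` is a hypothetical helper:

```python
def continue_conversation(messages: list, assistant_content: list, next_prompt: str) -> list:
    """Return a new history with the last assistant turn replayed verbatim.

    `assistant_content` is the content list from the previous response,
    thinking blocks included. It is appended unchanged (no filtering,
    reordering, or editing) because the API rejects modified thinking blocks.
    The input `messages` list is not mutated.
    """
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": next_prompt},
    ]


history = [{"role": "user", "content": "Is 91 prime?"}]
previous_content = [
    # Simplified placeholder blocks; in practice keep every field the API returned.
    {"type": "thinking", "thinking": "91 = 7 * 13."},
    {"type": "text", "text": "No, 91 = 7 × 13."},
]
history = continue_conversation(history, previous_content, "What about 97?")
print([m["role"] for m in history])  # ['user', 'assistant', 'user']
```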

When to Use Thinking

Use adaptive thinking when the task involves multiple logical steps or competing considerations.
Skip thinking for fast, simple tasks — classification, extraction, short answers — to save cost and latency.
Thinking increases output token count significantly, so set max_tokens to 8,000–16,000 or higher when it is on.
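The guidance above can be folded into a small settings picker. A sketch for Opus 4.8, where the task labels are hypothetical and the 8,000-token floor is simply the lower end of the range given above:

```python
# Task kinds the text says to run without thinking (hypothetical labels).
SIMPLE_TASKS = {"classification", "extraction", "short_answer"}
MIN_THINKING_MAX_TOKENS = 8_000  # lower end of the 8,000-16,000 guidance


def opus_settings(task_kind: str, max_tokens: int) -> dict:
    """Pick `thinking` and `max_tokens` for an Opus 4.8 request from a task label."""
    if task_kind in SIMPLE_TASKS:
        # Fast, simple task: skip thinking to save cost and latency.
        return {"thinking": {"type": "disabled"}, "max_tokens": max_tokens}
    # Multi-step work: think adaptively and raise max_tokens to the floor if needed.
    return {
        "thinking": {"type": "adaptive"},
        "max_tokens": max(max_tokens, MIN_THINKING_MAX_TOKENS),
    }


print(opus_settings("planning", 1024))
# {'thinking': {'type': 'adaptive'}, 'max_tokens': 8000}
```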

Streaming with thinking: thinking content arrives as content_block_delta events with delta type thinking_delta, before the text_delta events that carry the answer.

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=8000,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "Refactor this function..."}]
) as stream:
    for event in stream:
        # Only content_block_delta events carry thinking or text deltas;
        # other events (such as message_delta) carry different payloads.
        if event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="")
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="")
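To keep the reasoning and the answer separate (for example, to log the thinking but display only the answer), the deltas can be accumulated per type. The sketch below works on plain dicts shaped like the raw streaming events, independent of the SDK; `collect_stream` is a hypothetical name and the sample events are illustrative:

```python
from typing import Iterable, Tuple


def collect_stream(events: Iterable[dict]) -> Tuple[str, str]:
    """Accumulate thinking text and answer text from raw streaming events.

    Assumes each event is a dict in the Messages API event shape, e.g.
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "..."}}.
    All other event types are ignored.
    """
    thinking_parts, text_parts = [], []
    for event in events:
        if event.get("type") != "content_block_delta":
            continue
        delta = event["delta"]
        if delta["type"] == "thinking_delta":
            thinking_parts.append(delta["thinking"])
        elif delta["type"] == "text_delta":
            text_parts.append(delta["text"])
    return "".join(thinking_parts), "".join(text_parts)


sample = [
    {"type": "message_start", "message": {}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "Check the loop. "}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "It is off by one."}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "text_delta", "text": "Use range(n)."}},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
]
thinking, answer = collect_stream(sample)
print(answer)  # Use range(n).
```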

For Fable 5, run effort sweeps — test low, medium, and high to find the minimum effort that meets quality targets.
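A sweep can be as simple as trying levels cheapest first and stopping at the first one that passes. In this sketch `score` is a hypothetical evaluation function you would supply (run your eval set at one effort level, return a quality score), and the scores in the usage line are made up:

```python
from typing import Callable, Optional, Sequence

SWEEP_LEVELS = ("low", "medium", "high")  # extend with "xhigh" and "max" if none pass


def minimum_effort(score: Callable[[str], float], target: float,
                   levels: Sequence[str] = SWEEP_LEVELS) -> Optional[str]:
    """Return the cheapest effort level whose score meets `target`, or None.

    Levels are tried in order, cheapest first, and the sweep stops at the
    first level that passes.
    """
    for effort in levels:
        if score(effort) >= target:
            return effort
    return None


# Usage with made-up scores standing in for a real evaluation run
fake_scores = {"low": 0.71, "medium": 0.88, "high": 0.93}
print(minimum_effort(fake_scores.__getitem__, target=0.85))  # medium
```

Stopping at the first passing level assumes quality does not drop as effort rises; if your results are noisy, score every level and pick the cheapest one that passes.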

Tip: On hard coding or maths tasks, adaptive thinking often halves the error rate compared to non-thinking mode.

Warning: Thinking tokens are billed as output tokens — at $25/M for Opus 4.8. Enable it selectively.
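Because thinking tokens are billed at the output rate, a per-response estimate is simple arithmetic. A sketch using the $25/M figure above; the function name is hypothetical and the token counts are illustrative:

```python
OPUS_48_OUTPUT_USD_PER_MTOK = 25.0  # output price quoted above for Opus 4.8


def output_cost_usd(thinking_tokens: int, answer_tokens: int,
                    usd_per_mtok: float = OPUS_48_OUTPUT_USD_PER_MTOK) -> float:
    """Output cost of one response; thinking tokens bill at the output rate."""
    return (thinking_tokens + answer_tokens) * usd_per_mtok / 1_000_000


# 12,000 thinking tokens plus a 500-token answer
print(round(output_cost_usd(12_000, 500), 4))  # 0.3125
```

For comparison, the 500-token answer alone would cost $0.0125 at the same rate.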