Claude Thinking and Effort

A quick reference for enabling adaptive thinking and controlling effort levels in Claude API requests.

TL;DR
  1. Enable adaptive thinking with thinking: {type: 'adaptive'} on Opus 4.8.
  2. Control depth on Fable 5 via output_config.effort instead of thinking.
  3. Never send {type: 'enabled', budget_tokens: N}; it returns a 400 error.

Adaptive Thinking

  • Adaptive thinking lets Claude reason internally before answering — improving accuracy on hard tasks.

  • Enable it with thinking: {"type": "adaptive"} on Opus 4.8 and Opus 4.7.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    message = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=16000,
        thinking={"type": "adaptive"},
        messages=[{"role": "user", "content": "Solve this logic puzzle step by step: ..."}]
    )
    
  • "adaptive" is the only valid on-mode for Opus 4.8 and newer — "enabled" returns a 400 error.

  • To turn off thinking on Opus 4.8, use thinking: {"type": "disabled"} or omit the parameter.

    # Fast mode — no thinking
    message = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=1024,
        thinking={"type": "disabled"},
        messages=[{"role": "user", "content": "What is 2 + 2?"}]
    )
    
  • Set max_tokens high enough to cover both the thinking process and the final answer.

  • Use adaptive thinking for: complex reasoning, multi-step maths, coding problems, strategic planning.
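
Because thinking and the final answer share one max_tokens budget, a response can be cut off before the answer is complete. The sketch below shows one way to detect that, assuming the response exposes stop_reason as the Messages API does. The helper name was_truncated is hypothetical, and the SimpleNamespace object only stands in for a real response.

```python
from types import SimpleNamespace


def was_truncated(message) -> bool:
    """Return True when generation stopped because max_tokens ran out."""
    # "max_tokens" is the stop_reason reported when the output budget is
    # exhausted; thinking and the answer draw on that same budget.
    return getattr(message, "stop_reason", None) == "max_tokens"


# Stand-in for a real response object, for illustration only.
fake_response = SimpleNamespace(stop_reason="max_tokens", content=[])

if was_truncated(fake_response):
    print("Response hit max_tokens; retry with a larger budget.")
```

If the check fires, raising max_tokens is usually a simpler fix than trimming the prompt.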

Effort Levels on Fable 5

  • On Claude Fable 5 (claude-fable-5), thinking is always on — omit the thinking parameter.

  • Control reasoning depth with output_config.effort instead.

    message = client.messages.create(
        model="claude-fable-5",
        max_tokens=32000,
        output_config={"effort": "high"},
        messages=[{"role": "user", "content": "Design a database schema for a multi-tenant SaaS app."}]
    )
    
  • Effort levels and when to use them:

    Effort level   Best for
    low            Fast responses, simple Q&A, routing tasks
    medium         General-purpose tasks, summarisation
    high           Complex reasoning, detailed analysis
    xhigh          Research-grade tasks, hard maths, legal review
    max            Hardest tasks where quality trumps speed and cost

  • Sending thinking: {"type": "enabled", "budget_tokens": N} to Fable 5 returns a 400 error.

  • Sending thinking: {"type": "disabled"} to Fable 5 also returns a 400 error.

  • Default effort when the parameter is omitted is model-defined — specify it explicitly for predictable quality.
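
The Fable 5 rules above can be folded into a small request builder. This is a minimal sketch rather than part of any SDK: fable_request_kwargs and EFFORT_LEVELS are hypothetical names, and the level list is copied from the table in this section.

```python
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def fable_request_kwargs(prompt: str, effort: str = "high", max_tokens: int = 32000) -> dict:
    """Build keyword arguments for a claude-fable-5 request."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {
        "model": "claude-fable-5",
        "max_tokens": max_tokens,
        # Effort is always set explicitly so quality does not depend on
        # the model-defined default.
        "output_config": {"effort": effort},
        # No "thinking" key: both "enabled" and "disabled" are rejected.
        "messages": [{"role": "user", "content": prompt}],
    }


kwargs = fable_request_kwargs("Summarise this changelog.", effort="medium")
print(kwargs["output_config"])
# With a configured client: client.messages.create(**kwargs)
```

Validating the effort string locally surfaces typos before a request is sent.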

Reading Thinking Blocks

  • When thinking is active, message.content contains thinking blocks before the text block.

    for block in message.content:
        if block.type == "thinking":
            print("Thinking:", block.thinking)  # the internal reasoning
        elif block.type == "text":
            print("Answer:", block.text)
    
  • On Opus 4.8, thinking blocks contain the full chain of thought.

  • On Fable 5, thinking blocks contain at most a summarised version; the raw chain is never returned.

  • Control thinking display on Fable 5 via output_config.thinking_display.

    output_config={
        "effort": "high",
        "thinking_display": "summarized"  # or "omitted" (default)
    }
    
  • Pass thinking blocks back unchanged when continuing a conversation — the API validates them.

    # Replay: include thinking blocks from the previous assistant turn
    messages.append({"role": "assistant", "content": previous_message.content})
    
  • Do not modify thinking block content — the API rejects tampered blocks.
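
A slightly fuller sketch of the replay step, showing the assistant turn passed back untouched before the next user message. next_turn is a hypothetical helper, and the dict blocks are stand-ins for illustration; in practice the content list would be previous_message.content exactly as the API returned it.

```python
def next_turn(history, assistant_content, user_text):
    """Return a new message list with the assistant turn replayed verbatim."""
    return history + [
        # Pass the content list through untouched, thinking blocks included.
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": user_text},
    ]


# Stand-in blocks for illustration only.
previous_content = [
    {"type": "thinking", "thinking": "(reasoning returned by the API)"},
    {"type": "text", "text": "Use a shared schema with a tenant_id column."},
]
history = [{"role": "user", "content": "Design a multi-tenant schema."}]

messages = next_turn(history, previous_content, "Now add row-level security.")
print([m["role"] for m in messages])
# → ['user', 'assistant', 'user']
```

The helper never copies, filters, or edits the blocks, which is the property the API checks for.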

When to Use Thinking

  • Use adaptive thinking when the task involves multiple logical steps or competing considerations.

  • Skip thinking for fast, simple tasks — classification, extraction, short answers — to save cost and latency.

  • Thinking increases output token count significantly; set max_tokens to at least 8,000, and closer to 16,000 for harder tasks.

  • Streaming with thinking: thinking blocks arrive as thinking_delta events before text_delta events.

    with client.messages.stream(
        model="claude-opus-4-8",
        max_tokens=8000,
        thinking={"type": "adaptive"},
        messages=[{"role": "user", "content": "Refactor this function..."}]
    ) as stream:
        for event in stream:
            # Only content_block_delta events carry a typed delta;
            # message_delta events also have .delta, but without a .type field.
            if event.type != "content_block_delta":
                continue
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)
    
  • For Fable 5, run effort sweeps — test low, medium, and high to find the minimum effort that meets quality targets.
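
An effort sweep can be sketched as a loop over levels from cheapest to most expensive, stopping at the first one that passes a quality check. Everything here is illustrative: minimum_passing_effort is a hypothetical helper, run would in practice wrap a request with output_config={"effort": level}, and passes is whatever quality test fits the task.

```python
from typing import Callable, Optional, Sequence


def minimum_passing_effort(
    run: Callable[[str], str],
    passes: Callable[[str], bool],
    levels: Sequence[str] = ("low", "medium", "high"),
) -> Optional[str]:
    """Return the first (cheapest) effort level whose output passes the check."""
    for level in levels:
        if passes(run(level)):
            return level
    return None  # no level met the quality target


# Stand-in for a real API call, for illustration only.
def fake_run(level: str) -> str:
    return {"low": "draft", "medium": "good answer", "high": "good answer"}[level]


print(minimum_passing_effort(fake_run, lambda out: out == "good answer"))
# → medium
```

A real sweep would likely score each level over a set of representative prompts rather than a single one.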

Tip: On hard coding or maths tasks, adaptive thinking often halves the error rate compared to non-thinking mode.

Warning: Thinking tokens are billed as output tokens — at $25/M for Opus 4.8. Enable it selectively.
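
A rough cost estimate follows directly from the output token count, since thinking tokens are counted there. This sketch uses the $25/M rate quoted above as its default. output_cost_usd is a hypothetical helper, and the token count would normally come from message.usage.output_tokens.

```python
def output_cost_usd(output_tokens: int, usd_per_million: float = 25.0) -> float:
    """Estimate output cost; thinking tokens count as output tokens."""
    return output_tokens * usd_per_million / 1_000_000


# Example: a response that used 12,000 output tokens (thinking plus answer).
print(f"${output_cost_usd(12_000):.2f}")
# → $0.30
```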
