
Adaptive Thinking Fundamentals

Understand how Opus 4.6's adaptive thinking system works — four effort levels, automatic calibration, and the speed-intelligence-cost triangle.

Opus 4.5 had binary extended thinking — on or off. Opus 4.6 replaces this with adaptive thinking: a four-level system where the model dynamically calibrates how much cognitive effort to spend on each task.

The Four Effort Levels

Quick (~500-2K tokens, simple tasks) → Standard (~2K-10K tokens, everyday work) → Deep (~10K-50K tokens, complex problems) → Maximum (~50K-200K tokens, hardest challenges)

| Level | Thinking Tokens | Latency | Cost Multiplier | Best For |
|---|---|---|---|---|
| Quick | 500–2K | 1–3s | | Formatting, simple lookups, classification |
| Standard | 2K–10K | 3–10s | 1.5–2× | Code generation, debugging, analysis |
| Deep | 10K–50K | 10–30s | 3–5× | Architecture review, security audits |
| Maximum | 50K–200K | 30–120s | 8–15× | Novel research, exhaustive analysis |
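To make the speed side of the trade-off concrete, the sketch below encodes the table rows as data and picks the deepest level whose worst-case latency fits a time budget. The names `EffortLevel`, `LEVELS`, and `deepest_within_latency` are hypothetical. The numbers are the upper ends of the ranges in the table above, so treat this as an illustration rather than a measured benchmark.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EffortLevel:
    name: str
    max_thinking_tokens: int  # upper end of the token range in the table
    max_latency_s: int        # upper end of the latency range in the table


# Values copied from the table above, ordered from shallowest to deepest.
LEVELS = [
    EffortLevel("quick", 2_000, 3),
    EffortLevel("standard", 10_000, 10),
    EffortLevel("deep", 50_000, 30),
    EffortLevel("maximum", 200_000, 120),
]


def deepest_within_latency(budget_s: float) -> Optional[EffortLevel]:
    """Return the deepest level whose worst-case latency fits the budget."""
    fitting = [lvl for lvl in LEVELS if lvl.max_latency_s <= budget_s]
    return fitting[-1] if fitting else None


level = deepest_within_latency(15)
print(level.name if level else "no level fits")
```

With a 15-second budget, only the first two levels fit, so the function returns the deeper of them. If the budget is below the shortest latency it returns `None`.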

How Adaptive Thinking Decides

When you set thinking: {"type": "adaptive"}, the model reads your prompt and internally classifies the task complexity:

Prompt analysis → Complexity estimation → Effort allocation → Thinking → Response

The model considers:

  • Length and complexity of the input
  • Specificity of the question
  • Number of constraints to satisfy
  • Whether the task requires multi-step reasoning
  • Ambiguity level of the request
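The pipeline above can be pictured with a toy heuristic. This lesson does not describe the model's internal calibration, and the sketch below is not that mechanism. It is a hypothetical scoring function (`estimate_effort`, with made-up thresholds) that only shows the shape of "analyze the prompt, estimate complexity, allocate effort" using a few of the signals listed above.

```python
import re


def estimate_effort(prompt: str) -> str:
    """Toy heuristic: score a prompt, then map the score to an effort level."""
    score = 0

    # Signal 1: input length (thresholds are arbitrary assumptions).
    word_count = len(prompt.split())
    if word_count > 200:
        score += 2
    elif word_count > 50:
        score += 1

    # Signal 2: number of constraint words, capped so they cannot dominate.
    constraints = re.findall(
        r"\b(must|should|without|only|at least|at most)\b", prompt, re.I
    )
    score += min(len(constraints), 3)

    # Signal 3: cues that the task has several steps.
    if re.search(r"\b(then|step|first|finally|after that)\b", prompt, re.I):
        score += 1

    if score == 0:
        return "quick"
    if score <= 2:
        return "standard"
    if score <= 4:
        return "deep"
    return "maximum"


print(estimate_effort("Format this JSON: ..."))
print(estimate_effort(
    "First parse the file, then validate it. It must run without network access."
))
```

The first prompt triggers none of the signals, while the second trips two constraint words and a multi-step cue, so it lands on a deeper level.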

API Configuration

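import anthropic

client = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from the environment
prompt = "Your task here"  # Placeholder prompt

response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=4096,
    thinking={"type": "adaptive"},  # Model chooses effort level
    messages=[{"role": "user", "content": prompt}]
)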

Explicit Effort Level

# Force maximum effort for a security review
response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=8192,
    thinking={"type": "adaptive", "effort": "maximum"},
    messages=[{"role": "user", "content": security_review_prompt}]
)
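Because a misspelled effort name would only surface when the request is sent, it can help to build the `thinking` argument through a small validating helper. The sketch below mirrors the parameter shape and level names used in this lesson. `adaptive_thinking` and `VALID_EFFORTS` are hypothetical names, and the accepted values should be checked against the current API reference before relying on them.

```python
from typing import Optional

# Level names as used in this lesson (an assumption, not a verified API enum).
VALID_EFFORTS = ("quick", "standard", "deep", "maximum")


def adaptive_thinking(effort: Optional[str] = None) -> dict:
    """Build the `thinking` argument in the shape shown in this lesson."""
    if effort is None:
        # No explicit level: let the model choose.
        return {"type": "adaptive"}
    if effort not in VALID_EFFORTS:
        raise ValueError(
            f"unknown effort {effort!r}; expected one of {VALID_EFFORTS}"
        )
    return {"type": "adaptive", "effort": effort}


print(adaptive_thinking())
print(adaptive_thinking("maximum"))
```

The returned dictionary can be passed as `thinking=adaptive_thinking("maximum")` in the calls shown in this section.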

Disable Thinking

# Quick responses where thinking adds no value
response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=256,
    thinking={"type": "disabled"},  # No thinking tokens
    messages=[{"role": "user", "content": "Format this JSON: ..."}]
)

Reading Thinking Output

response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "Find the bug in this function..."}]
)

for block in response.content:
    if block.type == "thinking":
        print(f"[Thinking — {len(block.thinking)} chars]")
        print(block.thinking[:500])  # Preview thinking process
    elif block.type == "text":
        print("\n[Response]")
        print(block.text)
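The same loop can be wrapped in a function that returns values instead of printing them, which makes it easier to log how much thinking a request used. This is a minimal sketch: `summarize_blocks` is a hypothetical helper, and the stand-in blocks built with `SimpleNamespace` only imitate the `type`, `thinking`, and `text` attributes that the loop above reads, so the example runs without an API call.

```python
from types import SimpleNamespace


def summarize_blocks(content):
    """Return (thinking_chars, answer_text) for a list of content blocks."""
    thinking_chars = 0
    answer_parts = []
    for block in content:
        if block.type == "thinking":
            thinking_chars += len(block.thinking)
        elif block.type == "text":
            answer_parts.append(block.text)
    return thinking_chars, "".join(answer_parts)


# Stand-in blocks with the same attributes the loop above reads.
fake_content = [
    SimpleNamespace(type="thinking", thinking="Check the loop bounds first."),
    SimpleNamespace(type="text", text="The bug is an off-by-one error."),
]

chars, answer = summarize_blocks(fake_content)
print(f"thinking chars: {chars}")
print(answer)
```

With a real response, pass `response.content` in place of `fake_content`.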

When to Override the Automatic Level

Adaptive calibration will sometimes misjudge a task. Set the level explicitly when:

| Situation | Override To | Reason |
|---|---|---|
| Security audit | maximum | Cannot afford to miss vulnerabilities |
| Simple formatting | none | Thinking wastes tokens on trivial tasks |
| Time-sensitive response | quick | Speed matters more than depth |
| Research question | deep or maximum | Thoroughness matters more than speed |
| Batch processing | standard | Consistent quality at predictable cost |
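The override table can be expressed as a simple lookup that falls back to automatic calibration for anything it does not list. The task labels and the names `OVERRIDES` and `effort_override` are hypothetical. For research questions the table allows either deep or maximum, and this sketch arbitrarily picks the cheaper of the two.

```python
from typing import Optional

# Rows copied from the table above; the keys are hypothetical task labels.
OVERRIDES = {
    "security_audit": "maximum",
    "simple_formatting": "none",   # thinking turned off entirely
    "time_sensitive": "quick",
    "research": "deep",            # the table allows "deep" or "maximum"
    "batch": "standard",
}


def effort_override(task_kind: str) -> Optional[str]:
    """Return the override for a task kind, or None to stay adaptive."""
    return OVERRIDES.get(task_kind)


for kind in ("security_audit", "chat"):
    override = effort_override(kind)
    print(kind, "->", override if override else "adaptive (no override)")
```

Returning `None` for unlisted task kinds keeps the adaptive default in place, so an override applies only where the table calls for one.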

In the next lesson, you will build effort-aware applications that automatically select the right level for each task.