Opus 4.5 had binary extended thinking — on or off. Opus 4.6 replaces this with adaptive thinking: a four-level system where the model dynamically calibrates how much cognitive effort to spend on each task.
The Four Effort Levels
graph LR
A[Quick<br/>~500-2K tokens<br/>Simple tasks] --> B[Standard<br/>~2K-10K tokens<br/>Everyday work]
B --> C[Deep<br/>~10K-50K tokens<br/>Complex problems]
C --> D[Maximum<br/>~50K-200K tokens<br/>Hardest challenges]
| Level | Thinking Tokens | Latency | Cost Multiplier | Best For |
|---|---|---|---|---|
| Quick | 500–2K | 1–3s | 1× | Formatting, simple lookups, classification |
| Standard | 2K–10K | 3–10s | 1.5–2× | Code generation, debugging, analysis |
| Deep | 10K–50K | 10–30s | 3–5× | Architecture review, security audits |
| Maximum | 50K–200K | 30–120s | 8–15× | Novel research, exhaustive analysis |
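The table can be turned into a small lookup, which is handy when logging how many thinking tokens a request used. This is a minimal sketch: the names `EffortLevel`, `LEVELS`, `level_for_tokens`, and `cost_range` are hypothetical, the numbers are copied from the table above, and the cost multiplier is assumed to be relative to a Quick-level request. Because the published ranges share their endpoints (2K, 10K, 50K), a count on a boundary is assigned to the lower level here.

```python
from typing import NamedTuple, Optional, Tuple


class EffortLevel(NamedTuple):
    name: str
    min_tokens: int
    max_tokens: int
    cost_multiplier: Tuple[float, float]  # (low, high), taken from the table


# Values copied from the table; ordered from least to most effort.
LEVELS = [
    EffortLevel("quick", 500, 2_000, (1.0, 1.0)),
    EffortLevel("standard", 2_000, 10_000, (1.5, 2.0)),
    EffortLevel("deep", 10_000, 50_000, (3.0, 5.0)),
    EffortLevel("maximum", 50_000, 200_000, (8.0, 15.0)),
]


def level_for_tokens(thinking_tokens: int) -> Optional[EffortLevel]:
    """Return the first level whose range contains the count, else None."""
    for level in LEVELS:
        if level.min_tokens <= thinking_tokens <= level.max_tokens:
            return level
    return None


def cost_range(base_cost: float, level: EffortLevel) -> Tuple[float, float]:
    """Scale a Quick-level base cost by the level's multiplier range."""
    low, high = level.cost_multiplier
    return base_cost * low, base_cost * high
```

For example, `level_for_tokens(30_000)` falls in the Deep range, and `cost_range(2.0, ...)` for that level gives the 3× to 5× band applied to a base cost of 2.0.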
How Adaptive Thinking Decides
When you set thinking: {"type": "adaptive"}, the model reads your prompt and internally classifies the task complexity:
Prompt analysis → Complexity estimation → Effort allocation → Thinking → Response
The model considers:
- Length and complexity of the input
- Specificity of the question
- Number of constraints to satisfy
- Whether the task requires multi-step reasoning
- Ambiguity level of the request
API Configuration
Fully Adaptive (Recommended Default)
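import anthropic

client = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=4096,
    thinking={"type": "adaptive"},  # Model chooses effort level
    messages=[{"role": "user", "content": prompt}],  # prompt: your request text
)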
Explicit Effort Level
# Force maximum effort for a security review
response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=8192,
    thinking={"type": "adaptive", "effort": "maximum"},
    messages=[{"role": "user", "content": security_review_prompt}],
)
Disable Thinking
# Quick responses where thinking adds no value
response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=256,
    thinking={"type": "disabled"},  # No thinking tokens; the Messages API value is "disabled"
    messages=[{"role": "user", "content": "Format this JSON: ..."}],
)
Reading Thinking Output
response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "Find the bug in this function..."}],
)

# response.content is a list of blocks; thinking blocks come before the text
for block in response.content:
    if block.type == "thinking":
        print(f"[Thinking: {len(block.thinking)} chars]")
        print(block.thinking[:500])  # Preview the first 500 characters
    elif block.type == "text":
        print("\n[Response]")
        print(block.text)
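If you want the thinking and the answer as separate strings rather than printed output, the same loop can be wrapped in a helper. The sketch below is illustrative only: `split_blocks` is a hypothetical name, and it assumes nothing about the blocks beyond the `type`, `thinking`, and `text` attributes the loop above already reads. The stand-in objects let it run without an API call.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Separate thinking text from response text in a list of content blocks."""
    thinking_parts, text_parts = [], []
    for block in content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)
        # Any other block type is ignored in this sketch.
    return "\n".join(thinking_parts), "\n".join(text_parts)


# Stand-in blocks carrying only the attributes the helper reads.
fake_content = [
    SimpleNamespace(type="thinking", thinking="Check the loop bounds first."),
    SimpleNamespace(type="text", text="The loop is off by one."),
]

thinking, answer = split_blocks(fake_content)
print(f"{len(thinking)} chars of thinking")
print(answer)
```

With a real response you would call `split_blocks(response.content)` instead of passing the stand-in list.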
When to Override the Automatic Level
The adaptive system will not always pick the level you want. Set the level explicitly in these situations:
| Situation | Override To | Reason |
|---|---|---|
| Security audit | maximum | Cannot afford to miss vulnerabilities |
| Simple formatting | none | Thinking wastes tokens on trivial tasks |
| Time-sensitive response | quick | Speed matters more than depth |
| Research question | deep or maximum | Thoroughness matters more than speed |
| Batch processing | standard | Consistent quality at predictable cost |
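One way to apply the table in code is a plain lookup that returns an override label for known situations and falls back to adaptive selection otherwise. This is a sketch under stated assumptions: the task-kind keys and the names `OVERRIDES` and `choose_level` are hypothetical, and where the table offers "deep or maximum" for research questions the sketch picks the cheaper of the two. It returns the table's labels only; turning a label into request parameters is left to the API examples earlier in this lesson.

```python
# Hypothetical task-kind keys mapped to the override labels from the table.
OVERRIDES = {
    "security_audit": "maximum",
    "simple_formatting": "none",
    "time_sensitive": "quick",
    "research": "deep",  # the table allows deep or maximum; deep is the cheaper choice
    "batch": "standard",
}


def choose_level(task_kind: str) -> str:
    """Return the table's override for a known task kind, else 'adaptive'."""
    return OVERRIDES.get(task_kind, "adaptive")


for kind in ("security_audit", "batch", "customer_chat"):
    print(kind, "->", choose_level(kind))
```

Keeping the mapping in a dictionary means the policy can be reviewed and changed in one place without touching the calling code.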
In the next lesson, you will build effort-aware applications that automatically select the right level for each task.