Budget constraints give you control over how much the model "thinks" before it answers. Getting them right is key.
API Budget Configuration
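from anthropic import Anthropic

client = Anthropic()

# Basic configuration
response = client.messages.create(
    model="claude-opus-4-5-20250101",  # verify the model ID against the current model list
    max_tokens=16000,  # must be greater than budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # thinking budget; must be less than max_tokens
    },
    messages=[...]
)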
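One constraint worth checking before a request goes out is that `budget_tokens` stays below `max_tokens` and at or above the API's documented minimum of 1,024. The following is a minimal sketch of such a check. `build_thinking_params` is a hypothetical helper name (not part of the SDK), and the 4,000-token allowance for the visible answer is an arbitrary assumption.

```python
MIN_BUDGET = 1024  # documented minimum for budget_tokens


def build_thinking_params(budget_tokens, answer_tokens=4000):
    """Return keyword arguments with a consistent max_tokens / budget_tokens pair.

    answer_tokens is an assumed allowance for the visible answer; tune it
    to your own workload.
    """
    if budget_tokens < MIN_BUDGET:
        raise ValueError(f"budget_tokens must be >= {MIN_BUDGET}")
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive")
    return {
        # max_tokens covers thinking plus the answer, so it always exceeds the budget
        "max_tokens": budget_tokens + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = build_thinking_params(10000)
print(params["max_tokens"], params["thinking"]["budget_tokens"])
```

The returned dictionary could then be unpacked into the request, for example `client.messages.create(model=..., messages=..., **params)`.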
Budget levels
Minimal (1,024-2,000 tokens; 1,024 is the API minimum)
thinking={"type": "enabled", "budget_tokens": 2000}
Suitable for:
- Simple questions where you want to see the reasoning
- Validating answers
- Quick sanity checks
Risks:
- Thinking may be cut off in the middle of an important analysis
- Incomplete conclusions
Standard (5,000-10,000 tokens)
thinking={"type": "enabled", "budget_tokens": 8000}
Suitable for:
- Everyday coding tasks
- Code review
- Most daily tasks
The sweet spot for most use cases.
Deep (20,000-30,000 tokens)
thinking={"type": "enabled", "budget_tokens": 25000}
Suitable for:
- Complex debugging
- Architectural decisions
- Security audits
Exhaustive (50,000+ tokens)
thinking={"type": "enabled", "budget_tokens": 60000}
Suitable for:
- Mathematical proofs
- Critical decisions
- Cases where the standard budget was not enough
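The four levels above can be collected into a single lookup so that calling code refers to a level by name instead of repeating raw numbers. This is only a sketch: `BUDGET_LEVELS` and `thinking_for_level` are hypothetical names, and the values are the example budgets used in this section, not recommendations from the API.

```python
# Example budgets taken from the level descriptions in this section.
BUDGET_LEVELS = {
    "minimal": 2000,
    "standard": 8000,
    "deep": 25000,
    "exhaustive": 60000,
}


def thinking_for_level(level):
    """Build the `thinking` parameter for a named budget level."""
    try:
        budget = BUDGET_LEVELS[level]
    except KeyError:
        known = ", ".join(sorted(BUDGET_LEVELS))
        raise ValueError(f"unknown level {level!r}; expected one of: {known}") from None
    return {"type": "enabled", "budget_tokens": budget}


print(thinking_for_level("deep"))
```

Raising on an unknown level, rather than falling back silently, makes typos in level names visible early.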
Handling budget exhaustion
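response = client.messages.create(
    model="claude-opus-4-5-20250101",
    max_tokens=8000,  # must be greater than budget_tokens
    thinking={"type": "enabled", "budget_tokens": 5000},
    messages=[...]
)

# Check whether the model used (nearly) the whole budget
thinking_block = next(
    (b for b in response.content if b.type == "thinking"),
    None
)
if thinking_block:
    # count_tokens is a helper you supply; it is not part of the SDK.
    # Some models return summarized thinking, so treat this count as a rough lower bound.
    tokens_used = count_tokens(thinking_block.thinking)
    if tokens_used >= 4800:  # close to the 5,000-token budget
        print("Warning: Model possibly truncated thinking")
        # Consider a retry with a higher budget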
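The hard-coded 4,800 above corresponds to 96% of the 5,000-token budget. A sketch of the same check expressed relative to whatever budget was requested might look like the following. Both function names and the 50,000-token ceiling are assumptions made for illustration, and integer arithmetic is used to avoid floating-point rounding at the boundary.

```python
def near_budget_limit(tokens_used, budget_tokens, threshold_percent=96):
    """True when tokens_used is at or above threshold_percent of the budget."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    return tokens_used * 100 >= budget_tokens * threshold_percent


def retry_budget(budget_tokens, factor=2, ceiling=50000):
    """Budget for one retry: scaled up, but never above the (assumed) ceiling."""
    return min(budget_tokens * factor, ceiling)


if near_budget_limit(tokens_used=4900, budget_tokens=5000):
    print("Retrying with budget", retry_budget(5000))
```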
Adaptive budgeting
class AdaptiveBudget:
    def __init__(self, initial=5000, max_budget=50000):
        self.initial = initial
        self.current = initial
        self.max = max_budget

    def get_budget(self):
        return self.current

    def increase(self, factor=2):
        # Scale the budget up, capped at the configured maximum
        self.current = min(self.current * factor, self.max)
        return self.current

    def reset(self):
        # Return to the configured starting budget rather than a hard-coded 5000
        self.current = self.initial

# Usage (call_with_budget and satisfactory are application-specific helpers)
budget = AdaptiveBudget()
response = call_with_budget(budget.get_budget())
while not satisfactory(response) and budget.current < budget.max:
    new_budget = budget.increase()
    print(f"Increasing budget to {new_budget}")
    response = call_with_budget(new_budget)
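To see which budgets the escalation loop would actually try, the same doubling-with-cap logic can be written as a generator and exercised with stand-in functions. This is a sketch under stated assumptions: `budget_schedule` and `escalate` are hypothetical names, and the lambdas below are stubs standing in for a real API call and a real quality check.

```python
def budget_schedule(initial=5000, max_budget=50000, factor=2):
    """Yield budgets from initial upward, multiplying by factor, capped at max_budget."""
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    current = min(initial, max_budget)
    while True:
        yield current
        if current >= max_budget:
            return
        current = min(current * factor, max_budget)


def escalate(call, is_ok, **schedule_kwargs):
    """Call with increasing budgets until is_ok(response) or the schedule ends.

    Returns (response, budget) for the last attempt made.
    """
    response, budget = None, None
    for budget in budget_schedule(**schedule_kwargs):
        response = call(budget)
        if is_ok(response):
            break
    return response, budget


# Stubs: the "response" is just the budget, and 20,000 or more counts as good enough.
result, used = escalate(call=lambda b: b, is_ok=lambda r: r >= 20000)
print(list(budget_schedule()), used)
```

Each retry is a full, separately billed request, so the total cost of an escalation is the sum of all attempts, not just the last one.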
Per-task budget templates
TASK_BUDGETS = {
    # Development
    "code_completion": 3000,
    "code_review": 8000,
    "debugging": 15000,
    "architecture": 25000,
    # Content
    "writing": 5000,
    "editing": 3000,
    "translation": 5000,
    # Analysis
    "data_analysis": 10000,
    "security_audit": 30000,
    "research": 20000,
}

def get_task_budget(task_type):
    # Unknown task types fall back to the standard 8,000-token budget
    return TASK_BUDGETS.get(task_type, 8000)
Monitoring & alerts
import logging

def monitored_call(prompt, task_type):
    budget = get_task_budget(task_type)
    response = client.messages.create(
        model="claude-opus-4-5-20250101",
        max_tokens=budget + 4000,  # must exceed budget_tokens; leaves room for the answer
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=[{"role": "user", "content": prompt}]
    )
    # Log usage (get_thinking_tokens and calculate_cost are helpers you supply)
    thinking_tokens = get_thinking_tokens(response)
    # usage.output_tokens already includes thinking tokens; calculate_cost must not bill them twice
    output_tokens = response.usage.output_tokens
    cost = calculate_cost(
        input_tokens=response.usage.input_tokens,
        thinking_tokens=thinking_tokens,
        output_tokens=output_tokens
    )
    logging.info(f"Task: {task_type}, Cost: ${cost:.4f}")
    if cost > 1.00:
        logging.warning(f"High cost alert: ${cost:.2f}")
    return response
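The cost helper used above is not defined in this section. One possible shape is sketched below, assuming thinking tokens are billed as part of the output tokens, so only the input and output counts are needed. `estimate_cost` and `is_high_cost` are hypothetical names, and the per-million-token prices in the usage line are placeholders, not real rates; take actual values from the current pricing page.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_mtok, output_price_per_mtok):
    """Estimate request cost in dollars from token counts and per-million-token prices.

    output_tokens is expected to include thinking tokens already.
    """
    if min(input_tokens, output_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


def is_high_cost(cost, threshold=1.00):
    """Mirror the alert rule used in the monitoring code: strictly above the threshold."""
    return cost > threshold


# Placeholder prices for illustration only.
cost = estimate_cost(input_tokens=1000, output_tokens=2000,
                     input_price_per_mtok=5.0, output_price_per_mtok=25.0)
print(f"${cost:.4f}", is_high_cost(cost))
```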
Setting budgets well is a balance between quality and cost.