The 1M context window is powerful but expensive. This lesson teaches you to manage costs without sacrificing the benefits of large-context processing.
Pricing Tiers
Opus 4.6 uses tiered pricing based on context size:
| Context Size | Input Cost | Output Cost | Notes |
|---|---|---|---|
| ≤200K tokens | $5 / 1M | $25 / 1M | Standard pricing |
| 200K–500K tokens | Premium rates | Premium rates | 1.5–2× standard |
| 500K–1M tokens | Premium rates | Premium rates | 2–3× standard |
Key insight: You pay premium rates for the entire request when context exceeds 200K, not just the tokens above the threshold.
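To see why this matters, compare billing the whole request at the premium rate with billing only the overflow. The sketch below assumes the $5 / 1M standard input rate from the table and a 1.5× multiplier for the 200K–500K tier (the low end of the table's range). The function names are illustrative, and `marginal_cost` models a scheme that is not how billing works; it is there only for comparison.

```python
STANDARD_INPUT_PER_MTOK = 5.0   # standard input rate from the pricing table
THRESHOLD = 200_000             # tokens; above this the premium tier applies
PREMIUM_MULTIPLIER = 1.5        # assumed: low end of the 200K-500K range


def whole_request_cost(input_tokens: int) -> float:
    """Input cost when the premium rate applies to every token in the request."""
    multiplier = PREMIUM_MULTIPLIER if input_tokens > THRESHOLD else 1.0
    return input_tokens * STANDARD_INPUT_PER_MTOK / 1_000_000 * multiplier


def marginal_cost(input_tokens: int) -> float:
    """Hypothetical comparison: premium rate only on tokens above the threshold."""
    base = min(input_tokens, THRESHOLD)
    overflow = max(input_tokens - THRESHOLD, 0)
    return (base + overflow * PREMIUM_MULTIPLIER) * STANDARD_INPUT_PER_MTOK / 1_000_000


if __name__ == "__main__":
    for tokens in (200_000, 210_000):
        print(tokens, whole_request_cost(tokens), marginal_cost(tokens))
```

Under these assumptions, growing a request from 200K to 210K input tokens raises the input cost from $1.00 to roughly $1.58, where overflow-only billing would give roughly $1.08.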
Cost Reduction Strategies
1. Context Caching
If you make multiple requests against the same codebase, use the Anthropic API's prompt caching so that the shared prefix is billed at a reduced rate after the first request:
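import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# First request: writes the cache (cache writes are billed at 1.25x the base
# input rate for the default 5-minute cache)
response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=4096,
    system=[{
        "type": "text",
        "text": large_codebase_content,  # the codebase as one string, loaded elsewhere
        "cache_control": {"type": "ephemeral"},  # mark this block as cacheable
    }],
    messages=[{"role": "user", "content": "Find security vulnerabilities"}],
)
# Later requests that reuse the identical system block read it from the cache
# at about 10% of the base input rate (~90% less)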
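A rough way to estimate the saving is to compare N requests over the same prefix with and without caching. The sketch below assumes the $5 / 1M base input rate, a 1.25× surcharge for the cache write on the first request, and cache reads at 10% of the base rate. It ignores the premium tier, output tokens, and cache expiry between requests, and the function names are illustrative.

```python
BASE_INPUT_PER_MTOK = 5.0       # standard input rate from the pricing table
CACHE_WRITE_MULTIPLIER = 1.25   # assumed: default 5-minute cache write
CACHE_READ_MULTIPLIER = 0.10    # cache reads at ~10% of the base rate


def uncached_cost(prefix_tokens: int, requests: int) -> float:
    """Input cost of sending the same prefix at full price on every request."""
    return prefix_tokens * BASE_INPUT_PER_MTOK / 1_000_000 * requests


def cached_cost(prefix_tokens: int, requests: int) -> float:
    """Input cost when the first request writes the cache and the rest read it."""
    if requests <= 0:
        return 0.0
    per_request = prefix_tokens * BASE_INPUT_PER_MTOK / 1_000_000
    write = per_request * CACHE_WRITE_MULTIPLIER
    reads = per_request * CACHE_READ_MULTIPLIER * (requests - 1)
    return write + reads


if __name__ == "__main__":
    print(uncached_cost(150_000, 10), cached_cost(150_000, 10))
```

For a 150K-token prefix queried 10 times, this gives $7.50 uncached versus about $1.61 cached. With a single request, caching costs slightly more than not caching, so it pays off only when the prefix is reused.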
2. Progressive Context Loading
Start with a summary, then load full details only for relevant sections:
# Step 1: Load file index + summaries (small context)
summary_response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": f"Here is a file index with summaries:\n{file_index}\n\n"
                   "Which files are relevant for analyzing the auth flow?",
    }],
)
# Step 2: Load only relevant files (targeted context)
# parse_file_list and load_specific_files are helpers you supply
relevant_files = parse_file_list(summary_response)
targeted_context = load_specific_files(relevant_files)
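The two helpers in Step 2 are not defined in this lesson. One possible sketch follows, assuming the model replies with one file path per line (optionally as a bulleted or numbered list), that the first content block of the response is a text block, and that the files live under a local root directory. All three are assumptions, not something the API guarantees.

```python
import re
from pathlib import Path


def parse_file_list(response) -> list[str]:
    """Extract file paths from the first text block of a Messages API response."""
    text = response.content[0].text  # assumes the first block is a text block
    paths = []
    for line in text.splitlines():
        # Strip list markers such as "- ", "* ", or "1. " and surrounding backticks
        cleaned = re.sub(r"^\s*(?:[-*]|\d+\.)\s+", "", line).strip().strip("`")
        # Keep only lines that look like a single path ending in a file extension
        if cleaned and " " not in cleaned and re.search(r"\.\w+$", cleaned):
            paths.append(cleaned)
    return paths


def load_specific_files(paths: list[str], root: str = ".") -> str:
    """Concatenate the files that exist under root, each preceded by a path header."""
    parts = []
    for rel in paths:
        file_path = Path(root) / rel
        if file_path.is_file():  # silently skip paths the model invented
            parts.append(f"### {rel}\n{file_path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

Asking the model for a strict output format in the Step 1 prompt, for example one path per line and nothing else, makes this kind of parsing far more reliable than scanning free-form prose.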
3. The 200K Sweet Spot
For many tasks, staying under 200K tokens gives you most of the benefit at standard pricing. Rank files by relevance and stop adding them once the token budget is reached:
def optimize_context(files: list, task_description: str, max_tokens: int = 190_000) -> str:
    """Fit the most relevant files within the standard pricing tier."""
    # sort_by_relevance, format_file, and count_tokens are helpers you supply
    # Sort by relevance (most relevant first)
    sorted_files = sort_by_relevance(files, task_description)
    context = ""
    for f in sorted_files:
        candidate = context + format_file(f)
        # Stop at the first file that would push the context over budget
        if count_tokens(candidate) > max_tokens:
            break
        context = candidate
    return context
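The function above relies on three helpers that the lesson leaves undefined. The sketch below fills them in with deliberately simple stand-ins: a keyword-overlap relevance score, a plain-text file header, and a rough estimate of 4 characters per token. All of these are assumptions for illustration; for real budgeting, an actual token count (the Anthropic SDK exposes `client.messages.count_tokens`) would be more accurate. `pack_context` is a variant of the loop that keeps a running total instead of recounting the whole string on each iteration.

```python
def count_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return len(text) // 4


def format_file(f: dict) -> str:
    """Render one file as a header line followed by its content."""
    return f"### {f['path']}\n{f['content']}\n\n"


def sort_by_relevance(files: list[dict], task_description: str) -> list[dict]:
    """Rank files by how many task keywords appear in their path or content."""
    keywords = {w.lower() for w in task_description.split() if len(w) > 3}

    def score(f: dict) -> int:
        haystack = (f["path"] + " " + f["content"]).lower()
        return sum(1 for k in keywords if k in haystack)

    return sorted(files, key=score, reverse=True)


def pack_context(files: list[dict], task_description: str, max_tokens: int = 190_000) -> str:
    """Greedily add the most relevant files until the token budget is reached."""
    context, used = "", 0
    for f in sort_by_relevance(files, task_description):
        block = format_file(f)
        cost = count_tokens(block)
        if used + cost > max_tokens:
            break  # same policy as above: stop at the first file that does not fit
        context += block
        used += cost
    return context
```

Each file here is a dict with `path` and `content` keys, which is also an assumption about how the codebase was loaded.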
Monthly Budget Framework
| Usage Level | Description | Estimated Monthly Cost |
|---|---|---|
| Light | 5–10 large-context queries/day | $200–500 |
| Standard | 20–30 queries/day, mixed context sizes | $500–1,500 |
| Heavy | 50+ queries/day, regular 500K+ contexts | $2,000–5,000 |
| Enterprise | Continuous pipeline, multiple agents | $5,000–15,000 |
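The lesson does not state the assumptions behind these ranges. As a back-of-the-envelope check, the sketch below multiplies a per-query cost by query volume, using the standard rates from the pricing table. The per-query token counts, the 30-day month, and the function name are assumptions chosen for illustration.

```python
def monthly_estimate(
    queries_per_day: int,
    input_tokens: int,
    output_tokens: int,
    days: int = 30,
    multiplier: float = 1.0,
) -> float:
    """Estimated monthly spend in dollars for a uniform query workload."""
    # Standard rates: $5 per 1M input tokens, $25 per 1M output tokens
    per_query = (input_tokens * 5 + output_tokens * 25) / 1_000_000 * multiplier
    return per_query * queries_per_day * days


if __name__ == "__main__":
    # 10 queries/day, each with 150K input tokens and 2K output tokens
    print(monthly_estimate(10, 150_000, 2_000))
```

With those inputs each query costs $0.80, which comes to $240 a month and falls inside the Light range.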
Budget Alerts
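from datetime import datetime

class BudgetTracker:
    def __init__(self, monthly_limit: float = 1000.0):
        self.monthly_limit = monthly_limit
        self.current_spend = 0.0
        self.month_start = datetime.now().replace(day=1, hour=0, minute=0, second=0, microsecond=0)

    def track(self, input_tokens: int, output_tokens: int, context_size: int) -> float:
        # Reset the running total when a new calendar month begins
        now = datetime.now()
        if (now.year, now.month) != (self.month_start.year, self.month_start.month):
            self.month_start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
            self.current_spend = 0.0
        # Premium multiplier applies to the whole request once context exceeds 200K
        # (1.5 and 2.0 are the low ends of the ranges in the pricing table)
        if context_size > 200_000:
            multiplier = 2.0 if context_size > 500_000 else 1.5
        else:
            multiplier = 1.0
        # Standard rates: $5 per 1M input tokens, $25 per 1M output tokens
        cost = (input_tokens * 5 + output_tokens * 25) / 1_000_000 * multiplier
        self.current_spend += cost
        remaining = self.monthly_limit - self.current_spend
        # Warn once less than 10% of the monthly budget is left
        if remaining < self.monthly_limit * 0.1:
            print(f"⚠️ Budget warning: ${remaining:.2f} remaining this month")
        return cost

budget = BudgetTracker(monthly_limit=1500.0)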
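To feed the tracker from real responses, the token counts can be read from the `usage` object on each Messages API response. The sketch below assumes that the context size relevant to the pricing tier is the sum of uncached, cache-written, and cache-read input tokens. The helper name is hypothetical, and because it reports all input tokens at the full rate, the tracked cost will overestimate requests that hit the cache.

```python
def usage_to_track_args(usage) -> dict:
    """Map a response's usage object to the keyword arguments of BudgetTracker.track."""
    uncached = usage.input_tokens
    # Cache fields may be absent or None when prompt caching is not used
    cache_written = getattr(usage, "cache_creation_input_tokens", None) or 0
    cache_read = getattr(usage, "cache_read_input_tokens", None) or 0
    total_input = uncached + cache_written + cache_read
    return {
        "input_tokens": total_input,
        "output_tokens": usage.output_tokens,
        "context_size": total_input,
    }
```

A call would then look like `budget.track(**usage_to_track_args(response.usage))` after each request.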
You now understand the full picture: capabilities, structuring, and cost management for million-token contexts. The next module covers adaptive thinking, the system that controls how deeply the model reasons about your tasks.