Lesson 10 of 46 ~25 min

Understanding the 1M Token Context Window

Learn how the million-token context window works, its practical limits, and why it fundamentally changes your workflow.

The jump from 200K to 1M tokens is more than a 5× increase in capacity: it is a qualitative shift that enables new categories of work. Instead of carefully selecting which parts of a codebase to include, you can often load the entire thing. Instead of summarizing a legal case, you can process it in full.

What 1M Tokens Actually Means

1,000,000 tokens ≈ 750,000 words ≈ 1,500 pages of text

Practical equivalents:
- A complete novel (75K-100K words) × 7-10
- An entire medium-sized codebase (500+ files)
- A full legal case file with all depositions
- 100+ research papers
- An entire documentation set (e.g., React docs + Next.js docs + Node.js docs)
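
To check whether a body of text is likely to fit, the ratios above can be turned into a quick estimate. This is a minimal sketch that assumes the lesson's rough figure of 0.75 words per token holds for your content; real tokenizers vary, and source code usually produces more tokens per word than prose. The function names are illustrative, not part of any API.

```python
# Rough planning ratios from this lesson; real tokenizers vary by content.
WORDS_PER_TOKEN = 0.75        # 1,000,000 tokens ≈ 750,000 words
CONTEXT_LIMIT = 1_000_000     # tokens


def estimate_tokens(word_count: int) -> int:
    """Estimate a token count from a word count using the lesson's ratio."""
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_counts: list[int], limit: int = CONTEXT_LIMIT) -> bool:
    """Return True if the combined documents are estimated to fit in the window."""
    total_tokens = sum(estimate_tokens(words) for words in word_counts)
    return total_tokens <= limit


if __name__ == "__main__":
    seven_novels = [100_000] * 7   # seven novels of 100K words each
    eight_novels = [100_000] * 8
    print(estimate_tokens(750_000))
    print(fits_in_context(seven_novels))
    print(fits_in_context(eight_novels))
```

For anything close to the limit, count tokens with the provider's own tokenizer or token-counting endpoint instead of relying on a word-based estimate.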

How It Works Internally

The 1M context window uses a hierarchical attention system that processes information at three levels:

```mermaid
graph TB
    A[1M Token Input] --> B[Local Attention Layer<br/>High precision on nearby tokens]
    A --> C[Sparse Attention Layer<br/>Efficient scanning of distant context]
    A --> D[Learned Retrieval Layer<br/>Finding relevant sections on demand]
    B --> E[Final Output<br/>Coherent reasoning across full context]
    C --> E
    D --> E
```

Local attention handles nearby tokens with full precision — the same quality you expect from standard context.

Sparse attention efficiently scans the full context for relevant information without processing every token with full attention weights.

Learned retrieval activates when the model needs specific information from distant parts of the context — like finding a function definition 400K tokens back.
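
The lesson does not say how these layers are implemented, so the following is only a toy illustration of the general idea behind local and sparse attention patterns, not the model's actual mechanism. It builds two boolean masks (a sliding window and a strided pattern, both chosen here as assumptions) and shows that their combination covers far fewer token pairs than full attention would.

```python
def local_mask(n: int, window: int) -> list[list[bool]]:
    """Each position may attend to positions within `window` steps of itself."""
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]


def strided_mask(n: int, stride: int) -> list[list[bool]]:
    """Each position may attend to every `stride`-th position in the sequence."""
    return [[j % stride == 0 for j in range(n)] for _ in range(n)]


def combine(a: list[list[bool]], b: list[list[bool]]) -> list[list[bool]]:
    """A pair is allowed if either mask allows it."""
    return [[x or y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


if __name__ == "__main__":
    n = 12
    mask = combine(local_mask(n, window=1), strided_mask(n, stride=4))
    for row in mask:
        print("".join("#" if allowed else "." for allowed in row))
    allowed_pairs = sum(sum(row) for row in mask)
    print(f"{allowed_pairs} of {n * n} token pairs are attended to")
```

The saving grows with sequence length: full attention scales with the square of the number of tokens, while a fixed window plus a stride scales roughly linearly.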

Practical Limits

The 1M window is in beta. Be aware of these real-world constraints:

| Aspect | Limit | Practical Impact |
|---|---|---|
| Input tokens | 1M | Entire codebases, full corpora |
| Output tokens | 8K-32K | Responses are still bounded |
| Processing time | 30-120+ seconds | Much slower than standard context |
| Premium pricing | Higher rates >200K | Cost increases significantly |
| Retrieval accuracy | ~95% at 500K, ~90% at 1M | Slight degradation at extreme lengths |
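
Because pricing changes above 200K tokens, it can help to estimate cost before sending a large request. The sketch below uses hypothetical rates (the dollar figures are placeholders, not real prices) and assumes the premium rate applies to the entire request once the input exceeds the threshold. Check your provider's pricing page for the actual rates and rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical input rates in dollars per million tokens (not real prices)."""
    standard_input: float = 5.0
    premium_input: float = 10.0
    premium_threshold: int = 200_000  # from the table: higher rates above 200K


def estimate_input_cost(input_tokens: int, pricing: Pricing = Pricing()) -> float:
    """Estimate input cost for one request.

    Assumption: once the request exceeds the threshold, the premium rate
    applies to all input tokens, not only to the excess.
    """
    if input_tokens > pricing.premium_threshold:
        rate = pricing.premium_input
    else:
        rate = pricing.standard_input
    return input_tokens / 1_000_000 * rate


if __name__ == "__main__":
    for tokens in (150_000, 200_000, 500_000, 1_000_000):
        print(f"{tokens:>9,} tokens -> ${estimate_input_cost(tokens):.2f}")
```

Under these assumptions, a request just over the threshold costs noticeably more than one just under it, so trimming a 210K-token prompt down to 200K can be worth the effort.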

Context Rot: Mostly Solved, Not Eliminated

Previous models suffered from “context rot” — the model would gradually lose awareness of information placed early in a long context. Opus 4.6 dramatically reduces this but does not eliminate it entirely.

Best practices to minimize context rot:

  1. Place the most critical information at the beginning and end of your context
  2. Use clear section headers and markers for important content
  3. Periodically reference earlier context in your prompts
  4. For contexts >500K tokens, add a brief summary at the start

A prompt that follows these practices might be structured like this:

```python
# Structure for large-context prompts

# Placeholder: in practice, this holds the concatenated contents of every source file.
full_codebase_content = "<contents of all source files>"

messages = [{
    "role": "user",
    "content": f"""
## Context Summary
This context contains the full codebase for Project X (487 files, 312K lines).
Key areas: src/auth/ (authentication), src/api/ (endpoints), src/db/ (database).
The bug is likely in the authentication flow.

## Full Codebase
{full_codebase_content}

## Task
Find the race condition causing intermittent login failures.
Focus on src/auth/ but check all callers across the codebase.
"""
}]
```
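
The same practices can be applied programmatically when the context is assembled from many files. This is a minimal sketch under stated assumptions: `build_prompt` is a hypothetical helper, not part of any SDK, and the header names are only one possible convention. It puts the summary and task first, gives each file a clear header, and restates the task at the end so the critical instruction appears at both ends of the context.

```python
def build_prompt(summary: str, files: dict[str, str], task: str) -> str:
    """Assemble a long-context prompt.

    Layout: summary and task at the start, each file under its own header
    in the middle, and the task restated at the end.
    """
    parts = [
        "## Context Summary", summary, "",
        "## Task", task, "",
        "## Full Codebase",
    ]
    for path, source in files.items():
        parts.append(f"### File: {path}")  # clear marker for each file
        parts.append(source)
        parts.append("")
    parts += ["## Task (restated)", task]
    return "\n".join(parts)


if __name__ == "__main__":
    demo_files = {
        "src/auth/login.py": "def login(): ...",
        "src/api/routes.py": "def routes(): ...",
    }
    prompt = build_prompt(
        summary="Two-file demo project.",
        files=demo_files,
        task="Find the race condition.",
    )
    messages = [{"role": "user", "content": prompt}]
    print(prompt)
```

Per-file headers also give you stable names to refer back to in follow-up prompts, which supports the practice of periodically referencing earlier context.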

In the next lesson, you will learn how to structure documents and code for optimal processing within large contexts.