Lesson 5 of 46 ~25 min

Compaction Fundamentals

Understand how the compaction API works — server-side summarization mechanics, what gets preserved vs. compressed, token savings, and API configuration.

Every long conversation eventually hits a wall: the context window fills up. Before compaction, you had two bad options — truncate older messages (losing information) or manually summarize (adding complexity and latency). The compaction API is a third option: let the server intelligently compress older conversation segments while preserving what matters.

How Compaction Works

When you enable compaction, the server monitors your conversation’s token usage. As it approaches a configured threshold, the API automatically summarizes older message pairs into a compact representation:

Last 10 turns:           Full messages           (preserved, most recent)
Turns 11-30 from end:    Compacted summary       (compressed, older)
Turns 31+ from end:      Discarded or archived   (removed, oldest)
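To make the tiering concrete, here is a toy local model of it. It is not the server's implementation; `split_into_tiers` and its `preserve_recent` and `summary_window` parameters are hypothetical names chosen to mirror the layout above (10 verbatim turns, the next 20 summarized, the rest dropped).

```python
from typing import TypeVar

T = TypeVar("T")


def split_into_tiers(
    turns: list[T], preserve_recent: int = 10, summary_window: int = 20
) -> tuple[list[T], list[T], list[T]]:
    """Split turns (oldest first) into (dropped, to_summarize, kept_verbatim).

    Hypothetical local model of the tiering described above; the real
    server-side policy is not exposed by this function.
    """
    if preserve_recent < 0 or summary_window < 0:
        raise ValueError("window sizes must be non-negative")
    n = len(turns)
    keep_start = max(0, n - preserve_recent)             # first verbatim turn
    summary_start = max(0, keep_start - summary_window)  # first summarized turn
    dropped = turns[:summary_start]
    to_summarize = turns[summary_start:keep_start]
    kept_verbatim = turns[keep_start:]
    return dropped, to_summarize, kept_verbatim
```

For a 45-turn conversation numbered 1 to 45, turns 36-45 stay verbatim, turns 16-35 go to the summarizer, and turns 1-15 fall out of context. Conversations shorter than `preserve_recent` are left untouched.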

The compacted summary is not a naive “TL;DR.” It is a structured extraction of:

  • Key decisions made during the conversation
  • Code artifacts produced or modified
  • User preferences expressed
  • Constraints and requirements stated
  • Unresolved questions still pending
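One way to picture that structured extraction is as a record with one field per category. The `CompactedSummary` class below is a hypothetical illustration: this lesson does not specify the server's actual summary format, so the field names and the plain-text rendering are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CompactedSummary:
    """Hypothetical container for the categories a compacted summary captures."""
    key_decisions: list[str] = field(default_factory=list)
    code_artifacts: list[str] = field(default_factory=list)
    user_preferences: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render only the non-empty sections as a compact plain-text block."""
        sections = [
            ("Key decisions", self.key_decisions),
            ("Code artifacts", self.code_artifacts),
            ("User preferences", self.user_preferences),
            ("Constraints", self.constraints),
            ("Open questions", self.open_questions),
        ]
        lines: list[str] = []
        for title, items in sections:
            if items:
                lines.append(f"{title}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


summary = CompactedSummary(
    key_decisions=["Use PostgreSQL for the orders service"],
    open_questions=["Retention period for audit logs?"],
)
print(summary.render())
```

Skipping empty categories keeps the summary proportional to what was actually decided, which is the point of a structured extraction over a free-form recap.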

API Configuration

Basic Compaction

from anthropic import Anthropic

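client = Anthropic()

# Earlier turns, alternating user and assistant messages (placeholder content)
conversation_history = [
    {"role": "user", "content": "Propose service boundaries for the order system."},
]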

response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=4096,
    system="You are a senior software architect.",
    messages=conversation_history,
    metadata={
        "compaction": {
            "enabled": True,
            "trigger_tokens": 150_000,    # Start compacting at 150K tokens
            "preserve_recent": 10,         # Always keep last 10 turns intact
        }
    }
)
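Because the trigger is expressed in tokens, it helps to know roughly how close a conversation is to `trigger_tokens` before sending it. The sketch below uses a crude heuristic of about 4 characters per token for English text (an approximation, not the model's tokenizer); `estimate_tokens` and `near_trigger` are hypothetical helpers. For exact counts, the Python SDK provides a token-counting call, `client.messages.count_tokens`.

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text; not the real tokenizer


def estimate_tokens(messages: list[dict], system: str = "") -> int:
    """Very rough token estimate for a message list plus a system prompt."""
    chars = len(system)
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            chars += len(content)
        else:
            # Content-block lists: this sketch counts only text blocks
            chars += sum(
                len(b.get("text", "")) for b in content if b.get("type") == "text"
            )
    return chars // CHARS_PER_TOKEN


def near_trigger(
    messages: list[dict], trigger_tokens: int, system: str = "", margin: float = 0.9
) -> bool:
    """True once the estimate reaches `margin` (90% by default) of the trigger."""
    return estimate_tokens(messages, system) >= margin * trigger_tokens
```

A check like `near_trigger(conversation_history, 150_000)` is a cheap way to decide when to start logging or warning, without an extra API round trip.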

Reading Compaction Status

# Check whether compaction occurred (the field may be absent)
info = getattr(response, "compaction_info", None)
if info is not None:
    print(f"Compaction triggered: {info.triggered}")
    if info.triggered:
        print(f"Tokens before: {info.tokens_before:,}")
        print(f"Tokens after: {info.tokens_after:,}")
        print(f"Compression ratio: {info.compression_ratio:.1%}")
        print(f"Turns compacted: {info.turns_compacted}")
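The status fields are related. Assuming `compression_ratio` follows the same convention as the benchmarks later in this lesson (compacted size divided by original size), it can be derived from `tokens_before` and `tokens_after`. `CompactionStats` is a hypothetical local stand-in, handy for logging or for faking responses in tests; it is not a type from the SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactionStats:
    """Hypothetical local mirror of the compaction status fields."""
    tokens_before: int
    tokens_after: int
    turns_compacted: int

    @property
    def triggered(self) -> bool:
        # Assumption: compaction counts as triggered when any turn was compacted
        return self.turns_compacted > 0

    @property
    def compression_ratio(self) -> float:
        # Fraction of tokens remaining after compaction (lower = more savings)
        return self.tokens_after / self.tokens_before if self.tokens_before else 1.0

    @property
    def tokens_saved(self) -> int:
        return self.tokens_before - self.tokens_after


stats = CompactionStats(tokens_before=150_000, tokens_after=60_000, turns_compacted=20)
print(f"Compression ratio: {stats.compression_ratio:.1%}")  # 40.0%
print(f"Tokens saved: {stats.tokens_saved:,}")              # 90,000
```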

What Gets Preserved vs. Compressed

The compaction algorithm is not uniform. It applies different compression levels based on content type:

| Content type | Compression level | Reasoning |
|---|---|---|
| Code blocks | Minimal | Code must remain exact; paraphrasing introduces bugs |
| System prompt | None | Never compressed; always fully preserved |
| Recent turns | None | Active context must remain intact |
| User requirements | Low | Requirements are referenced throughout the conversation |
| Conversational filler | High | “Thanks!”, “Got it”, “Sounds good” add no information |
| Intermediate reasoning | Medium | Final decisions matter more than the path to them |
| Error messages | Medium | Recent errors preserved, older ones summarized |
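Because compression is applied per content type, the overall ratio is a token-weighted blend of per-type retention. The sketch below maps each content type to its level from the table, then to a retention fraction. The fractions in `ASSUMED_RETENTION` are illustrative guesses, not documented values, and `estimate_ratio` is a hypothetical helper.

```python
# Compression level per content type, taken from the table above
LEVEL_BY_TYPE = {
    "code_block": "minimal",
    "system_prompt": "none",
    "recent_turn": "none",
    "user_requirement": "low",
    "filler": "high",
    "intermediate_reasoning": "medium",
    "error_message": "medium",
}

# Illustrative fraction of tokens kept at each level (assumed, not documented)
ASSUMED_RETENTION = {"none": 1.0, "minimal": 0.9, "low": 0.7, "medium": 0.4, "high": 0.05}


def estimate_ratio(token_counts: dict[str, int]) -> float:
    """Token-weighted compression ratio (tokens after / tokens before)."""
    before = sum(token_counts.values())
    if before == 0:
        return 1.0
    after = sum(
        tokens * ASSUMED_RETENTION[LEVEL_BY_TYPE[content_type]]
        for content_type, tokens in token_counts.items()
    )
    return after / before


# A code-heavy session keeps far more of its tokens than a chatty one
code_heavy = {"code_block": 6_000, "intermediate_reasoning": 3_000, "filler": 1_000}
chatty = {"filler": 6_000, "intermediate_reasoning": 3_000, "code_block": 1_000}
print(f"{estimate_ratio(code_heavy):.2f} vs {estimate_ratio(chatty):.2f}")
```

Whatever the real per-type fractions are, the same weighting explains why code-heavy workloads compress less than filler-heavy ones.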

Token Savings in Practice

Real-world compression ratios (compacted size divided by original size, so lower means larger savings) depend heavily on conversation style:

# Measured compression ratios from production workloads
COMPRESSION_BENCHMARKS = {
    "support_chatbot": {
        "avg_compression": 0.35,    # 65% token reduction
        "reason": "High conversational filler, repeated context"
    },
    "code_review": {
        "avg_compression": 0.55,    # 45% token reduction
        "reason": "Code blocks resist compression"
    },
    "research_session": {
        "avg_compression": 0.40,    # 60% token reduction
        "reason": "Lots of intermediate reasoning compressible"
    },
    "architecture_design": {
        "avg_compression": 0.50,    # 50% token reduction
        "reason": "Decisions preserved, discussion compressed"
    },
}
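These ratios translate directly into freed context. As a worked example, assume a 200K-token context window, compaction firing at 150K tokens, and a ratio that applies to the whole context (all three are assumptions of this sketch). The hypothetical helper below applies each benchmark ratio and reports what remains free.

```python
def headroom_after_compaction(
    context_window: int, tokens_before: int, ratio: float
) -> tuple[int, int]:
    """Return (tokens_after, free_tokens), assuming `ratio` applies to the whole context."""
    tokens_after = round(tokens_before * ratio)
    return tokens_after, context_window - tokens_after


BENCHMARK_RATIOS = {  # avg_compression values from the benchmarks above
    "support_chatbot": 0.35,
    "code_review": 0.55,
    "research_session": 0.40,
    "architecture_design": 0.50,
}

for workload, ratio in BENCHMARK_RATIOS.items():
    after, free = headroom_after_compaction(200_000, 150_000, ratio)
    print(f"{workload:<20} {after:>7,} tokens kept, {free:>7,} free")
```

At the support-chatbot ratio, a 150K-token conversation shrinks to about 52.5K tokens, leaving roughly 147.5K of the window free; at the code-review ratio it shrinks to 82.5K, leaving 117.5K.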

Compaction Triggers

You can configure when compaction fires:

# Token-based trigger (most common)
compaction_config = {
    "enabled": True,
    "trigger_tokens": 150_000,
    "preserve_recent": 10,
}

# Turn-based trigger
compaction_config = {
    "enabled": True,
    "trigger_turns": 50,          # Compact after 50 turns
    "preserve_recent": 15,
}

# Hybrid trigger (fires on whichever hits first)
compaction_config = {
    "enabled": True,
    "trigger_tokens": 120_000,
    "trigger_turns": 40,
    "preserve_recent": 10,
}
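The hybrid rule (fire as soon as either threshold is reached) is easy to state as a predicate. `should_compact` is a hypothetical client-side helper that mirrors the three configurations above, for example to predict when the server will compact; it assumes a missing trigger key means that trigger is disabled.

```python
def should_compact(config: dict, tokens: int, turns: int) -> bool:
    """Return True if any configured trigger has been reached.

    Missing trigger keys are treated as disabled (an assumption of this sketch).
    """
    if not config.get("enabled", False):
        return False
    token_limit = config.get("trigger_tokens")
    turn_limit = config.get("trigger_turns")
    hit_tokens = token_limit is not None and tokens >= token_limit
    hit_turns = turn_limit is not None and turns >= turn_limit
    return hit_tokens or hit_turns


hybrid = {"enabled": True, "trigger_tokens": 120_000, "trigger_turns": 40, "preserve_recent": 10}
print(should_compact(hybrid, tokens=90_000, turns=41))   # True: turn limit reached first
print(should_compact(hybrid, tokens=125_000, turns=12))  # True: token limit reached first
print(should_compact(hybrid, tokens=90_000, turns=12))   # False: neither reached
```

The turn trigger suits chats with short messages, where token counts climb slowly; the token trigger suits sessions that paste large artifacts, where a few turns can fill the window.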

Compaction-Aware Message Management

Wrap the call in a class that enables compaction on every request and records each compaction event, so calling code only deals with plain strings:

from anthropic import Anthropic


class CompactingConversation:
    """Manages a conversation with automatic compaction."""

    def __init__(self, system_prompt: str, trigger_tokens: int = 150_000):
        self.client = Anthropic()
        self.system_prompt = system_prompt
        self.messages: list[dict] = []
        self.trigger_tokens = trigger_tokens
        self.compaction_events: list[dict] = []

    def send(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})

        response = self.client.messages.create(
            model="claude-opus-4-6-20260205",
            max_tokens=4096,
            system=self.system_prompt,
            messages=self.messages,
            metadata={
                "compaction": {
                    "enabled": True,
                    "trigger_tokens": self.trigger_tokens,
                    "preserve_recent": 10,
                }
            }
        )

        # Join all text blocks; unlike next(), this cannot raise StopIteration
        assistant_text = "".join(
            b.text for b in response.content if b.type == "text"
        )
        self.messages.append({"role": "assistant", "content": assistant_text})

        # Track compaction events (compaction_info may be absent or None)
        info = getattr(response, "compaction_info", None)
        if info is not None and info.triggered:
            self.compaction_events.append({
                "turn": len(self.messages) // 2,
                "tokens_saved": info.tokens_before - info.tokens_after,
                "ratio": info.compression_ratio,
            })

        return assistant_text

    def total_tokens_saved(self) -> int:
        return sum(e["tokens_saved"] for e in self.compaction_events)
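After a long session, `compaction_events` can be reduced to a short report. `compaction_report` is a hypothetical helper that works on the event dictionaries the class records (`turn`, `tokens_saved`, `ratio`), so it can be tested without calling the API.

```python
from statistics import mean


def compaction_report(events: list[dict]) -> dict:
    """Summarize compaction events recorded as {'turn', 'tokens_saved', 'ratio'}."""
    if not events:
        return {"count": 0, "total_saved": 0, "mean_ratio": None, "first_turn": None}
    return {
        "count": len(events),
        "total_saved": sum(e["tokens_saved"] for e in events),
        "mean_ratio": mean(e["ratio"] for e in events),
        "first_turn": min(e["turn"] for e in events),
    }


events = [
    {"turn": 42, "tokens_saved": 90_000, "ratio": 0.40},
    {"turn": 77, "tokens_saved": 80_000, "ratio": 0.50},
]
print(compaction_report(events))
```

Tracking `first_turn` alongside the mean ratio shows both how early a workload hits the trigger and how well it compresses once it does.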

When Not to Use Compaction

Compaction is not always the right tool:

  • Legal or regulatory conversations where every word matters for audit trails — log the full transcript separately before compaction
  • Multi-party conversations where attribution of specific statements is critical
  • Conversations under 50K tokens — compaction adds overhead without meaningful benefit
  • One-shot tasks — if you do not maintain conversation state, compaction has nothing to do
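For the audit-trail case, the first bullet's advice (log the full transcript before compaction) can be as simple as an append-only JSON Lines file. `TranscriptLog` is a hypothetical helper built only on the standard library; storage location, retention, and tamper-evidence are left to your compliance requirements.

```python
import json
import time
from pathlib import Path
from typing import Union


class TranscriptLog:
    """Append-only JSON Lines log of every message, kept outside the compacted context."""

    def __init__(self, path: Union[str, Path]):
        self.path = Path(path)

    def append(self, role: str, content: str) -> None:
        record = {"ts": time.time(), "role": role, "content": content}
        # Append mode never rewrites earlier lines
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    def read_all(self) -> list[dict]:
        """Return every logged record in write order (empty if nothing logged yet)."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
```

Call `append` for the user message before `CompactingConversation.send` and for the reply after it, so the log always holds the verbatim exchange even when the server has compacted its own copy.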

In the next lesson, you will build infinite conversation workflows that leverage compaction to sustain sessions far beyond normal context limits.