Every long conversation eventually hits a wall: the context window fills up. Before compaction, you had two bad options — truncate older messages (losing information) or manually summarize (adding complexity and latency). The compaction API is a third option: let the server intelligently compress older conversation segments while preserving what matters.
How Compaction Works
When you enable compaction, the server monitors your conversation’s token usage. As it approaches a configured threshold, the API automatically summarizes older message pairs into a compact representation:
Turns 1-10 (counting back from the latest): Full messages (preserved, recent)
Turns 11-30: Compacted summary (compressed, older)
Turns 31+: Discarded or archived (removed, oldest)
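As a rough illustration of the tiering above, the sketch below splits a list of turns (oldest first) into the three groups on the client side. The function name `partition_turns` and the `compact_window` parameter are hypothetical, chosen only to mirror the 10/20 split in the diagram; the server's actual boundaries are not specified here.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition_turns(
    turns: Sequence[T],
    preserve_recent: int = 10,
    compact_window: int = 20,  # hypothetical: size of the "compacted" tier
) -> Tuple[List[T], List[T], List[T]]:
    """Split turns (oldest first) into (removed, compacted, recent) tiers."""
    n = len(turns)
    # The newest `preserve_recent` turns stay intact.
    recent_start = max(0, n - preserve_recent)
    # The `compact_window` turns before them are candidates for summarization.
    compact_start = max(0, recent_start - compact_window)
    removed = list(turns[:compact_start])
    compacted = list(turns[compact_start:recent_start])
    recent = list(turns[recent_start:])
    return removed, compacted, recent


# 40 turns, numbered oldest (1) to newest (40).
removed, compacted, recent = partition_turns(list(range(1, 41)))
```

With 40 turns, the ten newest land in `recent`, the twenty before them in `compacted`, and the ten oldest in `removed`. A conversation shorter than `preserve_recent` ends up entirely in `recent`.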
The compacted summary is not a naive “TL;DR.” It is a structured extraction of:
- Key decisions made during the conversation
- Code artifacts produced or modified
- User preferences expressed
- Constraints and requirements stated
- Unresolved questions still pending
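One way to picture such a structured extraction is as a small record with one field per category in the list above. The `CompactedSummary` class and its `render` method below are purely illustrative assumptions; the lesson does not specify the server's internal summary format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompactedSummary:
    """Hypothetical container mirroring the five categories listed above."""

    decisions: List[str] = field(default_factory=list)
    code_artifacts: List[str] = field(default_factory=list)
    user_preferences: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)
    open_questions: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the non-empty sections as plain text, one bullet per item."""
        sections = [
            ("Key decisions", self.decisions),
            ("Code artifacts", self.code_artifacts),
            ("User preferences", self.user_preferences),
            ("Constraints and requirements", self.constraints),
            ("Unresolved questions", self.open_questions),
        ]
        lines: List[str] = []
        for title, items in sections:
            if items:  # skip categories with nothing to report
                lines.append(f"{title}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


summary = CompactedSummary(
    decisions=["Use PostgreSQL"],
    open_questions=["Which cache?"],
)
text = summary.render()
```

Keeping the categories separate, rather than producing one free-form paragraph, is what distinguishes this kind of summary from a plain "TL;DR".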
API Configuration
Basic Compaction
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=4096,
    system="You are a senior software architect.",
    messages=conversation_history,
    metadata={
        "compaction": {
            "enabled": True,
            "trigger_tokens": 150_000,  # Start compacting at 150K tokens
            "preserve_recent": 10,      # Always keep last 10 turns intact
        }
    },
)
Reading Compaction Status
# Check if compaction occurred
if hasattr(response, 'compaction_info'):
    info = response.compaction_info
    print(f"Compaction triggered: {info.triggered}")
    print(f"Tokens before: {info.tokens_before:,}")
    print(f"Tokens after: {info.tokens_after:,}")
    print(f"Compression ratio: {info.compression_ratio:.1%}")
    print(f"Turns compacted: {info.turns_compacted}")
What Gets Preserved vs. Compressed
The compaction algorithm is not uniform. It applies different compression levels based on content type:
| Content Type | Compression Level | Reasoning |
|---|---|---|
| Code blocks | Minimal | Code must remain exact — paraphrasing introduces bugs |
| System prompt | None | Never compressed — always fully preserved |
| Recent turns | None | Active context must remain intact |
| User requirements | Low | Requirements are referenced throughout the conversation |
| Conversational filler | High | “Thanks!”, “Got it”, “Sounds good” add no information |
| Intermediate reasoning | Medium | Final decisions matter more than the path to them |
| Error messages | Medium | Recent errors preserved, older ones summarized |
Token Savings in Practice
Real-world compression ratios depend heavily on conversation style:
# Measured compression ratios from production workloads
COMPRESSION_BENCHMARKS = {
    "support_chatbot": {
        "avg_compression": 0.35,  # 65% token reduction
        "reason": "High conversational filler, repeated context",
    },
    "code_review": {
        "avg_compression": 0.55,  # 45% token reduction
        "reason": "Code blocks resist compression",
    },
    "research_session": {
        "avg_compression": 0.40,  # 60% token reduction
        "reason": "Lots of intermediate reasoning compressible",
    },
    "architecture_design": {
        "avg_compression": 0.50,  # 50% token reduction
        "reason": "Decisions preserved, discussion compressed",
    },
}
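The figures above read the ratio as tokens after compaction divided by tokens before, so 0.35 corresponds to a 65% reduction. Under that reading, a quick estimate looks like the sketch below; the helper names `estimate_tokens_after` and `reduction_fraction` are made up for illustration.

```python
def estimate_tokens_after(tokens_before: int, compression_ratio: float) -> int:
    """Estimate the token count after compaction (ratio = after / before)."""
    if not 0.0 <= compression_ratio <= 1.0:
        raise ValueError("compression_ratio must be between 0 and 1")
    return round(tokens_before * compression_ratio)


def reduction_fraction(compression_ratio: float) -> float:
    """Fraction of tokens removed, e.g. a ratio of 0.35 removes 0.65."""
    return 1.0 - compression_ratio


# A 100K-token support conversation at the 0.35 ratio quoted above.
after = estimate_tokens_after(100_000, 0.35)
saved = 100_000 - after
```

Here `after` comes to 35,000 tokens and `saved` to 65,000. The same conversation at the code-review ratio of 0.55 would keep 55,000 tokens.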
Compaction Triggers
You can configure when compaction fires:
# Token-based trigger (most common)
compaction_config = {
    "enabled": True,
    "trigger_tokens": 150_000,
    "preserve_recent": 10,
}

# Turn-based trigger
compaction_config = {
    "enabled": True,
    "trigger_turns": 50,  # Compact after 50 turns
    "preserve_recent": 15,
}

# Hybrid trigger (fires on whichever hits first)
compaction_config = {
    "enabled": True,
    "trigger_tokens": 120_000,
    "trigger_turns": 40,
    "preserve_recent": 10,
}
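To make the "whichever hits first" rule concrete, here is a minimal client-side sketch of the trigger logic. The function `should_compact` is hypothetical and only models the semantics described above, using the same config keys. It does not reflect how the server evaluates them, and treating the thresholds as inclusive is an assumption.

```python
from typing import Any, Dict


def should_compact(config: Dict[str, Any], total_tokens: int, total_turns: int) -> bool:
    """Return True if any configured threshold has been reached."""
    if not config.get("enabled", False):
        return False
    token_limit = config.get("trigger_tokens")
    turn_limit = config.get("trigger_turns")
    # A missing key means that trigger is simply not configured.
    hit_tokens = token_limit is not None and total_tokens >= token_limit
    hit_turns = turn_limit is not None and total_turns >= turn_limit
    return hit_tokens or hit_turns


hybrid = {
    "enabled": True,
    "trigger_tokens": 120_000,
    "trigger_turns": 40,
    "preserve_recent": 10,
}
```

With the hybrid config, a short but token-heavy session (say 125K tokens over 12 turns) and a long, chatty one (30K tokens over 40 turns) would both trigger compaction.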
Compaction-Aware Message Management
Build a client that manages compaction transparently:
from anthropic import Anthropic


class CompactingConversation:
    """Manages a conversation with automatic compaction."""

    def __init__(self, system_prompt: str, trigger_tokens: int = 150_000):
        self.client = Anthropic()
        self.system_prompt = system_prompt
        self.messages: list[dict] = []
        self.trigger_tokens = trigger_tokens
        self.compaction_events: list[dict] = []

    def send(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})
        response = self.client.messages.create(
            model="claude-opus-4-6-20260205",
            max_tokens=4096,
            system=self.system_prompt,
            messages=self.messages,
            metadata={
                "compaction": {
                    "enabled": True,
                    "trigger_tokens": self.trigger_tokens,
                    "preserve_recent": 10,
                }
            },
        )
        # Take the first text block; fall back to "" if the response has none
        assistant_text = next(
            (b.text for b in response.content if b.type == "text"), ""
        )
        self.messages.append({"role": "assistant", "content": assistant_text})

        # Track compaction events
        info = getattr(response, "compaction_info", None)
        if info is not None and info.triggered:
            self.compaction_events.append({
                "turn": len(self.messages) // 2,
                "tokens_saved": info.tokens_before - info.tokens_after,
                "ratio": info.compression_ratio,
            })
        return assistant_text

    def total_tokens_saved(self) -> int:
        return sum(e["tokens_saved"] for e in self.compaction_events)
When Not to Use Compaction
Compaction is not always the right tool:
- Legal or regulatory conversations where every word matters for audit trails — log the full transcript separately before compaction
- Multi-party conversations where attribution of specific statements is critical
- Conversations under 50K tokens — compaction adds overhead without meaningful benefit
- One-shot tasks — if you do not maintain conversation state, compaction has nothing to do
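For the audit-trail case in the first bullet, one possible approach is to append every message verbatim to a separate log before it enters the compacted history. The sketch below writes JSON Lines to any text stream. The function name `append_to_audit_log` and the record fields are assumptions, not part of any API.

```python
import io
import json
from datetime import datetime, timezone
from typing import Dict, TextIO


def append_to_audit_log(stream: TextIO, role: str, content: str) -> Dict[str, str]:
    """Write one message as a JSON line so the exact wording is retained."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "content": content,
    }
    stream.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


# Example: an in-memory buffer stands in for a real append-only log file.
buffer = io.StringIO()
append_to_audit_log(buffer, "user", "Please confirm clause 4.2.")
append_to_audit_log(buffer, "assistant", "Clause 4.2 is confirmed as written.")
```

In practice the stream would be a file opened in append mode or a write to durable storage. The point is that the log is written independently of the message list that compaction may later rewrite.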
In the next lesson, you will build infinite conversation workflows that leverage compaction to sustain sessions far beyond normal context limits.