This lesson takes the structuring patterns from the previous lesson and applies them to real codebase analysis. You will learn to load a complete project into context and extract insights that are impossible with partial views.
Loading a Complete Codebase
from fnmatch import fnmatch
from pathlib import Path

from anthropic import Anthropic

client = Anthropic()


def load_codebase(root: str, exclude_patterns: list[str] | None = None) -> str:
    """Load an entire codebase into a context-ready string."""
    exclude = exclude_patterns or [
        "node_modules", "dist", "__pycache__", ".git",
        "*.lock", "*.min.js", "*.map"
    ]
    files = []
    root_path = Path(root)
    for filepath in sorted(root_path.rglob("*")):
        if not filepath.is_file():
            continue
        rel_path = filepath.relative_to(root_path)
        # Test every path component, so files nested inside an excluded
        # directory (e.g. node_modules/pkg/index.js) are skipped as well.
        if any(fnmatch(part, p) for part in rel_path.parts for p in exclude):
            continue
        try:
            content = filepath.read_text(encoding="utf-8")
            files.append((str(rel_path), content))
        except (UnicodeDecodeError, PermissionError):
            # Skip binary or unreadable files.
            continue

    # Build structured context: a numbered file index first, then the contents.
    doc_map = "## File Index\n\n"
    for i, (path, _) in enumerate(files, 1):
        doc_map += f"{i}. `{path}`\n"
    doc_map += f"\n**Total: {len(files)} files**\n\n---\n\n"

    full_content = doc_map
    for path, content in files:
        # Use the file extension as the code-fence language hint.
        ext = Path(path).suffix.lstrip(".")
        full_content += f"### `{path}`\n```{ext}\n{content}\n```\n\n"
    return full_content
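Before sending the loaded string, it helps to know roughly how large it is. The sketch below is a rough size check only: `estimate_tokens` and `fits_budget` are hypothetical helper names, and the 4-characters-per-token ratio is an assumed rule of thumb, not a real tokenizer, so treat the result as an order-of-magnitude guide.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. chars_per_token=4 is an ASSUMED heuristic."""
    return int(len(text) / chars_per_token)


def fits_budget(text: str, budget_tokens: int, chars_per_token: float = 4.0) -> bool:
    """Return True if the estimated token count is within the given budget."""
    return estimate_tokens(text, chars_per_token) <= budget_tokens


# Stand-in for the string returned by load_codebase()
sample_context = "x" * 4000
print(estimate_tokens(sample_context))
print(fits_budget(sample_context, budget_tokens=1000))
```

If the estimate is far above the context window you plan to use, tightening the exclude patterns is usually the first thing to try.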
Architecture Analysis
With the full codebase loaded, ask architectural questions that require cross-file understanding:
codebase = load_codebase("./my-project")

response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=8192,
    thinking={"type": "adaptive", "effort": "deep"},
    system="""You are a principal architect reviewing this codebase for the
first time. Your analysis should be actionable and prioritized.""",
    messages=[{
        "role": "user",
        "content": f"""{codebase}

Analyze this codebase and provide:
1. Architecture diagram (as Mermaid)
2. Top 5 architectural risks (with specific file references)
3. Dependency graph of the most critical modules
4. Recommended refactoring priorities (ordered by risk × effort)
"""
    }]
)
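The response arrives as a list of content blocks, and with thinking enabled the list can include non-text blocks. The following is a minimal sketch for pulling out the answer text and any Mermaid diagram in it. `extract_text` and `extract_mermaid` are hypothetical helpers, and `fake_response` is made-up stand-in data rather than real model output.

```python
import re
from types import SimpleNamespace

FENCE = "`" * 3  # built at runtime so the example contains no nested fences
MERMAID_RE = re.compile(FENCE + r"mermaid\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_text(response) -> str:
    """Join the text blocks of a response, skipping thinking or other blocks."""
    return "\n".join(
        block.text
        for block in response.content
        if getattr(block, "type", None) == "text"
    )


def extract_mermaid(text: str) -> list[str]:
    """Return the body of every mermaid-fenced block found in the text."""
    return [match.strip() for match in MERMAID_RE.findall(text)]


# Hypothetical stand-in for a real response object
fake_response = SimpleNamespace(content=[
    SimpleNamespace(type="thinking", thinking="(omitted)"),
    SimpleNamespace(
        type="text",
        text=f"Diagram:\n{FENCE}mermaid\ngraph TD\n  A-->B\n{FENCE}\nRisks: ...",
    ),
])

diagrams = extract_mermaid(extract_text(fake_response))
print(diagrams[0])
```

Saving the extracted diagram to a `.mmd` file lets you render it with whatever Mermaid tooling you already use.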
Cross-File Dependency Tracing
One of the most powerful uses of full-codebase context is tracing how a change propagates:
response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=8192,
    thinking={"type": "adaptive", "effort": "maximum"},
    messages=[{
        "role": "user",
        "content": f"""{codebase}

I need to change the `User.email` field from a string to an Email
value object. Trace every file that would be affected by this change.

For each affected file, provide:
- The file path
- The specific lines that need to change
- The exact code change required
- Any tests that need updating

Order by: files that must change first (dependency order).
"""
    }]
)
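A model-produced trace is easier to trust when you can cross-check it mechanically. The sketch below is a plain regex search over file contents that lists every line mentioning a name, so you can compare it with the files the model reported. `find_references` is a hypothetical helper and the `sample` files are invented for illustration. A text search will miss indirect usages and may flag unrelated matches.

```python
import re


def find_references(files: dict[str, str], pattern: str) -> dict[str, list[int]]:
    """Return {path: [1-based line numbers]} for lines matching the regex."""
    regex = re.compile(pattern)
    hits: dict[str, list[int]] = {}
    for path, content in files.items():
        lines = [
            number
            for number, line in enumerate(content.splitlines(), 1)
            if regex.search(line)
        ]
        if lines:
            hits[path] = lines
    return hits


# Invented sample files standing in for a loaded codebase
sample = {
    "models/user.py": "class User:\n    email: str\n",
    "services/mailer.py": "def send(user):\n    deliver(user.email)\n",
    "utils/math.py": "def add(a, b):\n    return a + b\n",
}

refs = find_references(sample, r"\bemail\b")
for path, line_numbers in refs.items():
    print(path, line_numbers)
```

Files that appear in the search but not in the model's trace (or the reverse) are the ones worth reading by hand.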
Cost Awareness for Large Context
Processing a full codebase is expensive. Here is what to expect:
Example: 300K token codebase + 8K output

Standard pricing ($5/$25 per 1M):
  Input:  300K × $5 / 1M  = $1.50
  Output: 8K × $25 / 1M   = $0.20
  Total:  $1.70 per analysis

Premium pricing (>200K context, higher rates):
  Total:  ~$3-5 per analysis (depending on premium tier)

Monthly projection (5 analyses/day × 22 days):
  Standard: $187/month
  Premium:  $330-550/month
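The arithmetic above can be reproduced with a small calculator, sketched here so you can plug in your own token counts and rates. `analysis_cost` and `monthly_cost` are hypothetical helper names. The default rates are the standard $5/$25 per 1M figures from this example, and no premium rates are hard-coded because they depend on the tier.

```python
def analysis_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate: float = 5.0,    # USD per 1M input tokens (standard rate above)
    output_rate: float = 25.0,  # USD per 1M output tokens (standard rate above)
) -> float:
    """Cost in USD of one analysis at the given per-million-token rates."""
    return (
        input_tokens / 1_000_000 * input_rate
        + output_tokens / 1_000_000 * output_rate
    )


def monthly_cost(per_analysis: float, per_day: int = 5, days: int = 22) -> float:
    """Project a per-analysis cost over a working month."""
    return per_analysis * per_day * days


per_run = analysis_cost(300_000, 8_000)
print(f"Per analysis: ${per_run:.2f}")
print(f"Per month:    ${monthly_cost(per_run):.2f}")
```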
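Cost optimization tip: For iterative work, load the codebase once, then ask follow-up questions in the same multi-turn conversation. The API is stateless, so every turn resends the full conversation and its input tokens are billed again. Enable prompt caching on the codebase block so that follow-up turns read it from cache at a discounted rate instead of the full input price.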
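A minimal sketch of the multi-turn pattern follows. It rebuilds the message list for each follow-up, keeping the codebase as the first content block and marking it with `cache_control` so that, assuming prompt caching applies to your model and account, the repeated prefix can be served from cache. `codebase_block` and `build_messages` are hypothetical helpers. Cache lifetime and exact pricing are not covered here.

```python
def codebase_block(codebase: str) -> dict:
    """Text block holding the codebase, marked as a cacheable prefix."""
    return {
        "type": "text",
        "text": codebase,
        "cache_control": {"type": "ephemeral"},
    }


def build_messages(
    codebase: str,
    turns: list[tuple[str, str]],
    question: str,
) -> list[dict]:
    """Build the messages list for the next turn.

    turns holds the (question, answer) pairs already exchanged;
    question is the new follow-up to ask.
    """
    questions = [q for q, _ in turns] + [question]
    answers = [a for _, a in turns]
    messages = []
    for i, q in enumerate(questions):
        content = [{"type": "text", "text": q}]
        if i == 0:
            # The codebase must stay byte-identical and first in every request.
            content.insert(0, codebase_block(codebase))
        messages.append({"role": "user", "content": content})
        if i < len(answers):
            messages.append({"role": "assistant", "content": answers[i]})
    return messages


msgs = build_messages("<codebase text>", [("q1", "a1")], "q2")
print(len(msgs), [m["role"] for m in msgs])
```

The resulting list would be passed as `messages=` to `client.messages.create(...)` on each turn. Only the new question changes between requests.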
Practical Exercise
Load a codebase you work with daily into Opus 4.6 and ask it to:
- Identify the three highest-risk modules and explain why
- Find any circular dependencies
- Suggest a refactoring that would reduce coupling between the two most tightly connected modules
Compare the results with your own understanding. You will likely learn something new about your own codebase.
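For the circular-dependency part of the exercise, you can check the model's answer against a mechanical result. The sketch below runs a depth-first search over a module dependency graph and returns one cycle if any exists. `find_cycle` is a hypothetical helper, and it assumes you already have the graph as a dict mapping each module to the modules it imports. Building that dict from real source files is not shown.

```python
from typing import Optional


def find_cycle(graph: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: dict[str, int] = {}
    stack: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path: a cycle closes here.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Invented example graph: module -> modules it imports
print(find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))
print(find_cycle({"a": ["b"], "b": []}))
```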
In the next lesson, we cover the cost implications and premium pricing for large-context usage.