Lesson 34 of 46 ~30 min

Full Codebase Analysis & Cross-File Refactoring

Use the 1M context window to analyze entire codebases: find architectural issues, trace dependencies, and execute cross-file refactoring.

This lesson takes the structuring patterns from the previous lesson and applies them to real codebase analysis. You will learn to load a complete project into context and extract insights that are impossible with partial views.

Loading a Complete Codebase

from fnmatch import fnmatch
from pathlib import Path
from anthropic import Anthropic

client = Anthropic()

def load_codebase(root: str, exclude_patterns: list[str] | None = None) -> str:
    """Load an entire codebase into a context-ready string."""
    exclude = exclude_patterns or [
        "node_modules", "dist", "__pycache__", ".git",
        "*.lock", "*.min.js", "*.map"
    ]

    files = []
    root_path = Path(root)

    for filepath in sorted(root_path.rglob("*")):
        if not filepath.is_file():
            continue
        rel_path = filepath.relative_to(root_path)
        # Test every path component, so files nested inside an excluded
        # directory (e.g. node_modules/pkg/index.js) are skipped as well.
        if any(fnmatch(part, p) for part in rel_path.parts for p in exclude):
            continue
        try:
            content = filepath.read_text(encoding="utf-8")
            files.append((str(rel_path), content))
        except (UnicodeDecodeError, PermissionError):
            # Skip binary or unreadable files
            continue

    # Build structured context
    doc_map = "## File Index\n\n"
    for i, (path, _) in enumerate(files, 1):
        doc_map += f"{i}. `{path}`\n"

    doc_map += f"\n**Total: {len(files)} files**\n\n---\n\n"

    full_content = doc_map
    for path, content in files:
        ext = Path(path).suffix.lstrip(".")
        full_content += f"### `{path}`\n```{ext}\n{content}\n```\n\n"

    return full_content
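
Before sending the loaded string, it helps to check that it is likely to fit in the context window. The sketch below is a rough pre-flight check only: the 4-characters-per-token ratio, the `reserve` value, and both function names are assumptions made for illustration, not measured values or SDK features. For an exact figure, the SDK's token-counting call (`client.messages.count_tokens`) is the more reliable source.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. chars_per_token is an assumed average, not a measured value."""
    return int(len(text) / chars_per_token)


def fits_context(text: str, context_limit: int = 1_000_000, reserve: int = 20_000) -> bool:
    """True if the estimated prompt leaves `reserve` tokens for the question and the reply."""
    return estimate_tokens(text) + reserve <= context_limit


# Stand-in for the string returned by load_codebase()
sample = "x" * 2_000_000
print(estimate_tokens(sample), fits_context(sample))
```

If the check fails, tightening the exclude patterns (test fixtures, generated code, vendored dependencies) is usually the first thing to try.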

Architecture Analysis

With the full codebase loaded, ask architectural questions that require cross-file understanding:

codebase = load_codebase("./my-project")

response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    system="""You are a principal architect reviewing this codebase for the
    first time. Your analysis should be actionable and prioritized.""",
    messages=[{
        "role": "user",
        "content": f"""{codebase}

Analyze this codebase and provide:
1. Architecture diagram (as Mermaid)
2. Top 5 architectural risks (with specific file references)
3. Dependency graph of the most critical modules
4. Recommended refactoring priorities (ordered by risk × effort)
"""
    }]
)
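
The response arrives as a list of content blocks, and with thinking enabled the text blocks are preceded by thinking blocks. A minimal sketch for pulling out the requested Mermaid diagram follows; it assumes the model wraps the diagram in a fenced block labelled `mermaid`, which is not guaranteed. The helper names are hypothetical, and the fake blocks stand in for `response.content`.

```python
import re
from types import SimpleNamespace

FENCE = "`" * 3  # built indirectly so the pattern does not contain a literal fence
MERMAID_BLOCK = re.compile(FENCE + r"mermaid[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def response_text(content_blocks) -> str:
    """Join the text blocks of a response, skipping thinking blocks."""
    return "\n".join(b.text for b in content_blocks if getattr(b, "type", None) == "text")


def extract_mermaid(text: str) -> list[str]:
    """Return the body of every fenced mermaid block found in text."""
    return [body.strip() for body in MERMAID_BLOCK.findall(text)]


# Stand-in for response.content
fake_blocks = [
    SimpleNamespace(type="thinking", thinking="..."),
    SimpleNamespace(
        type="text",
        text="Overview\n" + FENCE + "mermaid\ngraph TD\n  A --> B\n" + FENCE + "\nRisks follow.",
    ),
]
diagrams = extract_mermaid(response_text(fake_blocks))
print(diagrams)
```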

Cross-File Dependency Tracing

One of the most powerful uses of full-codebase context is tracing how a change propagates:

response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    output_config={"effort": "max"},
    messages=[{
        "role": "user",
        "content": f"""{codebase}

I need to change the `User.email` field from a string to an Email
value object. Trace every file that would be affected by this change.

For each affected file, provide:
- The file path
- The specific lines that need to change
- The exact code change required
- Any tests that need updating

Order by: files that must change first (dependency order).
"""
    }]
)
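
A model-produced list of affected files is worth cross-checking locally. The sketch below is a plain textual search, not semantic analysis: it will miss indirect uses (for example a field accessed through `getattr`) and may flag unrelated names. `find_references` and the sample files are hypothetical; in practice the dictionary would come from the same files that `load_codebase` reads.

```python
import re


def find_references(files: dict[str, str], pattern: str) -> dict[str, list[int]]:
    """Map each file path to the 1-based line numbers whose text matches pattern."""
    regex = re.compile(pattern)
    hits: dict[str, list[int]] = {}
    for path, content in files.items():
        lines = [n for n, line in enumerate(content.splitlines(), 1) if regex.search(line)]
        if lines:
            hits[path] = lines
    return hits


# Hypothetical miniature project
files = {
    "models/user.py": "class User:\n    email: str\n",
    "services/mailer.py": "def send(user):\n    deliver(user.email)\n",
    "utils/math.py": "def add(a, b):\n    return a + b\n",
}
hits = find_references(files, r"\bemail\b")
print(hits)
```

Any file that appears in the search results but not in the model's answer, or the reverse, is a good candidate for a follow-up question.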

Cost Awareness for Large Context

Processing a full codebase is expensive. Here is what to expect:

Example: 300K token codebase + 8K output

Standard pricing ($5 input / $25 output per 1M tokens):
  Input:  300K × $5  / 1M = $1.50
  Output: 8K  × $25 / 1M = $0.20
  Total: $1.70 per analysis

Premium pricing (>200K context, higher rates; a 300K-token prompt falls in this tier):
  Total: ~$3-5 per analysis (depending on premium tier)

Monthly projection (5 analyses/day × 22 days):
  Standard: $187/month
  Premium:  $330-550/month
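
The standard-rate arithmetic above can be reproduced with a small helper, which makes it easy to plug in your own token counts. The default rates are the $5/$25 figures quoted in this lesson; the function name is hypothetical, and premium rates are left as parameters because this lesson gives only a range for them.

```python
def analysis_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 5.0, output_rate: float = 25.0) -> float:
    """Cost in dollars of one request. Rates are dollars per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


per_analysis = analysis_cost(300_000, 8_000)
monthly = per_analysis * 5 * 22  # 5 analyses/day × 22 working days
print(f"${per_analysis:.2f} per analysis, ${monthly:.2f} per month")
# prints: $1.70 per analysis, $187.00 per month
```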

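Cost optimization tip: For iterative work, keep the codebase as a stable prefix of a multi-turn conversation and enable prompt caching on it. The API is stateless, so every turn resends the codebase as input; with caching, follow-up turns read that prefix at a reduced cached-input rate instead of paying the full input price again.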
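
One way to structure follow-up turns is to keep the codebase as the first content block of the first user message and mark it with `cache_control`, the prompt-caching marker in the Messages API. The sketch below only builds the `messages` list and makes no API call. `build_messages` and the `turns` structure are hypothetical, and cache entries expire after a short idle period, so the saving applies to follow-ups sent reasonably soon after each other.

```python
def build_messages(codebase: str, turns: list[tuple[str, str]], question: str) -> list[dict]:
    """Build a messages list: cached codebase first, earlier Q/A turns, then the new question.

    turns holds (question, answer) pairs from earlier in the conversation.
    """
    codebase_block = {
        "type": "text",
        "text": codebase,
        "cache_control": {"type": "ephemeral"},  # prefix up to this block is cacheable
    }
    messages: list[dict] = []
    pending = [codebase_block]  # attached only to the first user message
    for asked, answered in turns:
        messages.append({"role": "user", "content": pending + [{"type": "text", "text": asked}]})
        messages.append({"role": "assistant", "content": answered})
        pending = []
    messages.append({"role": "user", "content": pending + [{"type": "text", "text": question}]})
    return messages


msgs = build_messages("<codebase text>", [("Top risks?", "1. ...")], "Any circular imports?")
print([m["role"] for m in msgs])
```

The returned list can be passed as the `messages` argument of `client.messages.create`. Keeping the codebase block byte-for-byte identical between calls matters, because a cache hit depends on an unchanged prefix.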

Practical Exercise

Load a codebase you work with daily into Opus 4.6 and ask it to:

  1. Identify the three highest-risk modules and explain why
  2. Find any circular dependencies
  3. Suggest a refactoring that would reduce coupling between the two most tightly connected modules

Compare the results with your own understanding. You will likely learn something new about your own codebase.
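
For the circular-dependency item, a local check gives you something concrete to compare against. The sketch below assumes you already have an import graph as a dictionary (how you build it depends on your language and tooling) and runs a depth-first search over it. It reports one cycle per back edge it encounters, so it is a quick cross-check and not an exhaustive enumeration. The function name and sample graph are hypothetical.

```python
def find_cycles(graph: dict[str, set[str]]) -> list[list[str]]:
    """Return import cycles found by depth-first search.

    graph maps a module name to the set of modules it imports.
    One cycle is reported per back edge; this is not an exhaustive enumeration.
    """
    cycles: list[list[str]] = []
    state: dict[str, int] = {}  # 1 = on the current DFS path, 2 = fully explored
    path: list[str] = []

    def visit(node: str) -> None:
        state[node] = 1
        path.append(node)
        for dep in sorted(graph.get(node, ())):
            if state.get(dep) == 1:
                # dep is an ancestor on the current path: a cycle closes here
                cycles.append(path[path.index(dep):] + [dep])
            elif dep not in state:
                visit(dep)
        path.pop()
        state[node] = 2

    for module in sorted(graph):
        if module not in state:
            visit(module)
    return cycles


# Hypothetical import graph
graph = {"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": {"a"}}
print(find_cycles(graph))
```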

In the next lesson, we cover the cost implications and premium pricing for large-context usage.