The AI Coding Setup That Made Me 3x Faster on My MacBook
I tracked my output for three months before and after rebuilding my development environment around AI, and the honest number is this: on the right kind of work, I ship roughly three times faster. On the wrong kind of work, AI makes me slower, and I’ll show you exactly where the line is. The setup is three layers on a MacBook Pro (M3 Pro, 36 GB in my case, but everything here runs fine on an M1 Air with 16 GB): a terminal agent for heavy lifting, an AI editor for flow-state work, and a local model so the whole thing survives airplane mode.
Layer 1: A terminal agent for the heavy lifting
The biggest single change wasn’t autocomplete — it was moving multi-file work to a CLI agent. I use Claude Code:
npm install -g @anthropic-ai/claude-code
cd ~/work/my-project && claude
The workflow that changed everything: instead of opening the editor, I describe the task. “Add a DELETE /api/sessions/:id endpoint, follow the pattern in routes/users.ts, write the integration test, run the suite.” The agent reads the codebase, edits four files, runs the tests, fixes its own failures, and presents a diff. For a routine endpoint like that, what used to be 45 minutes of my attention is now 5 minutes of review.
Two configuration moves make this dramatically better. First, a CLAUDE.md at the repo root with the things you’d tell a new hire: “We use Vitest, not Jest. Run pnpm check before claiming done. Never touch migrations/.” Second, run it inside a real terminal. I switched from iTerm2 to Ghostty this year — it’s faster and its native macOS feel is excellent — but either works. The trick that matters: dedicate a window with split panes, agent on the left, lazygit on the right, so you review every diff before it ever hits a branch.
The Apple Silicon angle is real and underrated: an agent loop that runs tests on every iteration is a sustained CPU workload, and my M3 Pro handles a 40-minute agent session on the train from Prague to Brno while losing maybe 15% battery, fans silent. My old Intel MacBook would have been a leaf blower with 90 minutes of runtime. Battery life is what makes “kick off an agent, review on arrival” an actual mobile workflow rather than a desk-only party trick.
Layer 2: Cursor (or VS Code + Copilot) for flow work
Agents are wrong for code you want to think through. For that, I’m in Cursor — a VS Code fork where Cmd+K rewrites a selection from a prompt and Tab autocomplete predicts multi-line edits, including the next edit elsewhere in the file. That last part is the killer feature: rename a parameter, and Tab walks you through every call site like a smarter multi-cursor.
If you’d rather not leave VS Code, GitHub Copilot plus Copilot Chat gets you 80% of the same value. The configuration that matters either way is restraint: I disable autocomplete in Markdown and YAML (where suggestions are noise) and keep it aggressive in test files (where it’s eerily good). In Cursor that’s Settings → Features → disable per-language; in VS Code it’s "github.copilot.enable": {"markdown": false, "yaml": false} in settings.json.
My division of labor after a year: Cursor for code I’m actively designing, the terminal agent for code I’m delegating. Mixing them up is the most common mistake I see — people prompt an agent to write the subtle algorithm they should own, then hand-type the CRUD boilerplate an agent would nail.
Layer 3: The local fallback with Ollama + Continue
Cloud models disappear on planes, in tunnels, and during provider outages. The fix costs nothing:
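brew install ollama
brew services start ollama   # the Homebrew install does not launch the server; pull needs it running
ollama pull qwen2.5-coder:7b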
Then install the Continue extension in VS Code/Cursor and point its autocomplete at the local model in ~/.continue/config.yaml:
models:
  - name: qwen2.5-coder
    provider: ollama
    model: qwen2.5-coder:7b
    roles: [autocomplete, chat]
Qwen2.5-Coder 7B on my M3 Pro generates at ~35 tokens/sec, so completions land fast enough that latency-wise you can’t tell it from Copilot. Quality is a tier below the frontier for chat, but for single-line and few-line autocomplete it’s genuinely close, and for “explain this regex” or “write a test for this function” while offline, it’s more than enough. On a 16 GB machine, stick with the 7B model; on 8 GB, drop to qwen2.5-coder:3b, which is still useful but noticeably dumber.
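If you want to check the tokens/sec figure on your own machine, here is a minimal sketch. It assumes Ollama is serving on its default local port (11434) and relies on the `eval_count` and `eval_duration` fields that the `/api/generate` endpoint returns, with durations in nanoseconds. The `benchmark` helper and its default prompt are illustrative names, not part of any tool mentioned above.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama /api/generate response.

    eval_count is the number of generated tokens;
    eval_duration is the generation time in nanoseconds.
    """
    return response["eval_count"] / response["eval_duration"] * 1e9


def benchmark(model: str = "qwen2.5-coder:7b",
              prompt: str = "Write a function that reverses a string.") -> float:
    """Send one non-streaming request and return the measured tokens/sec."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))
```

With the model pulled and the server running, `print(f"{benchmark():.1f} tokens/sec")` gives a single-sample reading. Run it a few times, since the first call also pays the cost of loading the model into memory.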
The honest 3x breakdown
Here’s where the multiplier actually comes from, based on my own time tracking across ~12 weeks of mixed product work:
- Boilerplate and CRUD: 4–5x. Endpoints, DTOs, config plumbing, API clients from an OpenAPI spec. The agent is close to perfect here.
- Tests: 3–4x. Generating table-driven tests and edge cases from existing code is the single highest-ROI prompt in my arsenal. Quality went up, not just speed: the model thinks of edge cases I’m too bored to write.
- Refactoring: ~3x. Mechanical-but-tedious changes (extract service, swap a library, rename across 60 files) with the test suite as a safety net.
- Unfamiliar APIs: 3x. Working with AVFoundation or a gnarly AWS SDK, AI replaces an hour of doc-spelunking with two minutes of asking. Verify against the real docs; it occasionally hallucinates method names.
- Novel architecture: ~1x, sometimes 0.7x. When designing something genuinely new, AI suggestions anchor me to the median solution. I now design with the autocomplete toggled off (Cursor: Cmd+Shift+P → “Disable autocomplete”).
- Debugging subtle state: 0.5x if I’m not careful. A race condition or a stale-cache bug needs a hypothesis, and a model will confidently hand you three wrong ones. The discipline: AI writes the reproduction script and the instrumentation, but I do the reasoning.
Average it over a real week’s mix of work and you land near 3x on shipped output. Not 10x — anyone claiming 10x is either doing pure boilerplate or not reviewing the diffs.
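To make the averaging concrete, here is a rough sketch rather than the actual tracking data. The multipliers are approximate midpoints of the ranges listed above, while the `HOURS_SHARE` values are hypothetical and only there to show the arithmetic. It also shows that the answer depends on what the shares measure: shares of today's AI-assisted hours call for a weighted arithmetic mean, and shares of pre-AI hours call for a weighted harmonic mean.

```python
import math

# Approximate midpoints of the per-category multipliers listed above.
MULTIPLIERS = {
    "boilerplate": 4.5,
    "tests": 3.5,
    "refactoring": 3.0,
    "unfamiliar_apis": 3.0,
    "novel_architecture": 1.0,
    "debugging": 0.5,
}

# HYPOTHETICAL share of a week's hours per category (illustrative only).
HOURS_SHARE = {
    "boilerplate": 0.30,
    "tests": 0.25,
    "refactoring": 0.15,
    "unfamiliar_apis": 0.15,
    "novel_architecture": 0.10,
    "debugging": 0.05,
}


def _validate(shares: dict, multipliers: dict) -> None:
    """Shares must cover the same categories and sum to 1."""
    if shares.keys() != multipliers.keys():
        raise ValueError("shares and multipliers must cover the same categories")
    if not math.isclose(sum(shares.values()), 1.0, abs_tol=1e-9):
        raise ValueError("shares must sum to 1")


def output_multiplier(shares: dict, multipliers: dict) -> float:
    """Shares are fractions of AI-assisted hours: weighted arithmetic mean."""
    _validate(shares, multipliers)
    return sum(shares[k] * multipliers[k] for k in shares)


def time_speedup(shares: dict, multipliers: dict) -> float:
    """Shares are fractions of pre-AI hours: weighted harmonic mean."""
    _validate(shares, multipliers)
    return 1.0 / sum(shares[k] / multipliers[k] for k in shares)
```

With these made-up shares, `output_multiplier(HOURS_SHARE, MULTIPLIERS)` comes out at about 3.25, while `time_speedup(HOURS_SHARE, MULTIPLIERS)` is about 2.28. The slow categories weigh more heavily when shares are measured against the old baseline. Either way, substitute your own tracked hours before trusting the number.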
Making it stick: the glue
Three small macOS-specific habits complete the setup. First, a global hotkey terminal (Ghostty’s Quick Terminal, or iTerm2’s hotkey window) so the agent is one keystroke away from any app. Second, a shell function I use a dozen times a day to pipe context at a model:
# $1 = question for the model, $2 = file whose contents are sent on stdin
aiq() { claude -p "$1" < "$2"; }
# usage: aiq "why does this test flake?" src/sync.test.ts
Third — and this is the one people skip — a review ritual. Every agent diff gets read in lazygit hunk by hunk before commit. The 3x number only holds if quality holds; the moment you start rubber-stamping diffs, you’re borrowing speed from future debugging sessions at a terrible interest rate.
The whole stack costs about $40/month (one frontier API/subscription plus Cursor), runs on any Apple Silicon Mac, and degrades gracefully to free-and-local when the network dies. Start with one layer — the terminal agent, on your most boilerplate-heavy repo — measure a week, and add the next layer only when the first one’s paying rent.

