Why Every Writer Needs a Local LLM (and How to Set One Up)

Photo: Unsplash

AI Power User

Why Every Writer Needs a Local LLM (and How to Set One Up)

A tireless first reader that never sees your unpublished manuscript leave your Mac

A novelist friend asked me last month why she should care about any of this local AI stuff. She doesn’t code. She doesn’t want a homelab. She writes 2,000 words a day in Scrivener and her relationship with technology is, in her words, “strictly transactional.”

I gave her a one-sentence answer: a local LLM is a first reader who works at 2 a.m., never gets tired of your manuscript, and physically cannot leak it. Two weeks later she texted me that she’d caught a continuity error in chapter 14 — a character drinking coffee who’d been established as caffeine-intolerant in chapter 3 — because she asked a model running on her MacBook Air to “list every fact established about Marta.”

That’s the pitch. Here’s the full case, and the setup, written for writers rather than tinkerers.

What a local model actually does for a writer

Forget “AI writes your novel.” It doesn’t, and you don’t want it to. The useful jobs are everything around the writing:

Tireless first-reader feedback. Paste a chapter and ask: “Where does the pacing drag? Where were you confused? What did you predict would happen next?” A model gives you the reaction of an attentive-but-average reader instantly, at draft three, when no human in your life wants to read draft three.

Rewriting for tone. “Make this paragraph colder and more clinical.” “Same information, but the narrator is amused rather than angry.” You won’t keep the model’s version — but seeing your paragraph re-pitched in three registers is the fastest way to find the one you actually meant.

Untangling structure. Paste your messy outline and ask the model to identify which scenes carry no story function, or to propose two alternative chapter orders. It’s a thinking partner for the whiteboard stage.

Research summarization. Drop in a 9,000-word historical article and ask for “every detail relevant to daily life in 1920s Brno.” The model condenses; you verify the details you actually use.

Consistency checking for fiction. This is the sleeper feature. Feed it your chapters and ask it to build a fact sheet per character — eye color, backstory dates, established preferences — then check new chapters against it. Continuity editors charge real money for this.

Why local matters specifically for writers

For most people, local AI is about preference. For working writers, the argument is sharper, and it has three legs.

Your unpublished manuscript never leaves your machine. When you paste chapter drafts into a cloud chatbot, your unsold, uncopyrighted-in-practice work transits someone else’s servers. Most professional writers’ organizations now explicitly warn members about this. With a local model, the manuscript goes from your SSD to your RAM and back. There is no third party, full stop.

No terms-of-service ambiguity. Cloud AI providers have varying, periodically-updated policies about whether your inputs train future models. Some let you opt out; some opt you out by default on paid tiers; the details change. You should not need to re-read a data policy every quarter to know whether your novel is becoming training data. A model running on your own hardware makes the question meaningless.

It works on a plane. Less dramatic, weirdly important. Writers work in cabins, on trains, in cafés with hostile Wi-Fi. The local model doesn’t care. My entire setup runs in airplane mode, and some of my best editing sessions have happened at 11,000 meters precisely because nothing else worked.

The setup, for people who don’t want a setup

Two pieces: an engine and a face.

The engine is Ollama. Download the Mac app from the website — no terminal required — and it installs like any other application, with a little llama in your menu bar. If you don’t mind one terminal command, this also works:

brew install ollama

The face is a chat app. Ollama’s own app now includes a basic chat window, which is honestly enough to start. If you want something nicer, native Mac apps like Enchanted (free, App Store) or Msty give you a polished ChatGPT-style interface that talks to Ollama automatically. The popular Open WebUI is more powerful — it adds document upload and saved prompt libraries — but it runs via Docker, which I’d only recommend if you have a technical friend on call.

Which model writes well? Size matters less than you’d think for editing tasks, but it matters. My recommendations by Mac RAM:

  • 8GB Macs: llama3.2:3b — fine for summaries and quick tone passes, will miss subtleties.
  • 16GB Macs: gemma3:12b or llama3.1:8b — the sweet spot for most writers. Genuinely good feedback.
  • 32GB and up: gemma3:27b or qwen3:32b — noticeably better prose sense and longer attention span for full chapters.

Pull one with ollama pull gemma3:12b or through your chat app’s model menu. One more thing worth doing: raise the context window, because chapters are long. In Ollama’s settings (or via OLLAMA_CONTEXT_LENGTH=16384), set at least 16k tokens — roughly 12,000 words of text the model can hold in mind at once.

Two editors, ready to hire

The trick that turns a generic chatbot into a useful colleague is the system prompt — standing instructions that shape every response. Every app above lets you save these as personas. Here are the two I actually use, in full. Steal them.

The developmental editor:

You are a developmental editor with 20 years of experience in literary
and commercial fiction. When given a chapter or outline, you assess
structure, pacing, stakes, and character motivation — never grammar or
word choice. Always answer in this order: (1) what is working and why,
(2) the single biggest structural problem, (3) two or three specific,
practical suggestions framed as questions for the author to consider.
Be direct and concrete. Quote the text when making a point. Never
rewrite the author's prose. Never praise to soften criticism.

The line editor:

You are a meticulous line editor. You care about rhythm, clarity,
precision, and cutting dead weight. When given prose, return a list of
specific line-level issues: flabby phrases, repeated words, unclear
antecedents, clichés, filter words (saw, felt, noticed), and sentences
whose rhythm stumbles. For each issue, quote the original and offer one
tightened alternative. Preserve the author's voice — your job is to
remove friction, not impose style. Do not comment on plot or structure.
End with the three highest-impact fixes.

Notice they’re deliberately narrow. A model told to “edit this” does everything badly at once. A model told to be exactly one kind of editor is startlingly useful — and you can run your chapter past both, the way a publisher would, for the price of two minutes.

What local models still can’t do for you

Honesty section. A 12B model on your MacBook is not a senior editor at a publishing house, and pretending otherwise will hurt your book.

It can’t tell you whether the book is good. Models are agreeable by nature and calibrated against average text. They’ll flag confusion and slack pacing reliably; they cannot judge whether your premise is fresh or your voice is distinctive. That verdict still requires humans who read in your genre.

Its taste ceiling is real. Local models in the 8–30B range occasionally praise clunky sentences and suggest “fixes” that sand off exactly the strangeness that makes your style yours. Treat every suggestion as a question, not a verdict.

Long-manuscript memory is limited. Even at 16k context, a model holds one or two chapters, not your 95,000-word novel. Consistency checking works chapter-by-chapter against a fact sheet you maintain — not as a one-shot “read my book.”

It can’t replace the market. No model knows what acquiring editors want in 2026.

What it can be is the colleague every writer wishes they had: available at every hour, immune to boredom, incapable of gossip, and resident entirely on the laptop that already holds your manuscript. My novelist friend’s verdict after a month: “It’s like having a writing group that never cancels.” She still doesn’t care how it works. That’s the point — she doesn’t have to.