AI Power User

The Mac Terminal Commands That Unlock Hidden AI Power

The CLI tricks that turn Ollama and friends from a chat toy into a Unix-grade workhorse

By Jakub Jirák · Jul 1, 2026 · 4 min read

Most people who install Ollama on a Mac learn exactly one command — ollama run llama3.1 — and stop there. That’s like buying a Mac Studio and only ever using Safari. The real leverage of local AI on macOS lives in the terminal, where models become composable Unix citizens: things you pipe into, alias, schedule, and chain. After eighteen months of running local models daily on an M2 Max, these are the commands that actually changed how I work.

Ollama beyond `run`

Start with the commands nobody reads the docs for. ollama ps shows what’s currently loaded into memory and — crucially — how long until it unloads:

$ ollama ps
NAME            ID            SIZE     PROCESSOR    UNTIL
qwen2.5:14b     7cdf5a0187d5  9.6 GB   100% GPU     4 minutes from now

That UNTIL column matters because model loading is the slowest part of local AI. A 14B model takes 6–8 seconds to load from SSD on my machine; once resident, first token arrives in under half a second. The default keep-alive is five minutes, which is wrong for almost everyone. Fix it with an environment variable:

# Keep models in memory for 2 hours
launchctl setenv OLLAMA_KEEP_ALIVE 2h
# Or forever, if you have RAM to spare
launchctl setenv OLLAMA_KEEP_ALIVE -1

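While you’re there, launchctl setenv OLLAMA_NUM_PARALLEL 4 lets the server handle four simultaneous requests, which is essential if you have scripts and an editor plugin hitting the same model. Restart the Ollama app after setting these. Values set with launchctl setenv do not survive a reboot, so re-run the commands after restarting the Mac.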
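The keep-alive can also be set per request instead of globally. Here is a minimal sketch, assuming the server listens on the default localhost:11434 and relying on the keep_alive field of Ollama's generate endpoint. The names warm_payload and warm are hypothetical helpers, not Ollama commands. Sending a model name with no prompt asks the server to load the model without generating anything.

```shell
# Build the JSON body for a "load only" request: a model name plus a
# keep_alive duration string such as 30m or 2h, and no prompt.
# Assumption: the model name contains no quotes or backslashes.
warm_payload() {
  printf '{"model":"%s","keep_alive":"%s"}' "$1" "${2:-2h}"
}

# Preload a model and keep it resident for the given duration.
# Usage: warm qwen2.5:14b 30m
warm() {
  curl -s http://localhost:11434/api/generate -d "$(warm_payload "$@")" >/dev/null
}
```

After running warm qwen2.5:14b 30m, ollama ps should list the model with a matching UNTIL value.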

ollama show qwen2.5:14b --modelfile prints the full recipe of any model: its template, parameters, system prompt. And ollama create lets you build your own variants. I keep a Modelfile for a Czech↔English translator that I use a dozen times a day:

FROM qwen2.5:14b
PARAMETER temperature 0.3
SYSTEM You are a precise Czech-English translator. Output only the translation, no commentary. Preserve tone and register.

ollama create translator -f ./Modelfile

Now ollama run translator "Dej mi vědět, až to bude hotové" returns a clean translation with no chatty preamble. Custom models cost nothing — they’re just metadata over the same weights.

Pipe anything into a model

The single most underrated fact about ollama run is that it reads stdin. That makes every file, command output, and clipboard on your Mac a valid prompt:

# Summarize a long README
cat README.md | ollama run llama3.1 "Summarize in 5 bullet points"

# Explain a failing build
swift build 2>&1 | ollama run qwen2.5:14b "Explain this error and suggest a fix"

# What changed in this branch, in English
git diff main | ollama run llama3.1 "Write a one-paragraph PR description"

On macOS this gets a superpower other platforms don’t have: pbpaste and pbcopy. The clipboard becomes an AI pipeline. My most-used one-liner of 2026, no contest:

pbpaste | ollama run translator | pbcopy

Copy Czech text anywhere, run the command, paste English. Round trip on a 14B model: about three seconds. Other variants I run weekly:

# Fix grammar in whatever I just copied
pbpaste | ollama run llama3.1 "Fix grammar and spelling only. Output the corrected text." | pbcopy

# Turn a rambling Slack message into something professional
pbpaste | ollama run llama3.1 "Rewrite this concisely and professionally" | pbcopy

No app, no subscription, no text leaving the machine.
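One wrinkle with clipboard pipelines: if the clipboard is empty or the model call fails, the final pbcopy can leave you with an empty clipboard. A defensive sketch follows. clip_ai is a hypothetical helper name, and the model and instruction are whatever you pass in.

```shell
# Run the clipboard through a model and copy the result back, but only
# when both the clipboard and the model's answer are non-empty.
# Usage: clip_ai llama3.1 "Fix grammar and spelling only. Output the corrected text."
clip_ai() {
  local model instruction input output
  model=$1
  instruction=$2
  input=$(pbpaste)
  if [ -z "$input" ]; then
    echo "clipboard is empty" >&2
    return 1
  fi
  # If ollama exits non-zero, stop before touching the clipboard.
  output=$(printf '%s\n' "$input" | ollama run "$model" "$instruction") || return 1
  if [ -z "$output" ]; then
    echo "model returned nothing; clipboard left unchanged" >&2
    return 1
  fi
  printf '%s' "$output" | pbcopy
}
```

The cost is that the answer is buffered in a variable before being copied, which is fine for clipboard-sized text.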

The llm CLI — templates, logs, and plugins

Simon Willison’s llm tool is the missing standard library for command-line AI. Install it with brew install llm, point it at Ollama with the llm-ollama plugin (llm install llm-ollama), and you get three things Ollama alone doesn’t give you.

First, templates — saved prompts with parameters:

llm --model qwen2.5:14b --system "Write a conventional commit message for this diff. One line, max 72 chars." --save commitmsg
git diff --staged | llm -t commitmsg

Second, logging. Every prompt and response is stored in a local SQLite database. llm logs -n 3 shows your last three interactions; llm logs -q "kubernetes" finds that explanation from two weeks ago you forgot to save. This alone justifies the install: your AI history becomes searchable.

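Third, model agility. The same command works against Ollama, OpenAI, Anthropic, or Gemini by swapping -m (Anthropic and Gemini need their own plugins, llm-anthropic and llm-gemini, and every cloud provider needs an API key). I run drafts locally and only escalate to a cloud model when the local answer disappoints: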

cat spec.md | llm -m qwen2.5:14b "Find logical inconsistencies"
cat spec.md | llm -m claude-4.5-sonnet "Find logical inconsistencies"  # second opinion
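The second-opinion habit can be wrapped in a small function. This is a sketch, assuming llm is installed with plugins for every model you name. compare_models is a made-up helper, not part of llm.

```shell
# Ask several models the same question about one file, labelling each answer.
# The file is re-read for every model, so it must be a real file, not a pipe.
# Usage: compare_models spec.md "Find logical inconsistencies" qwen2.5:14b llama3.1
compare_models() {
  local file question model
  file=$1
  question=$2
  shift 2
  for model in "$@"; do
    printf '=== %s ===\n' "$model"
    llm -m "$model" "$question" < "$file"
  done
}
```

Because every call goes through llm, each answer also lands in the log database, so the comparison can be revisited later with llm logs.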

jq — making API output usable

If you script against Ollama’s HTTP API directly (localhost:11434), responses come as JSON, and jq is how you stay sane. Get just the text out of a generate call:

curl -s http://localhost:11434/api/generate \
  -d '{"model":"llama3.1","prompt":"Name three macOS automation tools","stream":false}' \
  | jq -r '.response'

More usefully, jq lets you audit your local model collection. Which models are eating your SSD?

curl -s http://localhost:11434/api/tags | jq -r '.models[] | "\(.size/1e9 | floor)GB\t\(.name)"' | sort -rn

And when a model emits structured output (ask for JSON in the prompt, or use Ollama’s format: json parameter), jq turns it into something a script can act on — extracting .severity from a code-review response, or .tags[] from an auto-tagging prompt. That’s the difference between AI as a chat partner and AI as a build step.
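A sketch of that build-step pattern, assuming the default server address and a model that honours the requested keys. extract_field and review_severity are hypothetical names, and the severity and summary keys are only what the prompt asks for, not something Ollama guarantees. The API returns the model's JSON as a string inside .response, so it has to be parsed a second time with fromjson.

```shell
# Pull one key out of the JSON that the model wrote into .response.
# Usage: ... | extract_field severity
extract_field() {
  jq -r --arg f "$1" '.response | fromjson | .[$f]'
}

# Send a diff on stdin for review and print only the severity verdict.
# Usage: git diff --staged | review_severity
review_severity() {
  local body
  # jq -Rs reads all of stdin as one string and escapes it safely into JSON.
  body=$(jq -Rs '{model: "qwen2.5:14b", format: "json", stream: false,
    prompt: ("Review this diff. Reply as JSON with keys severity and summary.\n\n" + .)}')
  curl -s http://localhost:11434/api/generate -d "$body" | extract_field severity
}
```

A script can then branch on the result, for example refusing to continue when review_severity prints high. A missing key comes back as null, so treat anything unexpected as "no verdict".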

Aliases — make the good path the lazy path

The commands above only change your life if they cost zero keystrokes of thinking. That’s what shell functions are for. The pattern: every AI task you do more than twice a week becomes a verb.

Here’s my snippet collection — paste into ~/.zshrc, adjust models to your RAM tier, and reload with source ~/.zshrc:

# --- AI toolkit ---------------------------------------------
# Translate clipboard (Czech <-> English), result back to clipboard
alias tr-clip='pbpaste | ollama run translator | pbcopy && echo "translated ✓"'

# Fix grammar in clipboard
alias fix-clip='pbpaste | ollama run llama3.1 "Fix grammar and spelling only. Output corrected text, nothing else." | pbcopy'

# Summarize any file: aisum notes.md
aisum() { ollama run qwen2.5:14b "Summarize in 5 concise bullet points" < "$1"; }

# Explain why a command fails: rerun it as aiwhy <command>
aiwhy() { "$@" 2>&1 | ollama run qwen2.5:14b "The text above is the output of a command that failed. Explain why and suggest a fix."; }

# Commit message from staged changes
aicommit() { git diff --staged | llm -t commitmsg; }

# Ask a quick question without leaving the prompt: ai "how do I..."
ai() { ollama run llama3.1 "$*"; }

# What's loaded right now
alias aips='ollama ps'
# ------------------------------------------------------------
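The aiwhy function above sends output to the model even when the command succeeds. A variant that only consults the model on a non-zero exit, and passes that exit status through, might look like this. aiwhy_onfail is a hypothetical name, and the model choice is just the one used elsewhere in this article.

```shell
# Run a command, show its output, and ask the model only if it failed.
# Returns the command's own exit status so it still works inside scripts.
# Usage: aiwhy_onfail swift build
aiwhy_onfail() {
  local out rc
  out=$("$@" 2>&1)
  rc=$?
  printf '%s\n' "$out"
  if [ "$rc" -ne 0 ]; then
    printf '%s\n' "$out" | ollama run qwen2.5:14b \
      "The text above is the output of '$*', which exited with status $rc. Explain why and suggest a fix."
  fi
  return "$rc"
}
```

The trade-off is that output is buffered until the command finishes, so long builds show nothing while they run.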

Two honest caveats. Local models in the 8–14B class will occasionally produce a wrong jq filter or a confidently broken fix — treat aiwhy output as a hypothesis, not a verdict. And keep an eye on RAM: with OLLAMA_KEEP_ALIVE=-1 and a 14B model resident, you’re permanently donating ~10 GB. On a 64 GB Mac Studio that’s free; on a 16 GB Air it’s a tax.

But the core shift is real. Once models live in your shell — pipeable, aliasable, loggable — you stop “going to the AI” and the AI starts sitting inside the workflows you already have. That’s the hidden power: not a smarter model, just shorter distance between a thought and its execution.
