I Tracked Every AI Query for a Month — The Results Changed My Setup

AI Power User

Thirty-one days of logging every prompt revealed that most of my AI usage never needed the cloud at all

For the entire month of May, I logged every single AI query I made. Every quick lookup, every code question, every “rewrite this email” request. All of it. 847 queries across 31 days, captured in a SQLite database on my Mac Studio.

I expected the data to confirm what I assumed: that I needed frontier cloud models for most of my work. Instead, it told me I’d been paying for a Ferrari to drive to the corner shop. By the end of this post you’ll see the exact numbers, the methodology you can replicate this afternoon, and the setup change that cut my monthly AI spend by roughly 85%.

How I Actually Logged Everything

Three capture methods, because my AI usage is fragmented across interfaces:

Open WebUI was the easy part. It already stores every conversation in its database. I run it in Docker on the Mac Studio, pointed at both Ollama and OpenRouter, so anything typed there was logged automatically. A quick export at month’s end gave me timestamps, model used, and full prompts.

The llm CLI by Simon Willison logs to SQLite out of the box. Every terminal query lands in ~/Library/Application Support/io.datasette.llm/logs.db. If you use the terminal for AI at all, this is the lowest-friction logging you’ll ever set up:

brew install llm
llm "explain this zsh glob: **/*.{ts,tsx}"
llm logs -n 5  # see your recent queries

App-based queries (ChatGPT’s Mac app, Claude in the browser, Raycast AI) were the annoying part. No clean export, so I kept a manual tally with a Shortcuts widget — one tap per query, logging category and timestamp to a Numbers sheet. Crude, but after two days it became reflexive.

To make replication trivial, here’s the tiny script I used to export the CLI logs to CSV, ready to merge with the other two sources:

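#!/bin/zsh
# merge-ai-logs.sh: dump llm CLI logs to CSV, one row per query, for analysis
db="$HOME/Library/Application Support/io.datasette.llm/logs.db"
out="$HOME/ai-log-cli.csv"
# Columns: UTC timestamp, model name, prompt length, response length (characters)
sqlite3 -csv "$db" \
  "SELECT datetime(datetime_utc), model, length(prompt), length(response)
   FROM responses ORDER BY datetime_utc;" > "$out"
# tr strips the padding that macOS wc puts in front of the count
echo "Exported $(wc -l < "$out" | tr -d ' ') queries"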

Combine that with the Open WebUI export and your manual tally, and you have a complete picture. Total setup time: under an hour.
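For the combining step, here is a minimal Python sketch rather than the exact process used for this post. It assumes each source has been saved as CSV with a sortable `YYYY-MM-DD HH:MM:SS` timestamp in the first column and the model or app name in the second. That matches the CLI export above, but the layout of the Open WebUI export and the manual tally is an assumption, and `read_rows` and `merge` are hypothetical helper names.

```python
import csv
import io
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str]  # (UTC timestamp, source label, model or app name)


def read_rows(csv_text: str, source: str, ts_col: int = 0, model_col: int = 1) -> List[Row]:
    """Parse one export into (timestamp, source, model) rows."""
    rows = []
    for rec in csv.reader(io.StringIO(csv_text)):
        if rec:  # skip blank lines
            rows.append((rec[ts_col], source, rec[model_col]))
    return rows


def merge(*sources: Iterable[Row]) -> List[Row]:
    """Combine every source into one list ordered by timestamp."""
    combined = [row for src in sources for row in src]
    # 'YYYY-MM-DD HH:MM:SS' strings sort chronologically as plain text
    return sorted(combined, key=lambda row: row[0])


# Made-up sample rows to show the shape of the data, not the real logs
cli_csv = "2024-05-02 10:00:00,qwen2.5:14b,42,310\n"
tally_csv = "2024-05-01 08:30:00,ChatGPT app\n"

for row in merge(read_rows(cli_csv, "cli"), read_rows(tally_csv, "tally")):
    print(row)
```

In practice you would pass the contents of each exported file to `read_rows`, adjusting `ts_col` and `model_col` to wherever those fields sit in your own exports.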

What 847 Queries Actually Look Like

I categorized every query by hand over a weekend (with a local model pre-sorting and me verifying — meta, I know). The distribution genuinely surprised me:

  • Quick lookups: 40% (339 queries). “What’s the flag for recursive scp?” “Convert 14:30 UTC to Prague time.” “What does ECONNRESET mean?” Things I’d have Googled in 2021.
  • Writing assistance: 25% (212 queries). Email polishing, rewording paragraphs, English idiom checks (I’m Czech — I sanity-check phrasing constantly).
  • Code: 20% (169 queries). Snippets, regex, explaining unfamiliar code, small refactors.
  • Summarization: 10% (85 queries). Articles, PDFs, meeting transcripts.
  • Complex reasoning: 5% (42 queries). Architecture decisions, long-document analysis, multi-step planning, anything where being wrong actually costs me something.
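As a quick sanity check on that breakdown, the short Python sketch below recomputes the per-category counts from the rounded percentages and the 847 total. The only assumption is that each count is the share rounded to the nearest whole query.

```python
TOTAL = 847

# Rounded category shares from the list above
SHARES = {
    "quick lookups": 0.40,
    "writing assistance": 0.25,
    "code": 0.20,
    "summarization": 0.10,
    "complex reasoning": 0.05,
}

# Assumption: each count is the share rounded to the nearest whole query
counts = {name: round(TOTAL * share) for name, share in SHARES.items()}

for name, n in counts.items():
    print(f"{name:20s} {n:4d}")
print(f"{'total':20s} {sum(counts.values()):4d}")
```

The rounded counts add back up to 847, so the percentages and the per-category numbers agree with each other.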

Then I did the brutal exercise: I re-ran a random sample of 120 queries from the first four categories through Qwen 2.5 14B running locally via Ollama, and blind-compared answers against the cloud originals. For quick lookups and writing assistance, the local model was indistinguishable or better in 91% of cases. For code, about 78%. Summarization, nearly 100% — local models are genuinely great at this.

Roll it up and the conclusion was unavoidable: over 80% of my queries were fully answerable by a 14B model running on my own hardware. Only that top 5% sliver — plus the hard fifth of code questions — genuinely benefited from GPT-4-class or Claude-class reasoning.
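The roll-up is a weighted average: each category's share of all queries times the fraction of that category the local model handled acceptably. A minimal Python sketch follows, with two assumptions marked in the comments: "nearly 100%" for summarization is treated as 1.0, and complex reasoning, which was not part of the re-run sample, is counted as cloud-only.

```python
# Share of all 847 queries in each category
SHARE = {
    "quick lookups": 0.40,
    "writing assistance": 0.25,
    "code": 0.20,
    "summarization": 0.10,
    "complex reasoning": 0.05,
}

# Fraction of each category where the local answer held up in the blind comparison
LOCAL_OK = {
    "quick lookups": 0.91,
    "writing assistance": 0.91,
    "code": 0.78,
    "summarization": 1.00,      # assumption: "nearly 100%" treated as 1.0
    "complex reasoning": 0.00,  # assumption: not re-tested locally, counted as cloud-only
}

# Weighted average across categories
local_share = sum(SHARE[c] * LOCAL_OK[c] for c in SHARE)

print(f"Answerable locally: {local_share:.1%}")
print(f"Needs escalation:   {1 - local_share:.1%}")
```

Under those assumptions the local share comes out just under 85%, which is where the "over 80%" figure comes from.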

The Latency Finding Nobody Talks About

Here’s the part that changed my behavior more than the cost: for short queries, local is faster than cloud. Not “acceptably slower.” Faster.

I measured round-trip time on 50 short prompts (under 100 tokens in, under 200 out):

  • Local (Qwen 2.5 14B, M2 Ultra, Ollama): first token in ~180ms, full response in 1.8–2.5 seconds.
  • Cloud (frontier model via API): first token in 600ms–1.4s depending on time of day, full response in 3–7 seconds. Add browser-app overhead and it’s worse.

For a 15-word answer to “what’s the kubectl command to tail logs,” the network round trip and queueing dominate. The Mac Studio pushes ~45 tokens/second on a 14B model, and there’s no queue, no auth handshake, no loading spinner. Over 339 quick lookups a month, that latency difference is real, felt time.
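If you want to take the same two measurements yourself, the Python sketch below times any streaming response: seconds to the first non-empty chunk and seconds to the end of the stream. It is not the harness used for the numbers above. `time_stream` and `ollama_chunks` are hypothetical helper names, and the second one assumes an Ollama server on its default local port with its streaming `/api/generate` endpoint.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple


def time_stream(chunks: Iterable[str],
                clock: Callable[[], float] = time.perf_counter) -> Tuple[float, float]:
    """Return (seconds to first non-empty chunk, seconds to end of stream)."""
    start = clock()
    first = None
    for chunk in chunks:
        if first is None and chunk:
            first = clock() - start
    total = clock() - start
    # If nothing non-empty ever arrived, report the total for both values
    return (first if first is not None else total), total


def ollama_chunks(prompt: str, model: str = "qwen2.5:14b",
                  url: str = "http://localhost:11434/api/generate") -> Iterator[str]:
    """Yield pieces of the response from a local Ollama server as they stream in."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # the server sends one JSON object per line
            if line.strip():
                yield json.loads(line).get("response", "")
```

Calling `time_stream(ollama_chunks("what's the flag for recursive scp?"))` includes connection setup in the first-token figure, because the generator only opens the request once iteration starts. Timing a cloud API needs an equivalent generator for that provider.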

The Setup Change: Default Local, Deliberate Escalation

The data dictated the new architecture, and it’s embarrassingly simple: everything goes local first, and escalating to a cloud model is a deliberate, explicit act.

Concretely:

  1. Raycast AI and my terminal llm default now point at Ollama (llm models default qwen2.5:14b after installing the llm-ollama plugin). The fast path is local.
  2. Open WebUI has the local model pinned as default, with cloud models still in the dropdown for when I consciously choose them.
  3. Escalation rule: if the local answer is wrong, hedgy, or the task is in my “complex reasoning” category, I re-ask the cloud model. The keystroke cost of escalating is about three seconds. In practice I escalate roughly 1 in 8 queries, close to what the data predicted.
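The routing logic behind those three points can be sketched in a few lines of Python. This is an illustration, not the setup described above, where escalation is a manual decision. The `route`, `looks_hedgy` and `llm_cli` names are hypothetical, and the hedging check is a crude stand-in for human judgment about whether an answer is wrong or evasive.

```python
import subprocess
from typing import Callable, Tuple

Ask = Callable[[str], str]

# Hypothetical heuristic: phrases that suggest the model is dodging the question
HEDGE_MARKERS = ("i'm not sure", "i am not sure", "i cannot", "i can't")


def llm_cli(model: str) -> Ask:
    """Build a function that sends a prompt to the `llm` CLI using the given model."""
    def ask(prompt: str) -> str:
        result = subprocess.run(["llm", "-m", model, prompt],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()
    return ask


def looks_hedgy(answer: str) -> bool:
    """Crude check for an evasive answer; a person would judge this better."""
    text = answer.lower()
    return any(marker in text for marker in HEDGE_MARKERS)


def route(prompt: str, category: str, local: Ask, cloud: Ask) -> Tuple[str, str]:
    """Try local first; go to the cloud for complex reasoning or a hedgy answer."""
    if category == "complex reasoning":
        return "cloud", cloud(prompt)
    answer = local(prompt)
    if looks_hedgy(answer):
        return "cloud", cloud(prompt)
    return "local", answer
```

To wire it up, `local` could be `llm_cli("qwen2.5:14b")` and `cloud` could be `llm_cli(...)` with whichever cloud model your `llm` install has configured. Passing the two models in as plain functions keeps the routing rule separate from the plumbing.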

The psychological shift matters more than the plumbing. Before, the cloud was my default and local was the experiment. Now local is the default and the cloud is a specialist I consult on purpose.

The Money, Honestly Accounted

Before the experiment: ChatGPT Plus ($20), Claude Pro ($20), plus around $25/month in API usage across OpenRouter and direct keys. Call it $65/month.

After: I dropped both subscriptions, kept pay-as-you-go API access for the escalation path, and my June API bill came to $9.40. That’s an 85% drop, about $667/year if June is typical, for a workflow that is (per my own blind comparison) better for most queries because it’s faster and never rate-limited.
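The arithmetic is easy to check or to rerun with your own figures. A short Python sketch using the numbers above, assuming every month looks like June:

```python
# Monthly spend before: ChatGPT Plus + Claude Pro + approximate API usage
before = 20.00 + 20.00 + 25.00
# Monthly spend after: the June API bill
after = 9.40

monthly_saving = before - after
drop = monthly_saving / before
yearly_saving = monthly_saving * 12  # assumption: every month looks like June

print(f"Monthly: ${before:.2f} -> ${after:.2f} ({drop:.1%} lower)")
print(f"Yearly saving: ${yearly_saving:.2f}")
```

That gives a drop of a little over 85% and roughly $667 saved over twelve months, before counting any hardware cost.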

Yes, the Mac Studio wasn’t free. But I owned it already, and a used M1 Max Studio with 64GB runs the same 14B model at very usable speeds for under €1,500 — the payback math works if you’re a heavy user, and the privacy of never sending drafts and code off-machine doesn’t show up on any invoice.

Run This Experiment Yourself

You don’t need my exact stack. The minimum viable version:

brew install llm ollama
brew services start ollama  # the pull below needs the Ollama server running
ollama pull qwen2.5:14b
llm install llm-ollama
llm -m qwen2.5:14b "test query"  # everything auto-logs to SQLite

Then commit to 30 days of routing your queries through loggable interfaces, tally the app-based stragglers manually, and categorize at the end of the month. The one question that matters for each query: could a local model have answered this, yes or no?

My prediction, based on my data and on every reader who’s emailed me after trying similar audits: you’ll find your distribution is closer to mine than you think. The frontier models are extraordinary, and the top 5% of my queries genuinely need them. But paying frontier prices — in money and in latency — for “what’s the flag for recursive scp” never made sense. I just needed a month of data to see it.

Track your queries. The numbers will change your setup too.