What's Coming Next for AI on Mac — and How to Be Ready
Thirty posts ago, this series opened with a claim: your Mac is a local AI supercomputer you’re using at 10% capacity. A month of daily columns later — local writing editors, email pipelines, smart-home brains, photo search, and the mistakes that derail all of it — I want to close the month by looking forward. Not vague futurism; the specific, observable trajectories that tell you what your Mac will be doing in 18 months, and the short list of things worth doing now so you’re ready instead of catching up.
Open models are shrinking while improving
The single most underappreciated trend in AI isn’t at the frontier; it’s at the floor. Compare what an 8B-parameter open model does today with what one did 18 months ago: early-2024’s 8B models were fragile toys that lost the thread after three paragraphs, fumbled basic structured output, and were instantly distinguishable from cloud models. Today’s 8B-class models (Llama 3.1 descendants, Qwen 3, Gemma 3’s smaller variants) handle multi-step instructions, emit reliable JSON for tool calling, write competent code, and hold long-document context. Tasks that genuinely required a 70B model in 2024 sit comfortably in the 8–14B range today.
The mechanism is no mystery: better data curation, distillation from frontier models, and longer training runs mean capability per parameter keeps climbing. Every time that happens, the hardware you already own gets a free upgrade. The 16GB MacBook Air that ran a mediocre assistant in 2024 runs a genuinely useful one now, on the same silicon. Extrapolate even conservatively and the model class that fits on a base Mac in 2027 does what today’s cloud mid-tier does.
Meanwhile the other blade of the scissors: Apple Silicon memory bandwidth climbs every generation. Bandwidth, not raw compute, is the binding constraint for LLM token generation, since producing each token means reading essentially all of a dense model’s weights out of memory. The base M1 shipped at ~68 GB/s; base chips now exceed 150 GB/s, Max-class parts push past 500 GB/s, and Ultra-class parts past 800 GB/s. Models getting smaller while the pipe gets fatter is a compounding function with an obvious limit case: frontier-class assistance, resident on a laptop.
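To make the bandwidth argument concrete, here is a back-of-the-envelope sketch. It assumes a dense model whose weights are read once per generated token, and it ignores KV-cache traffic, prompt processing, and runtime overhead, so the result is an upper bound rather than a benchmark. The function name and the 4-bit quantization are my illustrative choices, not measurements.

```python
def decode_ceiling_tokens_per_s(params_billion, bits_per_weight, bandwidth_gb_s):
    """Rough upper bound on tokens/s for a dense model.

    Assumption: every generated token reads all weights from memory once,
    so throughput cannot exceed bandwidth divided by the size of the weights.
    """
    weights_gb = params_billion * bits_per_weight / 8  # 1e9 params * bits / 8 = GB
    return bandwidth_gb_s / weights_gb


# Bandwidth figures quoted in the paragraph above (GB/s).
for label, bandwidth in [("base M1", 68), ("current base", 150),
                         ("Max-class", 500), ("Ultra-class", 800)]:
    ceiling = decode_ceiling_tokens_per_s(8, 4, bandwidth)
    print(f"{label:>12}: at most ~{ceiling:.0f} tokens/s for an 8B model at 4 bits")
```

An 8B model at 4 bits is about 4 GB of weights, so the ceiling runs from roughly 17 tokens/s at 68 GB/s to roughly 200 tokens/s at 800 GB/s. Real throughput lands below these numbers, but the scaling is the point: halve the model or double the bandwidth and the ceiling doubles.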
Apple is opening the door, and agents are walking through it
Three structural shifts land on top of the hardware curve.
Apple Intelligence is becoming a platform, not a feature. The pivotal move was the Foundation Models framework — third-party apps can now call Apple’s on-device model directly, with guided generation for structured output, at an API cost of exactly zero, with no network round trip. That changes the economics of AI features completely: an indie developer no longer needs an OpenAI bill to ship summarization or smart categorization. Expect the next wave of Mac and iOS apps to have on-device intelligence as a baseline ingredient, the way apps assume a GPU today. For users, the practical consequence is that “AI on Mac” stops meaning “a chat window” and starts meaning invisible competence inside every app you already use.
The agentic shift is the bigger one. The chat paradigm — you type, it answers — is already giving way to models that use your computer: reading files, invoking apps, chaining tools, completing multi-step tasks. The plumbing standardized faster than anyone expected (the Model Context Protocol went from proposal to industry default in about a year), and on a Mac the agentic surface is unusually rich: AppleScript, Shortcuts, and a Unix userland are decades of automation infrastructure waiting for a natural-language driver. My smart-home post this month was a primitive version — a model emitting service calls. The mature version is “collect every invoice from my mail since January, rename them by vendor and date, file them, and draft the summary for my accountant” running locally against your real data. That’s not a demo I’m describing; it’s a workflow I already run in pieces. The pieces are fusing.
Local multimodality is the third leg. Whisper-class speech recognition is already effectively solved on-device. Vision-language models that can look at your screen, your photos, your PDFs are now small enough to run well on M-series hardware — Gemma’s and Qwen’s vision variants run today in Ollama and MLX. Speech in, vision in, speech out, all local: the I/O of a complete assistant, without a cloud dependency.
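The “model emitting service calls” pattern from the agentic paragraph above is small enough to sketch. What follows is a minimal illustration, not the Model Context Protocol wire format and not my actual invoice pipeline: the JSON shape (`tool` plus `arguments`), the `invoice_filename` helper, and the file-naming scheme are all hypothetical. The idea it shows is that the model only proposes a call as JSON, and ordinary code decides whether that call is allowed to run.

```python
import json
from datetime import date


def invoice_filename(vendor: str, invoice_date: str) -> str:
    """Hypothetical tool: build a 'YYYY-MM-DD_vendor.pdf' file name.

    date.fromisoformat raises ValueError if the model hands us a malformed date.
    """
    parsed = date.fromisoformat(invoice_date)
    slug = "-".join(vendor.lower().split())
    return f"{parsed.isoformat()}_{slug}.pdf"


# Allowlist: the model can only trigger functions registered here.
TOOLS = {"invoice_filename": invoice_filename}


def dispatch(model_output: str):
    """Parse a JSON tool call emitted by a model and run it if it is allowed."""
    call = json.loads(model_output)
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Example of the kind of string a model might emit (assumed shape).
emitted = ('{"tool": "invoice_filename", '
           '"arguments": {"vendor": "Acme Corp", "invoice_date": "2025-01-15"}}')
print(dispatch(emitted))
```

The allowlist is the design choice that matters. A local agent touching real files should only be able to reach functions you registered on purpose, and anything else should fail loudly.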
How to be ready: hardware
One rule dominates every other purchasing consideration: buy RAM, not CPU.
Every trend above — bigger contexts, resident agents, multimodal models, multiple models loaded simultaneously — consumes memory first and compute second. A mid-tier chip with 64GB of unified memory will be a dramatically better AI machine in 2028 than a top-tier chip with 16GB, and it’s not close. The chip determines how fast the answer arrives; the RAM determines whether the model can run at all, and unified memory is the one component you can never upgrade later.
My concrete guidance, unchanged from earlier in the series but now with a future-proofing argument attached: 16GB is the floor for casual local AI, 32GB is the sensible default for anyone reading this series, 64GB+ if local AI is a primary workload, and Studio-class bandwidth if you want 70B-class models at conversational speed. When you spec your next Mac and hover over the upgrade options, take the memory upgrade before the chip upgrade, every time.
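The “whether the model can run at all” question reduces to simple arithmetic, sketched below. Both fudge factors are assumptions of mine rather than measured values: `overhead` (20% on top of the weights for KV cache and runtime buffers) and `usable_fraction` (about 70% of unified memory available to the model once macOS and your other apps have taken their share). Adjust both to taste; long contexts in particular push the overhead well above 20%.

```python
def estimated_footprint_gb(params_billion, bits_per_weight, overhead=0.2):
    """Approximate memory needed: quantized weights plus an assumed overhead."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * (1 + overhead)


def fits(params_billion, bits_per_weight, ram_gb,
         usable_fraction=0.7, overhead=0.2):
    """True if the estimated footprint fits in the assumed usable share of RAM."""
    needed = estimated_footprint_gb(params_billion, bits_per_weight, overhead)
    return needed <= ram_gb * usable_fraction


# 4-bit models against the memory tiers discussed above.
for params in (8, 14, 32, 70):
    row = ", ".join(f"{ram}GB: {'yes' if fits(params, 4, ram) else 'no'}"
                    for ram in (16, 32, 64, 128))
    print(f"{params:>3}B at 4 bits -> {row}")
```

Under these assumptions a 4-bit 70B model needs about 42 GB, which rules out a 32GB machine and just fits a 64GB one. No chip upgrade changes that answer, which is the whole argument for spending on memory first.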
How to be ready: skills and stack
The hardware will be ready whether or not you are. The human side is four habits:
Keep a clean, current local stack. The runtimes are improving monthly, not yearly, and performance gains of 20–30% can land in point releases.
brew upgrade ollama
pip install -U mlx-lm
Two commands, monthly cadence. While you’re at it, prune the model graveyard (ollama list, then ollama rm the experiments) — disk hygiene now saves confusion later.
Learn prompting as a durable skill. System prompts, structured output, decomposing a task into steps a model can verify — these transfer across every model generation and every provider. The Modelfiles and editor-persona prompts from earlier in this series are the entry-level version of a skill that compounds.
Learn basic scripting now. The agentic era’s biggest beneficiaries will be people who can glue things together — a 20-line shell script, a Shortcut, a Python loop around an API call. You don’t need to be a developer; you need to not be afraid of a terminal. Every workflow in this series was built from exactly that skill level.
Follow the MLX community. Apple’s MLX framework is where on-device AI development actually happens — new model architectures often run on Apple Silicon within days of release, quantization techniques debut there, and the mlx-community space on Hugging Face is the best leading indicator of what your Mac will do next quarter. Watching it costs ten minutes a week and replaces a hundred breathless headlines.
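For a sense of scale, the “Python loop around an API call” from the scripting habit above looks roughly like this. It is a sketch that assumes Ollama is running on its default local port and that a model tagged `llama3.1:8b` has already been pulled; swap in whatever `ollama list` shows on your machine. The helper names are mine. It uses only the standard library, so there is nothing to install.

```python
import json
import urllib.request

# Ollama's default local endpoint; assumes the server is running on this Mac.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3.1:8b"):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3.1:8b", timeout=120):
    """Send one prompt and return the model's text reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


def summarize_all(texts, model="llama3.1:8b"):
    """The 'loop around an API call': one summary per input text."""
    return [generate(f"Summarize in one sentence:\n\n{text}", model) for text in texts]
```

Calling `summarize_all(["first note...", "second note..."])` returns a list of one-sentence summaries, all generated on your own hardware. Everything else in the agentic era is this same loop with more interesting inputs.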
The thesis, stated plainly
Here is the bet underneath this entire month of columns: the gap between cloud frontier models and local open models keeps narrowing, and for personal computing, privacy plus ownership eventually wins.
The frontier will stay ahead — datacenters will always run something bigger than your desk can. But “ahead” matters less every quarter, because most of what people actually need from AI — writing help, summarization, search over their own data, automation of their own apps — crossed the good-enough threshold on local hardware some time ago, and the threshold keeps dropping toward smaller machines. Meanwhile the local advantages are structural, not temporary: your data never leaves, no subscription meters your usage, no terms-of-service update changes the deal, no product cancellation strands your workflow. Cloud AI’s advantages erode with every model release; local AI’s advantages are permanent.
The personal computer won the last era of computing for the same reason: people preferred owning their tools. I think the pattern repeats, and the Mac — accidentally, through a memory architecture designed for video editors — is the best-positioned consumer hardware on earth for it.
That closes the June run, but not the series — this is a daily column, and tomorrow there’s another post. July goes deeper: more agents, more pipelines, more honest measurements of what works on the hardware already sitting on your desk. The machine is ready. See you tomorrow.