AI Power User

I Connected AI to My Smart Home Through a Mac Studio

A local LLM as the natural-language brain for Home Assistant, with no cloud voice assistant listening in

By Jakub Jirák Jun 27, 2026 6 min read

Tags: home-assistant, local-llm, mac-studio, smart-home

“The office is too bright and I’m on a call.”

I said that sentence out loud last Tuesday, to no one in particular, and my office responded: blinds to 40%, the overhead light off, the desk lamp to a warm low setting, and — because “on a call” matched an automation rule — the hallway speaker paused and a do-not-disturb flag went up for the doorbell chime. No cloud service heard me say it. The audio went from a microphone to my Mac Studio, was transcribed by a local Whisper model, reasoned over by a local LLM, executed through Home Assistant, and confirmed by a locally generated voice. Round trip: about two and a half seconds.

This is the most science-fiction thing my Mac does, and it’s built entirely from parts you can download today. It is also, fair warning, the least “weekend-afternoon” project in this series. Here’s the architecture, the genuinely new capabilities, and the honest cost.

The Mac as the always-on brain

The architecture has three layers, and the Mac Studio hosts all of them.

Layer 1: the smart home platform. I run Home Assistant in a container — it’s the open-source hub that speaks to everything: Zigbee lights, HomeKit devices, the thermostat, presence sensors.

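# Runs Home Assistant detached, restarts it automatically unless you stop it, and keeps config in ~/homeassistant.
# Note: on Docker Desktop for Mac, --network=host needs host networking enabled in
# Docker Desktop's settings; without it, publish the UI port with -p 8123:8123 instead.
docker run -d --name homeassistant --restart=unless-stopped \
  -v ~/homeassistant:/config --network=host \
  ghcr.io/home-assistant/home-assistant:stable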

If Docker is a bridge too far, there’s a lighter path: keep HomeKit as your platform and use Apple Shortcuts as the execution layer, with the Mac running personal automations. You lose Home Assistant’s device breadth and its excellent LLM integration, but the natural-language layer still works — the LLM outputs a scene name, and a Shortcut triggers it.
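For that lighter path, the one piece worth scripting is the hand-off from model output to Shortcut. The sketch below is one possible shape for it, not a tested setup: it assumes the model has been prompted to reply with a scene name only, that each scene name matches a Shortcut on the Mac, and that the macOS `shortcuts` command-line tool is used to run it. The scene names and function names are hypothetical.

```python
import difflib
import subprocess

# Hypothetical scene names; each must match a Shortcut that exists on the Mac.
KNOWN_SCENES = ["Office Call", "Movie Night", "Good Morning"]


def resolve_scene(llm_output, known=KNOWN_SCENES):
    """Map raw model output to a known scene name, or None if nothing matches."""
    cleaned = llm_output.strip().strip("\"'").lower()
    by_lower = {name.lower(): name for name in known}
    if cleaned in by_lower:
        return by_lower[cleaned]
    # Tolerate small deviations such as stray punctuation.
    close = difflib.get_close_matches(cleaned, list(by_lower), n=1, cutoff=0.8)
    return by_lower[close[0]] if close else None


def shortcut_command(scene):
    """Build the argv for the macOS `shortcuts` CLI (no shell involved)."""
    return ["shortcuts", "run", scene]


def trigger(llm_output):
    """Run the matching Shortcut; return False when the output matches no scene."""
    scene = resolve_scene(llm_output)
    if scene is None:
        return False  # refuse rather than guess
    subprocess.run(shortcut_command(scene), check=True)
    return True
```

Matching against a fixed list means a rambling or hallucinated reply does nothing instead of triggering the wrong scene.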

Layer 2: the language model. Ollama, running as a service, with a tool-calling model. I use qwen3:14b, which for me is the sweet spot between speed and reliable structured output. Home Assistant’s built-in Ollama integration (Settings → Devices & Services → Add Integration → Ollama) exposes your devices to the model as tools. This piece matters: the LLM doesn’t generate freeform text that something parses with regex. It emits structured tool calls that Home Assistant resolves into actual service calls, such as light.turn_on or cover.set_cover_position, against a device list Home Assistant hands it.
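The integration handles this translation internally, so none of the following is needed for the setup described here. As an illustration of what "a tool call becomes a service call" means, here is a minimal sketch that maps a tool call in the shape Ollama's chat API returns (`function.name` plus `function.arguments`) onto Home Assistant's REST endpoint `POST /api/services/<domain>/<service>`. The URL, token placeholder, allow-list, and the convention of naming tools after services are all assumptions made for the example.

```python
import json
import urllib.request

# Assumed values for illustration: Home Assistant's default port and a
# placeholder for a long-lived access token.
HA_URL = "http://localhost:8123"
HA_TOKEN = "YOUR_LONG_LIVED_TOKEN"

# Hypothetical allow-list of services the model is permitted to call.
ALLOWED_SERVICES = {"light.turn_on", "light.turn_off", "cover.set_cover_position"}


def tool_call_to_request(tool_call):
    """Turn one model tool call into a Home Assistant REST service request."""
    fn = tool_call["function"]
    name = fn["name"]
    if name not in ALLOWED_SERVICES:
        raise ValueError(f"service not allowed: {name}")
    args = fn["arguments"]
    if isinstance(args, str):  # some APIs deliver arguments as a JSON string
        args = json.loads(args)
    domain, service = name.split(".", 1)
    return urllib.request.Request(
        url=f"{HA_URL}/api/services/{domain}/{service}",
        data=json.dumps(args).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {HA_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage: build the request, then send it with urllib.request.urlopen(req).
example = {"function": {"name": "light.turn_on",
                        "arguments": {"entity_id": "light.desk_lamp",
                                      "brightness_pct": 30}}}
req = tool_call_to_request(example)
```

The point of the structure is that the model only ever fills in a name and a small JSON object; anything outside the allow-list is rejected before it reaches a device.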

Layer 3: the voice pipeline. Home Assistant’s Assist pipeline, fully local: Whisper (via the wyoming-whisper container, small model) for speech-to-text, the LLM for reasoning, and Piper for text-to-speech. Microphones are cheap ESP32-based voice satellites in three rooms, about €15 each. Every byte of audio stays inside my LAN. There is no wake-word server in Seattle, no “recordings used to improve the service.” The off switch for cloud listening isn’t a settings toggle — it’s the absence of any cloud in the diagram.

What LLM reasoning enables that dumb triggers can’t

Classic home automation is IF trigger THEN action. It’s brittle, and everyone who’s lived with it knows the failure mode: lights that flick on when the cat moves, blinds that close on the one cloudy day you wanted sun. An LLM in the loop changes the grammar from triggers to intent. Concrete examples running in my house:

Context-fused morning logic. “Get the house ready for the morning” is one phrase, but the model sees weather (rain expected → leave blinds down later), my calendar (first meeting at 8:30 → coffee machine warm by 8:00), and presence (partner already left → skip her office). Writing that as conventional automations means a thicket of nested conditions; as an LLM prompt with tool access, it’s a paragraph of system-prompt policy.
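For contrast, this is roughly what the same morning policy looks like when written by hand. It is a simplified sketch covering only the three inputs mentioned above; the action strings are hypothetical labels rather than real service calls, and the 30-minute coffee lead is taken from the 8:30 meeting and 8:00 warm-up in the example.

```python
from datetime import datetime, timedelta


def morning_actions(rain_expected, first_meeting, partner_home):
    """Hand-written version of the 'get the house ready' policy."""
    actions = []
    if rain_expected:
        actions.append("blinds: keep down for now")
    else:
        actions.append("blinds: open")
    # Coffee machine warm 30 minutes before the first meeting.
    warm_at = first_meeting - timedelta(minutes=30)
    actions.append(f"coffee: warm by {warm_at:%H:%M}")
    if partner_home:
        actions.append("partner office: prepare")
    return actions


plan = morning_actions(
    rain_expected=True,
    first_meeting=datetime(2026, 6, 29, 8, 30),
    partner_home=False,
)
```

Three inputs are manageable. Each new input (a holiday, a guest, a sick day) adds another branch here, whereas in the prompt-based version it adds a sentence.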

Vague commands resolved by state. “It’s stuffy in here” — the model checks which room my phone’s in, sees CO₂ at 1100 ppm from the sensor, and opens the window actuator instead of just spinning a fan. Alexa needs the exact incantation; the LLM needs only the complaint.

The guest protocol. “We have guests overnight” reconfigures fourteen things — guest Wi-Fi QR on the hallway display, hot water schedule extended, motion-triggered lights in the guest corridor dimmed to 10% after midnight. One sentence, because the model holds the policy and the device list.

The honest framing: the LLM isn’t “smarter” than automations — it’s a compiler from human vagueness to the same service calls. But that compilation step is exactly what made smart homes annoying for a decade.

The latency reality check

Numbers, because this is where local pipelines get accused of being toys. Measured wall-clock from end-of-speech to action, averaged over 20 commands on an M2 Ultra:

Simple device command (“turn off the kitchen lights”): 1.8–2.5 seconds. Whisper transcription ~0.5s, LLM reasoning and tool call ~1–1.5s, execution near-instant.
Multi-step contextual command: 3–5 seconds, as the model makes several tool calls.
Siri on a HomePod, same room, same lights: ~1–1.5 seconds. Alexa: similar.

So the cloud assistants are faster on simple commands — they should be, they’re pattern-matching against a fixed grammar in purpose-built datacenters. The local pipeline pays a 1-second tax on “lights off” and then does things Siri categorically cannot, like the stuffy-room inference. My household’s verdict after four months: the tax is noticeable for the first week and invisible after, because the commands got more ambitious instead. Nobody says “turn on lamp two” anymore.
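The arithmetic behind that tax can be checked directly from the figures above. The snippet below is just that bookkeeping: it compares range midpoints for the simple-command case and shows how much of the measured total falls outside the two itemized stages. Using midpoints is a simplification chosen for the example.

```python
# Figures quoted in the text, in seconds (ranges as low/high pairs).
LOCAL_SIMPLE = (1.8, 2.5)   # local pipeline, simple command
CLOUD_SIMPLE = (1.0, 1.5)   # Siri on a HomePod, same command
WHISPER = 0.5               # transcription
LLM = (1.0, 1.5)            # reasoning plus tool call


def midpoint(rng):
    lo, hi = rng
    return (lo + hi) / 2


def local_tax():
    """Extra seconds the local pipeline costs on a simple command."""
    return midpoint(LOCAL_SIMPLE) - midpoint(CLOUD_SIMPLE)


def unitemized():
    """Measured total minus the two itemized stages, as a (low, high) pair."""
    return (LOCAL_SIMPLE[0] - (WHISPER + LLM[0]),
            LOCAL_SIMPLE[1] - (WHISPER + LLM[1]))


print(f"tax on a simple command: ~{local_tax():.1f} s")
low, high = unitemized()
print(f"time outside Whisper and the LLM: {low:.1f} to {high:.1f} s")
```

By midpoints the tax comes to about 0.9 seconds, and 0.3 to 0.5 seconds of each simple command is spent outside transcription and the model.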

One real limitation: a slow model ruins this. On a base M-series Mac mini with an 8B model, multi-step commands stretch past 6 seconds, which crosses from “assistant” into “waiting.” This is the use case where Studio-class memory bandwidth genuinely earns its keep.

Energy and the always-on question

An always-on Mac sounds extravagant. It isn’t. Measured at the wall: my Mac Studio idles at 9–11W with Home Assistant and Ollama loaded but quiet, less than two LED bulbs. During LLM inference it spikes to 90–120W for a few seconds per command. Daily total for the whole smart-home brain: roughly 0.3 kWh, around €0.08 at my tariff. A Raspberry Pi is cheaper still, but it can’t run the LLM layer at usable speed; the Mac replaces a Pi and the would-be cloud subscriptions, and doubles as my general local-AI box. The marginal cost of the smart-home duty is effectively the idle delta: a few watts.
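A back-of-envelope version of that estimate, for anyone who wants to plug in their own numbers. The wattages are midpoints of the ranges above; the 3 seconds per command and 50 commands per day are assumptions made for the example, not measurements.

```python
def daily_energy(idle_w, inference_w, seconds_per_command, commands_per_day):
    """Return (total_kwh, burst_kwh) for one day of always-on operation.

    burst_kwh counts only the draw above idle during inference.
    """
    idle_wh = idle_w * 24
    burst_wh = (inference_w - idle_w) * seconds_per_command * commands_per_day / 3600
    return (idle_wh + burst_wh) / 1000, burst_wh / 1000


# Midpoints of the measured ranges; command count and duration are assumed.
total_kwh, burst_kwh = daily_energy(
    idle_w=10, inference_w=105, seconds_per_command=3, commands_per_day=50
)
print(f"daily total: {total_kwh:.2f} kWh")
print(f"inference share: {burst_kwh / total_kwh:.1%}")
```

Under these assumptions the total lands a little below the rounded 0.3 kWh figure, and inference accounts for under 2% of it. Idle draw dominates, so the number of commands per day barely moves the result.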

The honest section: this is a hobbyist weekend, not ten minutes

I need to be direct about complexity, because the demo paragraph at the top makes this sound like an app install. It is not. Budget a full weekend if you’re comfortable with Docker and YAML, longer if you’re not. The real time sinks, from my notes:

Device onboarding — getting every bulb and sensor into Home Assistant — took longer than all the AI parts combined. Day one, mostly.
Exposing the right entities to the LLM. Expose all 200 entities and the model gets confused and slow; curate ~40 that voice should control. An hour of judgment calls.
Prompt policy tuning. Teaching the system prompt your house’s rules (“never unlock doors by voice,” “bedroom lights cap at 30% after 22:00”) is iterative. I’m still editing it monthly.
Voice satellite hardware — flashing ESP32 boxes, placing mics. An evening.
The debugging tail. Whisper mishearing my accented English on “blinds” vs “lights” took two evenings to tune around.
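Two of those items, entity curation and house rules, can also be backed by a hard check in code, so that a confused model cannot talk its way past them. The following is a hypothetical sketch of such a guard and is not part of the setup described here, which keeps the rules in the system prompt. The entity names are made up, and the 06:00 end of the night-time cap is an assumption, since the rule only says "after 22:00".

```python
from datetime import time

# Hypothetical curated allow-list: the entities voice may control.
EXPOSED = {"light.bedroom_ceiling", "light.office_desk", "cover.office_blinds"}

# Services that must never run from a voice command.
BLOCKED_SERVICES = {"lock.unlock", "lock.open"}

NIGHT_START = time(22, 0)
NIGHT_END = time(6, 0)      # assumed end of the night-time cap
BEDROOM_NIGHT_CAP = 30      # percent


def check_call(service, data, now):
    """Apply hard house rules to one call; return (allowed, data_to_send)."""
    entity = data.get("entity_id", "")
    if service in BLOCKED_SERVICES:
        return False, data
    if entity not in EXPOSED:
        return False, data
    is_night = now >= NIGHT_START or now < NIGHT_END
    if service == "light.turn_on" and entity.startswith("light.bedroom") and is_night:
        capped = dict(data)  # copy so the caller's dict is left untouched
        capped["brightness_pct"] = min(data.get("brightness_pct", 100),
                                       BEDROOM_NIGHT_CAP)
        return True, capped
    return True, data
```

For example, a request for the bedroom ceiling light at 80% at 23:00 is allowed but rewritten to 30%, while any unlock request is refused regardless of what the prompt says.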

Total honest figure: 15–20 hours to reach the wife-approval threshold of reliability. If that number excites you rather than repels you, you are the audience, and the payoff is real: a home that understands sentences, answers simple commands in under three seconds, and sends exactly zero syllables of your household’s audio to anyone else’s computer.

Start with Home Assistant and one room’s lights, add the Ollama integration once devices are stable, and leave voice for last — text chat through the Home Assistant app is the low-friction way to test the brain before you give it ears.