AI Power User

Mac Studio as a Family Photo AI — Find Any Memory in Seconds

Semantic search across twenty years of family photos without uploading a single one to anyone's cloud

By Jakub Jirák · Jun 28, 2026 · 6 min read

photo-management, apple-silicon, clip, local-ai

My mother asked me in May whether I had “that photo from Šárka’s graduation — the one where grandpa is laughing in the rain under the blue umbrella.” Eleven years ago. Somewhere in a library of 184,000 photos accumulated across four iPhones, two DSLRs, a scanner project, and three family members’ merged camera rolls.

I typed “older man laughing under blue umbrella rain” into a search box on my Mac Studio. The photo was the second result. Total elapsed time: about four seconds, most of which was me typing.

That search never touched the internet. This post is how the system works — first squeezing everything out of Apple Photos’ built-in intelligence, then going past it with an open-source CLIP index, plus the unglamorous prerequisite work of deduplication. It’s structured as the weekend project it actually was.

First, max out what Apple Photos already does

Before installing anything, understand that Photos on Apple Silicon already runs surprisingly strong on-device AI, and most people use a tenth of it.

Name faces relentlessly, and consistently. In the People album, name every face, and merge duplicates the moment Photos splits one person into “Šárka” and “Sarka.” The consistency rule matters at family scale: agree on one canonical name per person across everyone who touches the library. Face recognition gets dramatically better at age-spanning matches (the same kid at 3 and at 17) once you’ve confirmed a few dozen examples across eras, so spend the twenty minutes confirming the “Is this also Šárka?” prompts.

Use the search features people don’t know exist. Photos search quietly accepts stacked terms: a person plus a scene plus a place plus a date. Try Šárka beach 2019, or dog snow Brno. It also indexes recognized text in photos (search the word on a birthday banner and the photo surfaces), identifies thousands of object categories, and, since Apple Intelligence arrived, handles loose natural-language phrases like “kids eating cake outside.” Most people type one word, scroll, and give up; stacking three terms is usually the difference between finding the photo and not.

The limits you’ll hit. Apple’s semantic vocabulary is fixed and opaque — “umbrella” works, “blue umbrella while laughing” mostly doesn’t. There’s no relational understanding (“at grandma’s house” only works if location metadata says so), no way to tune it, and no batch interface. That’s where CLIP comes in.

Going beyond: a CLIP index over the whole library

CLIP-class models embed images and text into the same vector space, which means you can search pictures with sentences. Several open-source tools wrap this for photo libraries; the one I settled on is rclip for its dead-simple CLI, with immich as the heavyweight alternative if you want a full self-hosted Google Photos replacement with a web UI for the whole family.
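To make the "same vector space" idea concrete, here is a minimal sketch of how ranking by similarity works. The three-dimensional vectors and file names are made up for illustration; real CLIP embeddings have hundreds of dimensions and come from the model, not from hand-typed numbers. This is not rclip's or immich's actual code, just the underlying idea.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_photos(query_vector, photo_vectors, top=3):
    """Return the names of the `top` photos most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vec), name)
        for name, vec in photo_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top]]

# Hypothetical toy embeddings (assumption: real ones come from a CLIP model).
photo_vectors = {
    "graduation_umbrella.jpg": [0.9, 0.1, 0.2],
    "beach_2019.jpg": [0.1, 0.9, 0.1],
    "dog_snow.jpg": [0.2, 0.2, 0.9],
}

# Stand-in for the embedded text query "older man laughing under blue umbrella".
query_vector = [0.8, 0.2, 0.1]

results = rank_photos(query_vector, photo_vectors, top=2)
print(results)
```

The expensive part is computing one vector per photo, which happens once at indexing time. A query only needs one text embedding plus a pass of cheap similarity comparisons.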

The minimal path, on a folder export of your library:

pipx install rclip
cd /Volumes/Photos/library-export
rclip "kids in snow at grandma's house"

The first run indexes everything; afterward, queries return ranked matches in a couple of seconds. The query that opened this post, and queries like “birthday cake with sparklers in a dark room” or “red Škoda in front of the old house,” are exactly the kind of compositional, relational searches that Apple Photos struggles with. “At grandma’s house” works in my library because the CLIP embeddings pick up the house’s distinctive green façade across hundreds of photos, no GPS required. As a bonus, scanned 1990s prints with zero metadata become searchable for the first time.
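If you end up fielding search requests from relatives, a thin wrapper around the CLI saves retyping. The sketch below assumes rclip's `--top` and `--filepath-only` options behave as described in its help output at the time of writing; check `rclip --help` on your install before relying on them. The function names are my own, not part of rclip.

```python
import subprocess

def build_rclip_command(query, top=5):
    """Build the argument list for one rclip query.

    Assumption: --top limits the number of results and --filepath-only
    prints bare paths, one per line.
    """
    return ["rclip", "--top", str(top), "--filepath-only", query]

def search(query, library_dir, top=5):
    """Run rclip inside library_dir and return the matching file paths."""
    completed = subprocess.run(
        build_rclip_command(query, top),
        cwd=library_dir,          # rclip searches the current directory
        capture_output=True,
        text=True,
        check=True,               # raise if rclip exits with an error
    )
    return [line for line in completed.stdout.splitlines() if line.strip()]

# Example call (requires rclip installed and an indexed folder):
# paths = search("older man laughing under blue umbrella rain",
#                "/Volumes/Photos/library-export", top=3)
```

Passing the query as a single list element avoids shell quoting problems with apostrophes, as in “grandma’s house.”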

Performance on a big library, measured. Indexing throughput is the scary-sounding part that turns out fine on Apple Silicon. My M2 Ultra Mac Studio indexed the full 184,000-photo library in just under 4 hours, which works out to roughly 13 photos per second, GPU-accelerated. An M-series MacBook Pro should land in the 6–10 photos/second range; even a base M-chip Mac mini gets a 100k library done overnight. The index itself is small (a few gigabytes of vectors for mine), and incremental runs to pick up new photos take minutes, so I run one from a weekly cron job. Queries are near-instant regardless of library size; the cost is all paid at indexing time.
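To estimate your own overnight run, the arithmetic is simple division. The sketch below uses the figures quoted above; the throughput you plug in for your own machine is a guess until you measure it on a small sample folder.

```python
def indexing_hours(photo_count, photos_per_second):
    """Estimated wall-clock hours to index a library at a steady rate."""
    return photo_count / photos_per_second / 3600

def throughput(photo_count, hours):
    """Photos per second implied by a measured indexing run."""
    return photo_count / (hours * 3600)

# 184,000 photos at about 13 photos/second: just under 4 hours.
print(round(indexing_hours(184_000, 13), 2))

# A 100k library at the low end of the laptop range (6 photos/second).
print(round(indexing_hours(100_000, 6), 2))

# Rate implied by finishing 184,000 photos in exactly 4 hours.
print(round(throughput(184_000, 4), 1))
```

Even at the slow end, a 100k library fits comfortably inside a night, which is why the weekend plan starts the index on Saturday evening.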

If you choose immich instead, it runs in Docker on the same Mac, does its own CLIP indexing plus face recognition, and gives phones an upload app — effectively your private Google Photos. More setup, more capability.

The deduplication purge

Twenty years of camera rolls means duplicates: the same vacation imported from a camera and synced from a phone, burst shots, WhatsApp re-saves, the scanner project’s three passes. Before my cleanup, a meaningful slice of those 184k photos was copies. Dedup before you index: it makes search results cleaner and reclaims real disk space.

Photos itself has a Duplicates album (Utilities → Duplicates) that catches exact and near-exact matches. Merge everything it offers; it’s free and safe. For the deeper pass (visually similar but non-identical files scattered across folders outside Photos) I used czkawka, an open-source duplicate finder with perceptual hashing:

brew install czkawka
czkawka_cli image -d /Volumes/Photos/library-export

It groups look-alikes by similarity threshold and lets you auto-keep the largest-resolution copy from each group. My purge removed 31,000 files and 290 GB. Do this with backups confirmed and the trash-not-delete option on, obviously. The pleasant side effect: search results stopped showing the same moment five times.
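For the curious, here is roughly what perceptual hashing means, as a toy sketch. It uses the simplest variant (an average hash over a tiny grayscale grid) and made-up pixel values; czkawka's real hashing works on actual image data and offers several algorithms, so treat this as an illustration of the idea rather than its implementation.

```python
def average_hash(pixels):
    """One bit per pixel: 1 if brighter than the image's mean, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]

def hamming_distance(hash_a, hash_b):
    """Number of bit positions where two hashes differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(hash_a, hash_b))

def keep_largest(group):
    """Pick the highest-resolution copy from a group of look-alikes."""
    return max(group, key=lambda photo: photo["width"] * photo["height"])

# Hypothetical 2x2 grayscale thumbnails (real tools downscale to e.g. 8x8).
original = [[10, 200], [220, 30]]
recompressed = [[12, 198], [215, 35]]   # same scene, slightly different bytes
different = [[200, 10], [30, 220]]      # a different picture

near = hamming_distance(average_hash(original), average_hash(recompressed))
far = hamming_distance(average_hash(original), average_hash(different))
print(near, far)

group = [
    {"path": "whatsapp_resave.jpg", "width": 1600, "height": 1200},
    {"path": "camera_original.jpg", "width": 6000, "height": 4000},
]
print(keep_largest(group)["path"])
```

A small Hamming distance means "probably the same photo" even when the files differ byte for byte, which is exactly what a checksum-based duplicate finder misses. The similarity threshold is the maximum distance you are willing to call a match.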

While you’re in cleanup mode, build the family timeline: with faces named and junk removed, smart albums per person per era (“Šárka 0–5,” “Šárka school years”) practically assemble themselves, and Photos’ Memories engine starts generating genuinely good retrospectives because it’s no longer choking on duplicates. I keep a shared album, “Family Archive — Best of Each Year,” that the CLIP search makes trivially easy to populate: query the moment you remember, drop the best frame in.

The privacy argument, plainly

Google Photos does everything above, smoothly, in the cloud — by ingesting your family’s entire visual history onto Google’s servers, where it’s mined to classify faces, places, relationships, and life events, governed by policies you don’t control and that change without your consent. Faces of your children, medical situations, your home’s interior, every license plate you’ve ever photographed.

The Mac Studio version produces the same magic with a different data flow: the photos sit on your disk, the models run on your silicon, the index lives in your home folder, and the only entity that learns your grandfather laughed under a blue umbrella in 2015 is you. For most data categories I’m pragmatic about cloud trade-offs. Family photo archives are where I draw the line — they’re the one dataset that covers everyone you love, spans decades, and can never be un-uploaded.

The weekend plan

How it actually breaks down, from my notes:

Saturday morning — backup and dedup. Verify Time Machine plus one offsite copy. Run Photos’ Duplicates merge, then czkawka on any loose folders. (2–3 hours of attention, mostly waiting.)

Saturday afternoon — faces and consolidation. Name faces, merge person duplicates, confirm the recognition prompts, agree on canonical names with the family. Import stray folders into one library. (2 hours, weirdly nostalgic, budget extra for getting lost in 2009.)

Saturday evening — start the CLIP index. Install rclip or immich, point it at the library, let it run overnight. (20 minutes of work, hours of unattended GPU time.)

Sunday — search day. Test queries, set up the weekly re-index cron job, build the per-person smart albums and the family timeline album, and field search requests from relatives like the oracle you have become. (2 hours.)

Total hands-on time: under a day, spread across a weekend, on hardware you already own. The result is the best private answer to “remember that photo where…” that has ever existed — and a family archive that’s finally searchable by what’s in the pictures, not just when they were taken.