The Telescope That Thinks: AI's Transformation of Observational Astronomy
The Vera C. Rubin Observatory in Chile began routine operations in late 2024. Its Legacy Survey of Space and Time (LSST) will photograph the entire visible southern sky every three nights, generating approximately twenty terabytes of data per night. Over its ten-year operating lifetime, it will produce a catalog of approximately thirty-seven billion distinct objects.
The LSST Science Collaboration’s transient science working group has roughly fifty active members. No amount of hiring will close the gap between that headcount and that data volume. AI was not optional for this observatory; it was a design requirement.
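A back-of-envelope sketch puts rough numbers on that gap. The nightly data volume, catalog size, headcount, and survey length are the article's figures. Observing every night of the year is an assumption that gives an upper bound, and dividing the whole catalog among working-group members is purely illustrative, not a description of how the collaboration assigns work.

```python
# Figures quoted in the article.
TB_PER_NIGHT = 20          # ~20 TB of data per night
CATALOG_OBJECTS = 37e9     # ~37 billion catalogued objects
REVIEWERS = 50             # ~50 active working-group members
SURVEY_YEARS = 10          # ten-year operating lifetime

# Assumptions made for this sketch only.
NIGHTS_PER_YEAR = 365      # observing every night, so an upper bound
SECONDS_PER_YEAR = 365 * 24 * 3600


def survey_totals():
    """Return (total petabytes, objects per reviewer, objects per second)."""
    total_petabytes = TB_PER_NIGHT * NIGHTS_PER_YEAR * SURVEY_YEARS / 1000
    objects_per_reviewer = CATALOG_OBJECTS / REVIEWERS
    # Rate if every reviewer worked around the clock for the whole survey.
    objects_per_second = objects_per_reviewer / (SURVEY_YEARS * SECONDS_PER_YEAR)
    return total_petabytes, objects_per_reviewer, objects_per_second


if __name__ == "__main__":
    pb, per_person, per_second = survey_totals()
    print(f"Data volume, upper bound: {pb:.0f} PB")
    print(f"Objects per reviewer: {per_person:.2e}")
    print(f"Objects per reviewer per second, nonstop: {per_second:.2f}")
```

Under these assumptions, each person would have to inspect more than two objects every second, without pause, for a decade.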
This is how AI turns from tool into infrastructure in observational science: when the data rate exceeds human review capacity by several orders of magnitude, the question is no longer whether to use automated classification but which system to use and how much to trust it.
From Aid to Infrastructure
Machine learning classification in astronomy has a longer history than most people realize. The original application was galaxy morphology classification — sorting galaxies by shape (elliptical, spiral, irregular) from survey images. This was being done with neural networks in the mid-1990s. The Galaxy Zoo citizen science project, which launched in 2007 and used volunteer human classifiers, was partly a response to the recognition that even trained astronomers couldn’t keep up with survey data volumes.
What changed after 2020 was not the existence of machine learning in astronomy but its position in the scientific pipeline. Previously, ML was used to pre-process data before human review. Now, for many classes of discovery, ML is the primary discoverer — humans review ML-flagged items, not raw data. The shift is subtle but significant. It means that the biases of the ML system determine which phenomena get followed up, which anomalies get resources, which sky objects become papers.
The Zwicky Transient Facility, which preceded Rubin and has been running since 2018, provides the most studied example. ZTF generates alerts for roughly one million transient events per night, each marking a source that changed brightness or position between observations. Several competing automated brokers, including ALeRCE, Fink, and ANTARES, process the alert stream. Each classifies every alert as a likely supernova, variable star, active galactic nucleus, or one of several other categories, and assigns a probability score. Human scientists receive a filtered stream of high-priority alerts for the categories they care about.
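The filtering step can be sketched in a few lines. This is a minimal illustration, not any broker's actual interface: the `Alert` record, its field names, the class labels, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    # Hypothetical record; real broker schemas carry far more fields.
    alert_id: str
    probabilities: dict  # class name -> classifier probability


def top_class(alert):
    """Return the (class, probability) pair with the highest probability."""
    return max(alert.probabilities.items(), key=lambda kv: kv[1])


def filter_alerts(alerts, wanted_classes, min_probability=0.8):
    """Keep alerts whose most probable class is wanted and confident enough,
    sorted from most to least confident."""
    selected = []
    for alert in alerts:
        label, prob = top_class(alert)
        if label in wanted_classes and prob >= min_probability:
            selected.append((alert.alert_id, label, prob))
    return sorted(selected, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    stream = [
        Alert("a1", {"supernova": 0.92, "variable_star": 0.05, "agn": 0.03}),
        Alert("a2", {"supernova": 0.40, "variable_star": 0.35, "agn": 0.25}),
        Alert("a3", {"supernova": 0.02, "variable_star": 0.95, "agn": 0.03}),
    ]
    for row in filter_alerts(stream, {"supernova"}):
        print(row)
```

Only `a1` reaches a scientist who subscribes to supernovae. An alert like `a2`, which the classifier cannot place confidently in any class, passes nobody's filter at this threshold.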
The system works well for known object classes that have good training data. Supernovae: excellent. RR Lyrae variable stars: excellent. Gravitational wave electromagnetic counterparts: good. The failure mode is, predictably, things that don’t look like anything in the training data — the anomalies that might be scientifically most interesting.
What Got Found
Several legitimate discoveries by early 2027 owe their existence primarily to AI-driven analysis of large survey datasets.
The most significant is the systematic characterization of tidal disruption events (TDEs), in which a star passes close enough to a massive black hole to be torn apart by tidal forces. Before machine learning analysis of wide-field survey data, TDEs were rare curiosities; fewer than thirty had been confirmed in the astronomical literature before 2018. As of early 2027, several hundred have been identified, including a population of unusual “partial TDEs” in which the star survives the encounter and the resulting flaring pattern has no clean analog in earlier observations. Understanding what fraction of galactic centers produce TDEs, and at what rate, requires sample sizes that only automated survey analysis can provide.
Another AI-enabled advance is the reclassification of a substantial fraction of “active galactic nucleus” detections as a heterogeneous population with distinct subclasses. What the previous generation of surveys lumped together as AGN turns out, once sufficiently sensitive ML classifiers are applied to enough photometric and spectroscopic data, to be several physically distinct phenomena that had been conflated for lack of sample statistics. This is not a single discovery; it is a quiet reorganization of a scientific category.
The Anomaly Problem
The more philosophically interesting contribution — and the more scientifically contested — is anomaly detection.
Standard classification assigns known classes. Anomaly detection looks for objects that don’t fit known classes. This sounds straightforward. In practice, in a survey generating a billion data points per year, the definition of “anomaly” is everything. A naive anomaly detector flags sensor artifacts, data processing errors, and statistical flukes far more often than scientifically interesting objects.
The Astronomaly software package, and similar tools developed independently at several observatories, uses active learning: an AI identifies candidate anomalies, a human expert reviews a small sample and labels them (interesting / not interesting), and the model updates its anomaly definition based on those labels. This semi-supervised loop converges toward a classifier that reflects the expert’s scientific judgment without requiring the expert to review millions of objects.
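The loop can be illustrated with a toy version. This is a sketch under heavy assumptions and not Astronomaly's algorithm or API: the anomaly score is simply distance from the sample centroid, the update rule lets each object inherit the label of its nearest labelled neighbour, and the two-dimensional points and the "expert" are invented for the demonstration.

```python
import math


def anomaly_scores(points):
    """Stand-in anomaly score: Euclidean distance from the sample centroid."""
    dims = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    return [math.dist(p, centroid) for p in points]


def active_learning_rank(points, ask_expert, rounds=2, batch=3):
    """Rank objects by anomaly score reweighted with expert feedback.

    ask_expert(index) returns True if the expert finds the object interesting.
    Returns (ranking of indices, dict of expert labels).
    """
    raw = anomaly_scores(points)
    scores = list(raw)
    labels = {}
    for _ in range(rounds):
        # Show the expert the highest-scoring objects not yet labelled.
        unlabelled = [i for i in range(len(points)) if i not in labels]
        unlabelled.sort(key=lambda i: scores[i], reverse=True)
        for i in unlabelled[:batch]:
            labels[i] = ask_expert(i)
        # Update: down-weight objects whose nearest labelled neighbour
        # was judged uninteresting.
        for i, p in enumerate(points):
            nearest = min(labels, key=lambda j: math.dist(p, points[j]))
            scores[i] = raw[i] * (1.0 if labels[nearest] else 0.1)
    ranking = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranking, labels


def demo_points():
    """Invented data: ordinary objects, sensor artifacts, unusual objects."""
    ordinary = [(x * 0.5, y * 0.5) for x in range(-2, 3) for y in range(-2, 3)]
    artifacts = [(8.0, 0.0), (8.2, 0.1), (7.9, -0.2), (8.1, 0.3)]
    unusual = [(0.0, 6.0), (0.3, 6.2), (-0.2, 5.9)]
    return ordinary + artifacts + unusual


if __name__ == "__main__":
    points = demo_points()
    # Hypothetical expert: only the cluster near y = 6 is interesting.
    ranking, labels = active_learning_rank(points, lambda i: points[i][1] > 4.0)
    print("expert labelled:", len(labels), "objects")
    print("top three:", [points[i] for i in ranking[:3]])
```

In the demo the raw scores put the artifacts first, because they lie farthest from everything else. After the expert has labelled six objects, the three unusual objects lead the ranking, including one the expert never saw, and the artifacts fall well below them.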
This approach has flagged several genuinely unusual objects that merited follow-up. The most interesting, as of this writing, is a class of faint periodic emitters found in ZTF data whose spectral properties do not cleanly match any established source class. The current best hypothesis involves exotic binary star systems, but the sample is small and the follow-up observations necessary to confirm or rule out competing explanations are still being gathered.
Whose Science Gets Done
The automation of astronomical discovery raises a structural issue for the field: when AI determines which anomalies receive human attention, the scientists whose judgment trained the system gain disproportionate influence over what gets found.
If the anomaly detector is trained primarily by astronomers interested in TDEs, it will find more TDE-related anomalies and fewer anomalies relevant to other phenomena. The training labels reflect the trainers’ scientific priors. This is not a bug in the strict technical sense, since the system is doing what it was designed to do, but it is a selection effect with scientific consequences.
The field has been aware of this since approximately 2021, and several groups have made deliberate attempts to train anomaly detectors on diverse scientific objectives. Whether the effort is proportionate to the problem is debatable. Astronomy is not unique here — the same concern applies to any field where AI systems prioritize problems for human attention. But astronomy is one of the first fields where it is entirely unavoidable, because the alternative (not automating) produces no science at all.
The telescope that thinks is not thinking independently. It is thinking with the values, priorities, and blind spots of the scientists who trained it. That is not a reason to distrust it. It is a reason to be specific about who did the training, what they wanted to find, and what they might therefore be systematically missing.