What Genomics Learned From AI — and What It Taught Back

The relationship between machine learning and genetics has been the most productive scientific partnership of the decade, and also the most instructive about what AI cannot do.

Genetics gave machine learning its most tractable early large-scale problem. The genome is a sequence — four letters, three billion positions, a code that maps (imperfectly, non-linearly, contextually) onto biology. Sequences are what language models were built for. Before any researcher had to explain why deep learning was relevant to genomics, the data structure was already making the argument.

The collaboration started seriously around 2016, when Google's DeepVariant showed that convolutional neural networks could call genetic variants from sequencing data more accurately than traditional algorithms. This was methodological, not biological: it improved how we read the genome, not what we understood about it. But it established a pattern of AI serving as a better signal processor for biological data before it became a generator of biological hypotheses.

By early 2027, that progression — from signal processing to hypothesis generation — is substantially complete in some domains and barely started in others. The contrast is illuminating.

Where It Worked: Variant Effect Prediction

The genome-wide association study (GWAS) methodology, which scans hundreds of thousands of genetic variants across thousands of people to find statistical associations with traits, had by 2023 produced a catalog of approximately 500,000 significant associations between genetic variants and human diseases or traits. The catalog is real and useful. It is also deeply confusing: most individual variants have tiny effects, most sit in non-coding regions of the genome, and the molecular mechanism connecting a specific variant in an intergenic region to a specific disease risk was often completely unclear.

Machine learning contributed to this problem through models that could predict the functional effect of non-coding variants: whether a single-base change in a regulatory region would increase or decrease expression of nearby genes, disrupt transcription factor binding, or alter chromatin accessibility. The most influential of these, Enformer (published by DeepMind and collaborators in 2021) and its successors, can predict more than 5,000 genomic tracks (gene expression, chromatin accessibility, histone marks, and transcription factor binding, measured across many cell types and tissues) from raw sequence with reasonable accuracy. They are not correct in every case. But they provide a biologically plausible framework for generating hypotheses about the functional relevance of GWAS hits, converting statistical associations into testable mechanistic questions.
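The basic move behind variant effect prediction can be sketched in a few lines: score the reference sequence, score the same sequence with the alternate allele swapped in, and take the difference. The sketch below is a toy under loud assumptions. The function `toy_expression_score` is a hypothetical stand-in that only counts matches to a made-up motif; it is not Enformer and does not resemble its architecture or API.

```python
# Toy illustration of in silico mutagenesis for a non-coding variant.
# ASSUMPTION: `toy_expression_score` is a hypothetical stand-in for a trained
# sequence model. It is NOT Enformer; it only measures similarity to one motif.

MOTIF = "TGACTCA"  # assumed binding motif, chosen for illustration only


def toy_expression_score(seq: str) -> float:
    """Return the best fraction of motif positions matched by any window of seq."""
    best = 0.0
    for start in range(len(seq) - len(MOTIF) + 1):
        window = seq[start:start + len(MOTIF)]
        matches = sum(a == b for a, b in zip(window, MOTIF))
        best = max(best, matches / len(MOTIF))
    return best


def variant_effect(ref_seq: str, pos: int, alt: str) -> float:
    """Score with the alternate allele at `pos` minus score with the reference."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return toy_expression_score(alt_seq) - toy_expression_score(ref_seq)


reference = "ACGTTGACTCAGGCT"          # the motif occupies positions 4 through 10
inside = variant_effect(reference, 9, "G")   # C>G inside the motif
outside = variant_effect(reference, 0, "C")  # A>C outside the motif
print(f"inside motif: {inside:+.3f}, outside motif: {outside:+.3f}")
```

A negative difference is read as a predicted loss of regulatory activity, which is the kind of hypothesis that then goes to the bench. Real models replace the motif counter with a network trained on experimental tracks, but the reference-versus-alternate comparison is the same.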

This has been genuinely useful. Several drug targets that had been identified by GWAS but whose mechanism was opaque have been clarified by AI-predicted regulatory effects. One clear example: a variant associated with Type 2 diabetes risk that sits in a non-coding region near the TCF7L2 gene, whose functional mechanism had been debated for a decade, was shown in 2025 to affect a specific regulatory element predicted by Enformer to control pancreatic beta-cell-specific expression. The prediction preceded the experiment. The experiment confirmed it. This is the right order.

Where It Struggled: Polygenic Risk

The genomics application that attracted the most commercial interest — and has produced the most contentious results — is polygenic risk scores: the attempt to aggregate thousands of small genetic effects into a single number that predicts an individual’s risk for a complex disease like coronary artery disease, breast cancer, or schizophrenia.

Polygenic risk scores work, in the sense that top-decile scorers for coronary artery disease are roughly three times more likely to develop it than bottom-decile scorers. This is a real effect with real clinical utility. The question is where the scores break down, and the answer turns out to be: across populations.
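In its simplest form, the score is a weighted sum: each variant contributes its risk-allele count multiplied by an effect size taken from a GWAS, and the individual is then ranked against a reference cohort. The sketch below assumes that simple additive form. The weights, genotypes, and cohort are invented for illustration and come from no real study.

```python
# Minimal polygenic risk score sketch.
# ASSUMPTION: all numbers below are invented for illustration only.

def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2) across variants."""
    if len(dosages) != len(weights):
        raise ValueError("one weight is needed per variant")
    return sum(d * w for d, w in zip(dosages, weights))


def decile(score, cohort_scores):
    """Return the decile (1 = lowest, 10 = highest) of `score` within a cohort."""
    below = sum(s < score for s in cohort_scores)
    return min(10, below * 10 // len(cohort_scores) + 1)


weights = [0.12, -0.05, 0.08, 0.03]      # hypothetical per-allele effect sizes
person = [2, 1, 0, 1]                    # risk-allele counts at four variants
score = polygenic_score(person, weights)

cohort = [i / 100 for i in range(100)]   # hypothetical reference cohort scores
print(f"score = {score:.2f}, decile = {decile(score, cohort)}")
```

Production scores use thousands to millions of variants and more careful weighting, but both the weights and the reference cohort are inherited from whichever population the underlying studies sampled.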

Most GWAS were conducted in individuals of European ancestry. The genetic variants they identified, and the weights assigned to those variants in polygenic risk scores, are calibrated for that population. When applied to individuals of African, East Asian, or South Asian ancestry, the performance drops substantially, sometimes to near zero. This is not a mystery. The statistical associations found in one population reflect the linkage disequilibrium structure (which variants tend to co-occur) of that population. Transfer them to a population with different ancestry, different demographic history, and different variant frequencies, and the predictions lose validity.
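A small calculation makes the linkage disequilibrium point concrete. A GWAS hit is often a "tag" variant that is merely correlated with the causal one, and the standard measure of that correlation is r². The sketch below computes r² from haplotypes in two hypothetical populations. The haplotype counts are invented to make the contrast obvious and are not real data.

```python
# Linkage disequilibrium sketch: r^2 between a tag variant and a causal variant.
# ASSUMPTION: the haplotype counts are invented for illustration only.

def ld_r2(haplotypes):
    """r^2 between two biallelic sites from (tag, causal) allele pairs coded 0/1."""
    n = len(haplotypes)
    p_tag = sum(t for t, _ in haplotypes) / n
    p_causal = sum(c for _, c in haplotypes) / n
    p_both = sum(t * c for t, c in haplotypes) / n
    d = p_both - p_tag * p_causal        # LD coefficient D
    return d * d / (p_tag * (1 - p_tag) * p_causal * (1 - p_causal))


# Population A: the tag allele usually travels with the causal allele.
pop_a = [(1, 1)] * 45 + [(0, 0)] * 45 + [(1, 0)] * 5 + [(0, 1)] * 5
# Population B: the two sites are inherited independently.
pop_b = [(1, 1)] * 25 + [(0, 0)] * 25 + [(1, 0)] * 25 + [(0, 1)] * 25

r2_a = ld_r2(pop_a)
r2_b = ld_r2(pop_b)
print(f"r^2 in population A: {r2_a:.2f}, in population B: {r2_b:.2f}")
```

In population A the tag variant carries most of the causal variant's signal, so a weight learned there is informative. In population B the same tag variant carries none, and the weight transfers as noise.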

AI did not create this problem. GWAS created it. But AI, in the form of more powerful models trained on the same biased data, did not solve it — it amplified it, because the models found more associations in the majority-ancestry data and produced higher-performing scores for that population, which made the differential performance across ancestries even larger. The scientific community has been aware of this since at least 2019. The response — more diverse cohorts, population-specific models, transferability methods — is real but slow.

The Single-Cell Revolution

The most transformative AI-genomics partnership of the past five years may be in single-cell sequencing analysis. Single-cell RNA sequencing, first demonstrated in 2009 and scaled up by droplet-based methods from around 2015, allows measurement of gene expression in individual cells: not tissue averages, but cell-by-cell profiles of which genes are active and at what level. A single experiment on a human tissue sample can produce expression profiles for tens of thousands of cells, generating a dataset that would have been incomprehensible before modern dimensionality reduction and clustering methods.

Machine learning did not invent single-cell sequencing, but it made the data interpretable. Tools like scVI, Seurat, and their successors — and now foundation models like Geneformer and scGPT, trained on millions of single-cell profiles — can identify cell types, trace developmental trajectories, compare healthy and diseased tissue, and predict how cells will respond to perturbations. A 2025 paper using Geneformer fine-tuned on cardiac cell data correctly predicted, in held-out data, which genes in cardiomyocytes were dosage-sensitive — a property relevant to cardiovascular disease genetics that would have taken years of systematic experimental work to characterize.

The single-cell domain is where the AI contribution feels most analogous to a genuine scientific instrument — not a replacement for biological thinking, but a tool that makes visible structures that were previously invisible. The developmental trajectories revealed by pseudotime analysis, the rare cell types that appear as small clusters in high-dimensional expression space, the regulatory networks inferred from co-expression patterns — these are biological realities that existed before single-cell sequencing was invented. The method revealed them.

What Genomics Taught AI

The relationship is not one-directional. Genomics has also taught machine learning something important: that large, well-curated, biologically meaningful datasets are extremely hard to build, and that models trained on imperfect data in biologically variable systems require careful uncertainty quantification.

The concept of dataset shift — your model performs well on data like your training set and poorly on data that differs in systematic ways — is a generic machine learning problem. Genomics encountered it specifically through the population diversity problem, and the field’s response has pushed methodological development in uncertainty-aware models, domain adaptation, and transfer learning that has applicability well beyond genetics.

There is a version of this story where genomics and AI are transforming medicine at a rate that justifies the investment and the excitement. There is also a version where the practical clinical impact, despite fifteen years of effort, remains concentrated in a few narrow domains — newborn screening, certain cancer diagnostics, rare disease identification — while the larger promise of precision medicine remains perpetually five years away.

Both versions are true. The question is which one drives the resource allocation, and whether the pace of progress is fast enough to outrun the problems that will arise from premature clinical deployment of tools whose limitations are still being mapped.