The Climate Model's New Engine
Climate models are the most computationally expensive scientific instruments humanity has built. Running a coupled atmosphere-ocean general circulation model at high spatial resolution (the kind needed to resolve regional precipitation patterns, individual storm systems, or the behavior of Arctic sea ice) requires thousands of processors operating for weeks to produce a single century-long simulation. This is part of why the Intergovernmental Panel on Climate Change’s assessment reports take years to compile: the coordinated campaigns of simulations they draw on, spanning many models and many runs, take years to complete.
Machine learning is being introduced into this system with the goal of making it faster, and the results through early 2027 are real and contested in equal measure.
The Parameterization Problem
Climate models represent the physics of the atmosphere and ocean through differential equations. Some of these equations can be solved from first principles: Newton’s laws, the equations of fluid motion, conservation of mass and energy. These form the dynamical core of climate models, and they have not changed fundamentally since the 1960s.
The problem is scale. Many physical processes relevant to climate — convection, cloud formation, the turbulence that mixes the ocean, the radiative properties of ice crystals — occur at spatial scales far smaller than the grid cells in the model. A typical climate model grid cell is 50-100 kilometers on a side. Cumulus convection happens at scales of kilometers to tens of kilometers. You cannot explicitly resolve it.
Parameterization schemes are the solution: empirically derived formulas that approximate the aggregate effect of sub-grid processes on grid-cell-scale quantities. They are the craft knowledge of climate science, built over decades of observational campaigns, theoretical derivation, and careful tuning against real atmospheric data. They are also computationally expensive — a substantial fraction of climate model runtime is spent evaluating parameterization schemes.
Neural network parameterizations, which replace these schemes with machine learning models trained on the output of high-resolution simulations, were proposed as early as 2018 and have now been demonstrated at production scale. The efficiency gains are large: ML parameterizations can run 10-20 times faster than the physics-based schemes they replace. The time saved can be spent on more ensemble members, longer simulations, or higher resolution within the same computational budget.
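How much of a 10-20x gain in the parameterization reaches the whole model depends on how much of the runtime the parameterization occupied in the first place. The sketch below applies Amdahl's law to that question. The runtime fractions are hypothetical placeholders, since the text says only "a substantial fraction"; the function name is likewise illustrative.

```python
def overall_speedup(param_fraction: float, param_speedup: float) -> float:
    """Amdahl's law: whole-model speedup when only the parameterization gets faster.

    param_fraction: share of total runtime spent in parameterizations (0..1).
    param_speedup:  how many times faster the ML replacement runs.
    """
    if not 0.0 <= param_fraction <= 1.0:
        raise ValueError("param_fraction must be between 0 and 1")
    if param_speedup <= 0.0:
        raise ValueError("param_speedup must be positive")
    # The dynamical core (1 - fraction) is unchanged; only the rest shrinks.
    return 1.0 / ((1.0 - param_fraction) + param_fraction / param_speedup)


if __name__ == "__main__":
    # Assumed runtime fractions, for illustration only.
    for fraction in (0.3, 0.5, 0.7):
        for speedup in (10.0, 20.0):
            total = overall_speedup(fraction, speedup)
            print(f"fraction={fraction:.1f} param_speedup={speedup:>4.0f}x "
                  f"-> whole model {total:.2f}x")
```

Under these assumptions the whole-model gain is bounded by the part of the runtime that was not accelerated, which is why the unchanged dynamical core eventually becomes the limiting cost.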
The Instability Problem
One problem emerged early and remains only partly resolved: neural network parameterizations can be unstable.
A climate model running with an ML parameterization in place of its convection scheme may perform well for simulated decades, then develop instabilities that crash the simulation or produce unphysical behavior. This happens because the neural network has only learned a statistical relationship between large-scale state variables and sub-grid fluxes. Within the range of its training data it is interpolating; when it encounters input conditions far enough outside that distribution it is forced to extrapolate, and its outputs can become unphysical: negative water vapor, supersonic winds, temperatures that violate thermodynamic constraints.
Several research groups encountered this problem independently around 2020-2022. The responses have included training on more diverse datasets, adding physical constraints (the network cannot output values that violate conservation laws), and hybrid approaches where the neural network runs alongside a conventional scheme, with the conventional scheme taking over when the network’s output exceeds plausibility bounds.
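The hybrid approach can be sketched in a few lines. This is a minimal illustration, not any modeling center's implementation: the state variables, the plausibility bounds, and both toy schemes are assumptions invented for the example.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, float]      # {"T": temperature in K, "q": specific humidity in kg/kg}
Tendency = Dict[str, float]   # {"dT_dt": K/s, "dq_dt": kg/kg/s}
Scheme = Callable[[State], Tendency]

# Assumed plausibility bounds for illustration; not taken from any real model.
T_MIN_K, T_MAX_K = 150.0, 350.0


def is_plausible(state: State, tendency: Tendency, dt: float) -> bool:
    """Check that applying the tendency for one time step keeps the state physical."""
    t_next = state["T"] + tendency["dT_dt"] * dt
    q_next = state["q"] + tendency["dq_dt"] * dt
    # Comparisons against NaN evaluate to False, so NaN outputs also fail.
    return (T_MIN_K <= t_next <= T_MAX_K) and (q_next >= 0.0)


def hybrid_tendency(state: State, ml_scheme: Scheme,
                    conventional_scheme: Scheme, dt: float) -> Tuple[Tendency, str]:
    """Use the ML scheme when its output is plausible, else the conventional one."""
    candidate = ml_scheme(state)
    if is_plausible(state, candidate, dt):
        return candidate, "ml"
    return conventional_scheme(state), "conventional"


def toy_ml_scheme(state: State) -> Tendency:
    # Stand-in for a trained network: sensible up to 310 K, over-dries beyond it.
    if state["T"] <= 310.0:
        return {"dT_dt": 1e-5, "dq_dt": -1e-8}
    return {"dT_dt": 1e-5, "dq_dt": -1e-4}


def toy_conventional_scheme(state: State) -> Tendency:
    # Stand-in for a physics-based scheme: slower in practice, always bounded here.
    return {"dT_dt": 5e-6, "dq_dt": -5e-9}


dt = 1800.0  # assumed 30-minute model time step, in seconds
in_range = {"T": 290.0, "q": 0.010}
out_of_range = {"T": 320.0, "q": 0.010}

for label, state in (("in range", in_range), ("out of range", out_of_range)):
    _, source = hybrid_tendency(state, toy_ml_scheme, toy_conventional_scheme, dt)
    print(f"{label}: tendency taken from the {source} scheme")
```

A check of this kind only catches outputs that are obviously unphysical. A tendency that is plausible but wrong passes straight through, which is one reason the problem is managed rather than solved.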
By early 2027, several operational climate modeling centers — ECMWF in Europe, GFDL in the United States, and the Max Planck Institute for Meteorology — have successfully integrated machine learning components into their models and run stable multi-decadal simulations. The instability problem is not fully solved; it is managed. The management techniques work for the conditions in the training distribution and require ongoing attention when models are run outside it.
The Weather vs. Climate Distinction
The most successful application of AI in atmospheric science is not in climate projection but in weather forecasting, and the distinction matters.
GraphCast (DeepMind, 2023), Pangu-Weather (Huawei, 2023), FourCastNet (Nvidia, 2022), and their successors are data-driven weather prediction systems trained on the ERA5 reanalysis, a decades-long retrospective analysis of atmospheric state produced by ECMWF. These systems outperform conventional numerical weather prediction on medium-range (3-10 day) forecasts on several standard metrics, run in minutes rather than hours, and can be operated on a single GPU-class accelerator rather than a supercomputer.
The success here is partly about the nature of the problem. Weather prediction involves mapping an atmospheric state at time T to an atmospheric state at time T+N, where N is days rather than decades. The training data covers a wide range of atmospheric configurations. The evaluation is direct: you predict tomorrow’s weather and compare it to what actually happened. The feedback loop is clear and fast.
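The mapping from state at T to state at T+N, and the direct comparison against what happened, can be sketched as a rollout plus an error metric. Everything here is a toy: the "atmosphere" is three numbers, and `true_step` and `learned_step` are hypothetical stand-ins for reality and for a trained model.

```python
import math
from typing import Callable, List, Sequence

State = List[float]


def rollout(step: Callable[[State], State], initial: Sequence[float],
            n_steps: int) -> List[State]:
    """Apply a one-step model repeatedly, feeding each output back in as input."""
    states: List[State] = []
    state = list(initial)
    for _ in range(n_steps):
        state = step(state)
        states.append(state)
    return states


def rmse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between a forecast state and the observed state."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have the same length")
    squared = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return math.sqrt(squared / len(forecast))


def true_step(state: State) -> State:
    # Toy "reality": every variable decays by 10% per step.
    return [0.90 * x for x in state]


def learned_step(state: State) -> State:
    # Toy "learned model": slightly wrong decay rate, so errors compound.
    return [0.88 * x for x in state]


initial = [10.0, -4.0, 2.5]
truth = rollout(true_step, initial, 4)
forecast = rollout(learned_step, initial, 4)
errors = [rmse(f, t) for f, t in zip(forecast, truth)]

for lead, err in enumerate(errors, start=1):
    print(f"lead step {lead}: RMSE = {err:.4f}")
```

Over these four steps the error grows with lead time, and each value can be checked as soon as the corresponding observation arrives. That fast feedback is what the weather problem has and a decades-long climate projection lacks.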
Climate projection involves a different problem: projecting the response of the atmosphere to forcings that have no close analog in the historical record. The training data for a data-driven climate model includes no observations from a world with CO2 concentrations of 550 ppm. The validation problem is unsolvable in real time — you can only evaluate a 30-year climate projection after 30 years have passed. The distributional shift from training to deployment is structural and unavoidable.
This doesn’t mean AI climate models can’t work. It means they need different evaluation strategies: testing against paleoclimate data, comparison to high-resolution physics-based simulations in idealized settings, physical consistency checks, and careful uncertainty quantification. These are being developed, but they are not yet as mature as the weather forecasting evaluation framework.
Who Builds the Next Generation
There is a geopolitical dimension to AI in climate science that rarely surfaces in the technical literature. The major AI-weather systems — GraphCast, Pangu-Weather, FourCastNet — were built by large technology companies with access to enormous computational resources. ECMWF, the European weather prediction center, has the institutional mandate and scientific credibility but has been playing catch-up on machine learning infrastructure relative to industry labs.
This creates a structural question: should the world’s climate projection capability be owned and operated by national meteorological agencies and academic modeling centers, or by commercial technology companies? The current trajectory — technology companies building the most capable systems, then licensing or providing them to scientific agencies — has implications for transparency, reproducibility, and public accountability that the climate science community is actively debating.
The data on which these systems are trained — ERA5 and its predecessors — is publicly produced by publicly funded institutions. The models trained on that data are, under current arrangements, often proprietary. The tension between publicly produced scientific infrastructure and proprietary systems built on it is not unique to climate science, but in climate science it has particular urgency because the policy stakes are high enough that model governance is a matter of public concern.
The climate model is being rebuilt from the inside. The new engine is faster. It has known failure modes. It is being tested against everything the scientific community can think of. And the tests it cannot pass yet — the ones that require thirty years of atmospheric data that doesn’t exist — will take a generation to run.