The Engineer Who Built the Foundation of Modern AI (And Died Without Recognition)
In 2024, the Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton for foundational discoveries and inventions that enable machine learning with artificial neural networks. Hinton, who had left Google the previous year specifically to speak freely about AI risks, was a visible and articulate recipient: he had spent more than forty years in the field, given countless talks, trained dozens of influential researchers, and was instantly recognizable to anyone following AI developments. The prize seemed, to most observers, like an overdue recognition of contributions that the field had long known were foundational.
What the Nobel committee’s announcement did not mention was that a third person contributed work that is arguably as foundational as anything Hinton did, in direct collaboration with Hinton, and that this person was not available to share the prize because he died in 2011 from a degenerative brain disease — his mind, the instrument of his contribution, systematically destroyed by the same biological machinery he had spent his life studying in its computational form.
His name was David Rumelhart, and most people who use AI every day have never heard of him.
Backpropagation, the algorithm that makes it possible to train deep neural networks, is the technical foundation of modern AI. Without it, there is no practical way to train a network with hidden layers, because you cannot determine how to adjust the weights in earlier layers based on errors observed at the output. With it, you can propagate error signals backward through a network of arbitrary depth, using the chain rule of calculus to assign credit and blame to each connection in proportion to its contribution to the error. Backpropagation converts the conceptually appealing but practically untrainable idea of deep neural networks into something that can actually be trained.
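The mechanics described above can be sketched in a few lines of Python. This is a minimal illustration, not the formulation from the 1986 paper: the tiny two-layer network, the tanh hidden units, the squared-error loss, and every name and value here (w1, w2, backprop, train_step, the toy weights) are assumptions chosen for readability.

```python
import math

def forward(w1, w2, x):
    """Forward pass: input -> tanh hidden layer -> single linear output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    y = sum(w * hj for w, hj in zip(w2, h))
    return h, y

def loss(w1, w2, x, target):
    """Squared error between the network output and the target."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2

def backprop(w1, w2, x, target):
    """Gradient of the loss with respect to every weight, via the chain rule."""
    h, y = forward(w1, w2, x)
    dy = y - target                      # dL/dy at the output
    grad_w2 = [dy * hj for hj in h]      # dL/dw2[j] = dL/dy * h[j]
    # Push the error back through the output weights and the
    # tanh derivative (1 - h^2) to reach the hidden layer.
    dh = [dy * w2[j] * (1.0 - h[j] ** 2) for j in range(len(h))]
    grad_w1 = [[dh[j] * xi for xi in x] for j in range(len(h))]
    return grad_w1, grad_w2

def numerical_grad_w1(w1, w2, x, target, j, i, eps=1e-6):
    """Finite-difference estimate of dL/dw1[j][i], used only as a check."""
    plus = [row[:] for row in w1]
    minus = [row[:] for row in w1]
    plus[j][i] += eps
    minus[j][i] -= eps
    return (loss(plus, w2, x, target) - loss(minus, w2, x, target)) / (2 * eps)

def train_step(w1, w2, x, target, lr=0.1):
    """One gradient-descent update using the backpropagated gradients."""
    g1, g2 = backprop(w1, w2, x, target)
    new_w1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(w1, g1)]
    new_w2 = [w - lr * g for w, g in zip(w2, g2)]
    return new_w1, new_w2

# Hypothetical toy values, chosen only for illustration.
w1 = [[0.5, -0.3], [0.1, 0.8]]   # hidden-layer weights (2 units, 2 inputs)
w2 = [0.7, -0.4]                 # output weights
x = [1.0, 2.0]
target = 0.5

grad_w1, grad_w2 = backprop(w1, w2, x, target)
new_w1, new_w2 = train_step(w1, w2, x, target)
print("loss before:", loss(w1, w2, x, target))
print("loss after: ", loss(new_w1, new_w2, x, target))
```

Comparing grad_w1[0][1] with numerical_grad_w1(w1, w2, x, target, 0, 1) should show the two agreeing closely. That is the practical point: a single backward pass yields the gradient for every weight at once, while the finite-difference check needs two extra forward passes per weight.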
The development of backpropagation is usually dated to a 1986 paper in Nature: “Learning representations by back-propagating errors,” authored by David Rumelhart, Geoffrey Hinton, and Ronald Williams. The paper is one of the most cited in the history of computer science. It demonstrated clearly that multi-layer networks could learn to represent complex patterns when trained with backpropagation, resolving a long-standing skepticism about whether deep networks could work at all. For researchers who had been discouraged by the limitations of single-layer perceptrons — limitations famously documented by Minsky and Papert in their 1969 book “Perceptrons” — the 1986 paper reopened a field that many had written off.
Rumelhart was the first author on that paper. In academic convention, first authorship typically signals primary intellectual contribution. He had developed the core mathematical intuition for the algorithm, worked through its properties, and driven the collaboration that produced the publication.
David Rumelhart’s intellectual range extended far beyond that single paper. He was one of the architects of parallel distributed processing, an approach to cognitive science that modeled mental processes as patterns of activation distributed across networks of simple processing units — an approach that was deeply controversial when proposed in the 1980s because it challenged the dominant symbolic AI paradigm and because it made strong claims about how human minds might actually work, not just how computer programs might simulate thinking.
His work on schema theory — how humans understand text by mapping it onto prior knowledge structures — was foundational in cognitive psychology. His research on the nature of analogy, on how people learn to read, and on the structure of story understanding was influential across multiple disciplines. He was not narrowly a machine learning researcher; he was a cognitive scientist trying to understand the mind, who recognized that computational models were the most powerful tools available for testing theories about how thinking works.
His collaborator Geoffrey Hinton has said in interviews that Rumelhart was the most intellectually creative person he had encountered in his career, and that Rumelhart’s ideas consistently ran ahead of what the rest of the field was prepared to understand. This is the kind of tribute that is sometimes offered generously after a colleague’s death, but in Rumelhart’s case it appears to be supported by the record: the full potential of his 1986 work went unrealized for more than two decades, until the computational capacity necessary to exploit it became available, and by the time the deep learning revolution began in earnest around 2012, Rumelhart had already died after years of illness.
The disease that took Rumelhart was Pick’s disease, a form of frontotemporal dementia: a neurodegenerative condition that erodes behavior, language, cognition, and eventually the ability to communicate. It is relentless, irreversible, and cruel in the specific way that diseases are cruel when they attack the very faculties that defined their victim’s life and identity. By the mid-2000s, as researchers were beginning to demonstrate that deep neural networks trained with backpropagation could outperform other approaches on image recognition and speech recognition tasks, Rumelhart could no longer participate in the scientific conversation his work had made possible. He died in 2011, the year before AlexNet, the breakthrough that began the modern deep learning era, was trained and presented at NeurIPS (then known as NIPS).
The timing is almost precisely wrong. The delay between Rumelhart’s foundational work and the explosion of applied AI that it enabled was caused not by any flaw in his ideas but by the practical constraints of available compute. The ideas were correct in 1986. The hardware to realize them at scale wouldn’t exist until the 2010s. Rumelhart got the mathematics of learning right. He simply lived too early, or died too soon.
The story of how scientific credit gets attributed is a story about visibility, longevity, and the difference between foundational work and applied work. Scientific credit concentrates at the top of research hierarchies: well-known scientists at elite institutions, prolific publishers, charismatic communicators. It accumulates over time through citation networks in ways that systematically favor the work that gets cited most, which is often the work that arrived at the right moment to be applied rather than the work that made the application possible.
The sociologist of science Robert Merton described this dynamic in 1968 as the Matthew effect — named for the Gospel verse “to him who has, more will be given.” Scientists who are already eminent receive disproportionate credit for collaborative work; findings produced by unknown researchers receive less attention than the same findings produced by famous ones. In AI specifically, this dynamic is amplified by the technology industry’s preference for founder narratives and celebrity researchers over the distributed, cumulative nature of how scientific progress actually happens.
The cost of concentrating credit this way is not merely unfairness to individual researchers, though it is that. It also distorts how we fund research and what we incentivize scientists to do. If the reward structure consistently credits the final application rather than the foundational insight, the incentive for foundational work is reduced. If scientists know that decades may pass between their foundational contribution and its practical realization, and that they will receive no recognition if that gap happens to include their own incapacity or death, the calculation about whether to do risky basic research becomes harder.
There is a further irony in Rumelhart’s specific situation. The tool of backpropagation is built on the chain rule of calculus — a mathematical formalism for decomposing the effect of changing one variable on another through a chain of intermediate relationships. The algorithm that Rumelhart helped develop trains networks by precisely attributing credit: by computing, layer by layer, exactly how much each weight contributed to the overall error. It is a machine for assigning credit correctly, and the person who helped build it received credit that was partial, delayed, and ultimately incomplete because he died before the world was ready to recognize his contribution.
The solution to the credit attribution problem in science is not simple. Priority disputes are among the most bitter conflicts in academic life, and overcorrecting — assigning credit too broadly, or contesting too aggressively — creates its own distortions. What is tractable is being more deliberate about how we construct the histories we teach and the narratives we use to explain progress. When we tell the story of deep learning, Rumelhart belongs in it at the same level of prominence as Hinton. When we construct AI timelines that run from Turing through the neural network work of the 1980s and 1990s to the deep learning breakthrough of the 2010s, the connective tissue of that timeline includes work by researchers whose names most people in the industry don’t know.
The practical consequence is for how we design funding and recognition systems for basic research. The deep learning revolution was enabled by work done in the 1980s by researchers who had no clear sense of what the applications would be or when they would arrive. Many of those researchers were funded by DARPA and NSF grants that provided long-duration funding for speculative work without requiring near-term applications. The political pressure to make research funding more “impact-focused” — more tied to near-term commercial or policy applications — would, if applied to the 1980s, have defunded exactly the work that made current AI possible.
David Rumelhart cannot receive a Nobel Prize because that prize, unlike some others, is not awarded posthumously. The field that his work made possible did eventually honor him in another way: the David E. Rumelhart Prize, given for contributions to the theoretical foundations of human cognition, was established in his name and first awarded in 2001. Its first recipient was Geoffrey Hinton. By then Rumelhart was already too ill to continue his own work.
What we can do with his story is use it to resist the compression of complex histories into simple narratives. AI was not invented by a handful of visionaries in Silicon Valley. It was built by hundreds of researchers over decades, and the distribution of credit among those researchers reflects the biases of the institutions through which scientific recognition flows, not the actual distribution of intellectual contribution. Getting that history right matters not just for Rumelhart’s memory, but for understanding what kind of research conditions are worth protecting.


