The Economics of Scientific Fraud

Academic fraud is not a failure of individual ethics—it is the predictable output of a system that built the wrong incentives.

In 1974, William Summerlin, an immunologist at Memorial Sloan Kettering Cancer Center, used a black felt-tip pen to touch up the patches of skin on white mice that were supposed to demonstrate successful skin grafts across unmatched donors. The grafts had not taken. The research had not worked. The demonstration he was preparing for his director, Robert Good, required that it look like it had. A laboratory technician noticed that the dark patches wiped off with alcohol and reported this to Good, who reported it to the institution. Summerlin was suspended. The investigation that followed described his behavior as a product of “emotional exhaustion and depression.” He lost his position but was not prosecuted.

What the 1974 investigation did not examine was the institutional environment that produced Summerlin’s desperation. Sloan Kettering in the early 1970s was under considerable pressure to demonstrate translational results. Robert Good, then the most cited immunologist in the world, had recruited Summerlin on the basis of early results that turned out to be difficult to replicate. The pressure to produce was real and communicated clearly. The incentive structure of the institution — like the incentive structure of every research institution — rewarded results and did not reward the accurate reporting that a promising line of research had not panned out.

Scientific fraud is typically discussed as a story about bad individuals: researchers who were greedy, or dishonest, or under extraordinary personal pressure, who made the choice to fabricate or falsify results. This framing is not wrong about the individuals. It is deeply misleading about the phenomenon. The rate of scientific fraud, measured by retraction data and confirmed misconduct investigations, has increased significantly over the past five decades in ways that cannot plausibly be explained by a sudden deterioration in the ethics of individual scientists. What has changed is the competitive environment of academic science.

The publish-or-perish dynamic, which became the defining feature of academic careers in the 1970s and 1980s, is now so thoroughly established that its origins are invisible to most participants. The mechanism is simple: academic careers are evaluated primarily on publication record, which means that producing publications is the main thing an academic career requires. Publications in high-status journals are worth more than publications in lower-status journals, so researchers compete intensely for placement in a small number of high-prestige outlets. High-prestige journals prefer novel, positive results over replications and negative findings. This means that the academic career system rewards the production of novel, positive results in a way that has very little to do with whether those results accurately describe the world.

The pressure is not equally distributed. Junior researchers — graduate students, postdoctoral fellows, early-career faculty on temporary contracts — face the most acute version of the problem. The academic job market has become dramatically more competitive since the 1970s, while the supply of graduate students has remained high (partly because graduate students are a relatively cheap source of research labor). The number of permanent academic positions has not kept pace. The result is a large population of highly trained researchers competing for a small number of jobs, all on the basis of publication records that they are actively building under the supervision of senior researchers who have significant power over their careers.

This is the environment in which fraud disproportionately occurs. Not because graduate students are less ethical than senior professors (though some senior professors are quite comfortable encouraging practices that shade toward misconduct and calling it mentorship) but because the cost-benefit calculation of fraud looks different when your career depends on the next paper and the next paper depends on results that have not cooperated.

The physicist Richard Feynman described this in a 1974 commencement address as “cargo cult science” — the adoption of the superficial forms of scientific practice without the underlying commitment to rigor that gives those forms meaning. His observation was prescient. The cargo cult dynamic has metastasized in the decades since, not because scientists stopped caring about rigor but because the incentive system makes rigorous negative results almost worthless for career purposes.

The specific mechanics of scientific fraud follow recognizable patterns. The most common form is not wholesale fabrication — inventing data from scratch — but selective reporting and subtle manipulation: running many statistical tests and reporting only the significant ones (p-hacking), excluding outliers that do not support the expected finding, adjusting data collection procedures after seeing initial results without disclosing this, presenting exploratory analyses as confirmatory ones. These practices exist on a spectrum from acceptable analytical choices to clear misconduct, and the fuzziness of the spectrum is itself a feature of the environment, because it allows researchers to convince themselves they are on the acceptable side while operating at the boundary.
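The arithmetic behind selective reporting can be sketched in a few lines. The following is a minimal illustration, not a model of any particular study: it assumes every hypothesis tested is actually false (no real effect exists), that the tests are independent, and that the conventional 0.05 significance threshold is used. The function names and parameters are hypothetical, chosen only for this example. Under those assumptions, the chance that at least one of k tests comes out "significant" is 1 - (1 - 0.05)^k, and the simulation checks that figure by drawing p-values uniformly at random, which is how p-values behave when there is no effect.

```python
import random


def familywise_false_positive_rate(num_tests, alpha=0.05):
    """Chance that at least one of `num_tests` independent tests is
    'significant' at level `alpha` when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_selective_reporting(num_tests, alpha=0.05,
                                 num_studies=20000, seed=0):
    """Simulate many no-effect studies, each running `num_tests` tests,
    and return the fraction that could report a significant finding.

    Assumption: with no real effect, each p-value is uniform on [0, 1].
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    reportable = 0
    for _ in range(num_studies):
        p_values = [rng.random() for _ in range(num_tests)]
        # The researcher reports only the best-looking test.
        if min(p_values) < alpha:
            reportable += 1
    return reportable / num_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        analytic = familywise_false_positive_rate(k)
        simulated = simulate_selective_reporting(k)
        print(f"{k:>2} tests: analytic {analytic:.3f}, simulated {simulated:.3f}")
```

With a single test, the false positive rate stays at the nominal 5 percent. With twenty tests and only the best one reported, roughly 64 percent of studies of a nonexistent effect would have something publishable to show, even though no individual step involved inventing data.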

The social psychologist Diederik Stapel committed outright fabrication — he invented entire datasets for dozens of published studies on social behavior — and was caught in 2011 by graduate students who noticed that his data were suspiciously clean and consistent. The investigation that followed produced a thoughtful institutional report (known in the Netherlands as the Levelt report) that made explicit what fraud investigations usually obscure: that Stapel’s misconduct was enabled by an institutional culture that rewarded impressive results, that did not have functional systems for verifying data, and that treated the questioning of a senior colleague’s results as a social transgression rather than a scientific obligation.

The Stapel case is unusually extreme, but the Levelt report’s structural diagnosis is quite general. The conditions it identified — reward for impressive results, weak verification mechanisms, social hierarchy that discourages scrutiny — are present in essentially every major research institution. This is why the response to major fraud cases, which typically involves expressing shock and reinforcing ethics training, has not reduced fraud rates. Ethics training cannot solve a structural problem. Individuals facing rational incentives to commit fraud will not be dissuaded by a seminar reminding them that fraud is wrong.

What would dissuade them — what the structural analysis consistently points to — is changing the incentive system itself. This means valuing replication, null results, and data-sharing in hiring and promotion decisions. It means funding agencies requiring pre-registration of studies, so that the analysis plan cannot be adjusted after seeing the data. It means journals publishing trials and studies that did not produce the hoped-for results. It means treating a clean negative finding as a genuine contribution rather than a professional embarrassment.

None of this is technically complicated. All of it is politically difficult, because the people with the most power to change academic incentive structures are the senior researchers who succeeded in the current system and who, if only unconsciously, have an interest in not dramatically revaluing the currency in which they accumulated their reputation.

The replication crisis that surfaced publicly around 2015, when coordinated attempts to replicate findings in psychology, medicine, and other fields found that a disturbing fraction of published results could not be reproduced, was not a discovery of widespread fraud. Most of the unreproducible findings were not fabricated. They were produced by the ordinary operation of a system that rewards the generation of impressive results without adequately rewarding the verification that those results are real. The distinction matters, because it points the diagnosis in the right direction. The crisis was not about bad actors in a good system. It was about a system producing predictable bad outcomes through the normal behavior of normal people responding rationally to the incentives they face.

The current landscape, a decade after the replication crisis became a public conversation, is better in some respects and unchanged in others. Pre-registration is more common. Open data requirements have spread. A generation of researchers, trained in the shadow of the crisis, is more alert to analytical choices that shade toward questionable practices. But the fundamental incentive structure (publish novel positive results, get hired) has not changed, because it is embedded in funding structures, journal hierarchies, and hiring processes that no single institution can reform unilaterally.

Science, as an institution, works because it has self-correcting mechanisms: replication, peer review, public data, adversarial scrutiny. These mechanisms function slowly and imperfectly, but they function. Over long enough timescales, wrong findings get corrected, fraudulent research gets retracted, and the body of scientific knowledge moves toward truth. The problem is the timescale. The average time between publication of a fraudulent or unreproducible result and its correction is measured in years. During those years, the finding circulates, gets cited, influences other research, and in medical contexts sometimes influences clinical practice.

Summerlin’s felt-tip pen was a symptom. The felt-tip pen gets confiscated, the researcher gets fired, and the system continues producing the conditions in which the next researcher picks up a felt-tip pen and makes the same calculation. The only way to stop producing that calculation is to change what the calculation rewards.

This turns out to be very hard to do, which is why, fifty years after Summerlin, the conversation about academic fraud still sounds remarkably similar to the conversation the Sloan Kettering investigation produced in 1974.