Peer Review in the Age of Infinite Output

The system built to evaluate twenty papers a year is being asked to evaluate two hundred — and the failure modes are not random

Academic peer review was designed for a world of scarcity. The system that emerged from the 17th-century model of the Philosophical Transactions — send your findings to qualified colleagues, receive their criticism, revise and publish — assumed that the rate of manuscript production would be roughly proportional to the size of the scientific community. For three centuries, this was approximately true. The number of scientists grew; the number of papers grew; the size of editorial boards and reviewer pools grew with them.

2023 broke this assumption.

The introduction of large language model assistance into scientific writing, combined with AI tools for data analysis and hypothesis generation, produced what several editors-in-chief have described privately as a “tsunami.” PLOS ONE, which publishes across all scientific fields and is one of the largest journals by volume, reported a 74% increase in submissions between 2022 and 2025. Nature Communications saw similar patterns. The arXiv preprint server, which operates without peer review, saw its daily submission rate roughly double over the same period.

The papers are not uniformly worse than before. That framing misses what’s actually happening. Some AI-assisted papers are excellent — the computational tools genuinely help researchers be more rigorous, explore more systematically, and write more clearly. The problem is that peer review cannot easily distinguish these from papers where AI assistance was used to paper over thin contributions, manufacture plausible-sounding analyses, or generate vast quantities of results without commensurate scientific judgment.

The Reviewer Exhaustion Problem

Ask any working scientist about peer review and they will describe the same experience: more requests, less time, declining incentives. Reviewing a paper is unpaid, typically uncredited, and time-consuming. The norms of reciprocity that sustain the system (“I review papers because I benefit from having my papers reviewed”) are eroding because the marginal reviewer has less and less spare time relative to what the system demands of them.

A 2026 survey by the European Association of Science Editors found that 67% of researchers had declined at least one peer review invitation in the past year specifically because of workload, up from 41% in 2021. The journals that can attract reviewers are the ones with high prestige — Nature, Science, Cell — where reviewing confers enough reputational benefit to justify the time. Below that tier, the scarcity is acute. Journals are waiting months longer for reviews, accepting papers with fewer referee comments, and relying increasingly on editorial board members who are reviewing more than they should.

The result is not uniform degradation but selective degradation. Papers in fields with a small, tightly networked research community still get careful review. Papers in rapidly growing fields, where the community has expanded faster than the social norms of peer review can take hold (AI-related chemistry, computational biology, AI-mediated genomics), are getting less rigorous evaluation on average. These are precisely the fields where careful review matters most.

Desk Rejection as a Blunt Instrument

Journals have responded to volume primarily through desk rejection: the practice of an editor, without sending a paper to reviewers, deciding it falls below the bar for external review. At Nature, desk rejection rates reportedly exceed 80%. At good general science journals, rates of 50-70% are now common.

Desk rejection has real costs. The criteria are opaque and not always applied consistently. There is reasonable evidence that papers from less prestigious institutions, from researchers in lower-income countries, and from smaller research groups are disproportionately desk-rejected compared to papers of similar methodological quality from institutions such as Harvard, MIT, and ETH Zurich. This was a problem before AI; AI-assisted submission volume has made it worse by forcing editors to make faster, more impressionistic judgments.

Several journals have now introduced AI tools to assist with editorial triage — models that flag statistical errors, check for data anomalies, identify potential image manipulation, and assess whether a submission meets basic methodological standards before human eyes look at it. This is sensible and probably net positive. It is also creating a new layer of AI judgment in a system already stressed by questions of AI judgment.

The Registered Reports Experiment

The most structurally sound response to the combined AI-volume problem and pre-existing replication-crisis problem is the registered reports format, in which a journal provisionally accepts a paper based on the theoretical rationale and proposed methodology before the experiments are run. Results are published regardless of outcome, which eliminates publication bias at the individual study level.

Registered reports are not new (they have been available at some journals since around 2013), but their adoption has accelerated significantly since 2024. PLOS ONE, eLife, and a cluster of psychology and neuroscience journals now actively promote the format. eLife made a more radical move in 2023 by adopting a model where reviewed papers are published alongside reviewer comments, with no accept/reject decision: just a public record of the review. The editor’s view is that the binary decision was never scientifically meaningful; the review commentary is the actual value the system provides.

Both reforms are partial responses. Registered reports are labor-intensive and work best for confirmatory research (testing a specific hypothesis) rather than exploratory research (figuring out what’s in a dataset). They don’t address the reviewer exhaustion problem. They also don’t address the specific AI challenge of evaluating computational results whose validity depends on model architecture decisions, training data choices, and hyperparameter settings that reviewers may lack the expertise to assess.

What AI Review Would Actually Require

The proposal that AI systems should review AI-generated papers is increasingly serious within the community, though “serious” here covers an enormous range of ideas.

At the minimal end: automated statistical review. Systems that check whether the reported statistics are consistent with the stated sample size, whether the confidence intervals are plausible, whether the effect sizes fall within historically reasonable ranges for the field. Tools like Statcheck (which has been running since 2016) and more sophisticated successors do this and catch real errors.
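To make the minimal end concrete, here is a rough sketch of the kind of consistency check described above: recompute a p-value from the reported test statistic and compare it with the reported p-value at the precision the authors used. It is not Statcheck's actual implementation. It assumes APA-style z-test reports of the form "z = 2.10, p = .036" and two-tailed tests, and the names `Z_REPORT`, `two_tailed_p_from_z`, and `check_reports` are invented for illustration.

```python
import math
import re

# Assumption: results are reported as "z = 2.10, p = .036".
# Only exact "p =" reports are handled; "p < .05" is out of scope here.
Z_REPORT = re.compile(r"z\s*=\s*(-?\d*\.?\d+)\s*,\s*p\s*=\s*(0?\.\d+)")


def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_reports(text: str) -> list[dict]:
    """Find z-test reports in text and check each reported p-value."""
    results = []
    for match in Z_REPORT.finditer(text):
        z_text, p_text = match.group(1), match.group(2)
        # Compare at the number of decimals the authors reported.
        decimals = len(p_text.split(".")[1])
        recomputed = two_tailed_p_from_z(float(z_text))
        consistent = round(recomputed, decimals) == round(float(p_text), decimals)
        results.append(
            {
                "reported": match.group(0),
                "recomputed_p": recomputed,
                "consistent": consistent,
            }
        )
    return results


if __name__ == "__main__":
    sample = (
        "The effect was significant, z = 2.10, p = .036. "
        "A second test gave z = 1.50, p = .03."
    )
    for result in check_reports(sample):
        status = "ok" if result["consistent"] else "INCONSISTENT"
        print(f"{result['reported']}: {status}")
```

A check like this can only say that two reported numbers disagree. It cannot say which one is wrong, or whether the test was the right one to run, which is why it sits at the minimal end of the range.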

At the more ambitious end: automated scientific review — models that evaluate whether a paper’s conclusions are warranted by its evidence, whether important alternative explanations were considered, whether the methodology has known failure modes for this type of research. This is much harder. The criteria for good science are not fully formalizable, and the ways in which a paper can be systematically misleading without containing any explicit falsehood are precisely the cases where formal checks fail.

A pilot program at three journals in materials science, announced in early 2026, is having AI models flag papers for specific technical concerns — implausible crystal structures, suspect spectroscopic assignments, statistically improbable yield improvements — and routing flagged papers to specialists for those specific concerns. Early results suggest the automated flagging catches about 40% of later-identified methodological problems and introduces very few false positives for the specific technical categories it covers. This is one of the more plausible paths forward: narrow AI review of narrow technical questions, combined with human judgment about scientific significance.
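The division of labor in that pilot can be sketched in a few lines. Everything here is hypothetical: the category names are taken from the examples in the paragraph above, while the queue names and the `Submission`, `route`, and `recall` helpers are invented for illustration, since the pilot's actual tooling is not described. The sketch shows two things: flagged concerns add a specialist to the normal review rather than replacing it, and a "catches about 40%" figure corresponds to recall measured against problems identified later.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from narrow technical concerns to specialist queues.
SPECIALIST_QUEUES = {
    "crystal_structure": "crystallography specialists",
    "spectroscopic_assignment": "spectroscopy specialists",
    "yield_improvement": "synthesis statistics specialists",
}


@dataclass
class Submission:
    paper_id: str
    flags: set[str] = field(default_factory=set)


def route(submission: Submission) -> list[str]:
    """Every paper keeps its general reviewers; known flags add specialists."""
    queues = ["general reviewers"]
    for flag in sorted(submission.flags):
        specialist = SPECIALIST_QUEUES.get(flag)
        if specialist is not None:
            queues.append(specialist)
    return queues


def recall(flagged_ids: set[str], problem_ids: set[str]) -> float:
    """Share of later-identified problem papers that the flagger caught."""
    if not problem_ids:
        return 0.0
    return len(flagged_ids & problem_ids) / len(problem_ids)


if __name__ == "__main__":
    paper = Submission("paper-1", {"crystal_structure"})
    print(route(paper))
    # Made-up IDs: two of five problem papers flagged gives a recall of 0.4.
    print(recall({"a", "b", "x"}, {"a", "b", "c", "d", "e"}))
```

Recall alone says nothing about the papers the flagger wrongly stops. The false-positive rate has to be tracked separately, and only for the categories the system claims to cover.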


Peer review is not a quality filter. It never was, not fully — it was always a social institution that worked because the people in it shared standards, knew each other, and had reputational skin in the game. AI has not broken peer review. It has accelerated the erosion of the conditions that made peer review work. The question is not whether to add AI to the solution. It is whether the underlying institution is worth preserving, and if so, what it would take to rebuild its foundations rather than just reinforce its walls.