AI Chemistry and the Weapons Problem Nobody Wants to Discuss Clearly

The same generative models that accelerate drug discovery and materials science lower the barriers to chemical weapons synthesis — and the response so far is inadequate

In March 2022, Fabio Urbina and colleagues at Collaboration Pharmaceuticals published a paper in Nature Machine Intelligence about a disturbing experiment. They had taken their generative chemistry model — designed to find drug candidates with optimal therapeutic properties — and reconfigured its reward function. Instead of optimizing for bioavailability and low toxicity, they asked it to optimize for toxicity and similarity to known chemical weapons agents. In six hours, the model generated 40,000 molecules. Many of them were known chemical weapons agents or close analogs. Some were structures the researchers had not encountered before.

The paper caused a stir. Then the stir faded.

By early 2027, the broader community of AI chemistry researchers has developed a set of practices around dual-use risks that are more serious than nothing and less serious than the threat warrants. The gap between existing capabilities and the governance structures meant to oversee them is not one that market forces will close.

What the Models Can Actually Do

The honest technical picture is more constrained than the worst-case framings suggest, and more concerning than the reassuring rebuttals acknowledge.

Current generative chemistry models — diffusion models, transformer-based molecular generators, reinforcement learning systems trained on structure-activity relationships — are genuinely capable of proposing novel chemical structures with predicted properties. The predictions are not always right. The synthesis routes generated by retrosynthesis models range from accurate and insightful to chemically implausible. A model that generates a structure with predicted high toxicity and a predicted synthesis route does not guarantee that the structure can actually be made by a non-expert with available precursors.

But the bar for harm is not “fully autonomous chemical weapon synthesis.” The bar is “providing meaningful assistance to someone who already has some chemistry expertise and harmful intent.” This is a lower bar. A chemistry graduate student who understands synthesis but doesn’t know where to start on a specific class of compounds can use AI tools to substantially accelerate their search. The counterfactual is not “they could have found this in three minutes with a literature search” — for some classes of compounds, the AI contribution is real.

The specific area of greatest concern, among researchers who study this carefully, is not nerve agents or blister agents; those are extensively documented, and any competent chemist already has access to the relevant literature. The concern is novel analogs: structures that have the desired properties of known agents but differ enough to evade standard detection methods and potentially fall outside controlled substance schedules. Generative models are well-suited to this task in a way that literature search is not.

The Response So Far

The AI chemistry community has responded primarily through what might be called self-regulatory norm-setting: voluntary commitments not to train on weapons-related data, not to optimize for toxicity objectives without oversight, and to implement input/output filters that flag queries related to known chemical weapons classes.

These commitments are not nothing. The major commercial chemistry AI platforms — including Schrödinger, Insilico Medicine, and the research models from big pharma — have implemented filtering systems. Academic groups working in generative chemistry have generally adopted responsible disclosure norms. The biosecurity community, which has more developed infrastructure for this class of problem, has been actively engaging the chemistry AI community since 2023.

The limitations of self-regulation are structural. Academic models are published and their weights often shared. Models trained for legitimate purposes can be fine-tuned for other purposes by anyone with the weights. The filtering approaches are reactive: they block known patterns, and queries can be reworded to evade them. And the global distribution of chemistry AI research means that norms established by leading labs in the US and UK have no binding force on researchers in jurisdictions with different priorities.

The Chemical Weapons Convention, which entered into force in 1997, does not have provisions for AI-assisted synthesis. The Australia Group, the informal multilateral arrangement that controls precursor chemicals, was designed for physical chemical supply chains. Neither framework was built for a world where the knowledge to synthesize a dangerous compound can be generated algorithmically from freely available model weights.

The Bioweapons Comparison

The biosecurity community has been grappling with dual-use problems for longer, and its experience is instructive.

Gene synthesis companies, the firms that physically synthesize DNA sequences on demand, have had screening systems since the late 2000s. The Harmonized Screening Protocol, developed by industry in consultation with biosecurity agencies, commits participating companies to screen orders against a database of dangerous sequences and refuse suspicious requests. The system is imperfect (it catches obvious cases and misses sophisticated evasion), but it creates a meaningful chokepoint because synthesis requires physical infrastructure.

Chemistry AI lacks an equivalent chokepoint. The model runs on a laptop. The synthesis, if it happens, requires laboratory access, which is some constraint, but the design phase, where AI provides the most leverage, leaves no detectable trace. The gap between what biosecurity has managed to build and what chemical weapons governance has built is partly a matter of timing (biosecurity got started earlier) and partly a matter of different regulatory cultures. The biosecurity example suggests what might be possible with sustained effort and suitable urgency.

The Conversation That Isn’t Happening

What is most striking, to anyone who follows both the chemistry AI space and the policy space, is the degree of disconnection between them.

At chemistry AI conferences and workshops, the dual-use question gets raised seriously and discussed technically. People are genuinely engaged with the problem. The 2022 Urbina paper is cited, the limitations of current filtering approaches are acknowledged, and there are real debates about open-source release of powerful models, about data governance, about whether the scientific benefits of publication outweigh specific proliferation risks.

In policy circles, with notable exceptions, the conversation is much more superficial. “AI weapons” is a category that tends to collapse everything from autonomous weapons systems to bioweapons design to AI-assisted chemical synthesis into a single undifferentiated concern. The technical specificity that would allow sensible policy responses is mostly absent. The regulatory proposals that emerge — generalized AI safety frameworks, national security review processes for large models — are not designed with the specific threat profile of chemistry AI in mind.

Dual-use research of concern has been a live category in biology since at least the debates of the 2000s over mousepox modification and the reconstruction of the 1918 influenza virus. The governance mechanisms are imperfect but real: institutional biosafety committees, federal oversight of specific research categories, national science advisory boards with genuine technical expertise. The equivalent for AI chemistry does not yet exist in any operationally meaningful form.


This is a problem where the cost of getting it wrong is catastrophic and the cost of getting it right is merely expensive. The field that created the problem is having the right conversations. The institutions that would need to act on them are not, yet, listening with enough specificity.