The NHS's AI Bet and What It Will Actually Cost
The NHS AI Lab was established in 2019 with a mandate to safely develop and deploy artificial intelligence across the health service. By 2026, it had produced genuinely impressive results in narrow domains. The NHS AI-ECG Programme has deployed AI-assisted electrocardiogram analysis at over 1,200 sites, identifying atrial fibrillation with accuracy that matches specialist cardiologists. The National Pathology Imaging Co-operative’s AI tools are supporting cancer screening at over 70 trusts. The AI-powered triage tool at 111 (the non-emergency health line) handles more than 25% of calls without escalating to a clinical advisor.
This is real progress. The technology works in the domains where it’s been tested, under the conditions of the pilots. The translation from pilot success to sustainable national deployment is where the assumptions start to strain.
The radiology example
AI in radiology is probably the most mature clinical AI application globally. The algorithms for detecting abnormalities in chest X-rays, mammograms, and CT scans have been validated extensively, are CE-marked and FDA-cleared for relevant applications, and genuinely reduce the time required to identify likely pathologies. The NHS has been deploying them at scale since 2022, with more than £140 million of capital funding directed through the Imaging Diagnostics Programme.
The productivity story is compelling at pilot scale. A radiologist using AI assistance can review a chest X-ray in less time, with the algorithm pre-flagging suspicious areas, so the radiologist’s attention goes first to the cases most likely to require intervention. In studies at specific trust sites, AI assistance has been shown to reduce reporting time per image by 25-40% without reducing detection accuracy.
The difficulty is that this efficiency gain does not translate directly into increased throughput at system level. Radiologists don’t work in isolation — they work within diagnostic pathways that include order management, IT infrastructure, reporting systems, and coordination with clinical teams. The bottleneck in NHS radiology is not purely the radiologist’s time per image. It is the whole pathway, including consultant availability for complex cases, the infrastructure for AI system integration with existing PACS (Picture Archiving and Communication Systems), and the training time required for radiologists to calibrate appropriate trust in AI outputs.
The imaging backlog in England as of early 2026 still exceeds 900,000 outstanding tests. AI assistance has not made a substantial dent in this backlog because the constraint is not the one AI solves. It is like installing an automatic dishwasher in a restaurant that is struggling because it doesn’t have enough chefs: the dishwasher genuinely improves one step of the process, but the binding constraint is elsewhere.
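The bottleneck argument can be made concrete with a small sketch. It assumes a deliberately simplified model in which a diagnostic pathway is a chain of stages and weekly throughput is capped by the slowest one. The stage names and capacities are invented for illustration and are not NHS figures; only the 25-40% reporting-time range comes from the studies mentioned above.

```python
# Illustrative sketch only: every stage name and capacity below is invented.

def pathway_throughput(capacities):
    """A serial pathway completes no more per week than its slowest stage."""
    return min(capacities.values())

# Hypothetical weekly capacity (tests per week) at each stage.
baseline = {
    "order_management": 1500,
    "image_acquisition": 900,
    "radiologist_reporting": 1200,
    "clinical_follow_up": 1000,
}

# Assume AI cuts reporting time per image by 30% (within the 25-40% range
# cited above), so reporting capacity rises by a factor of 1 / (1 - 0.30).
with_ai = dict(baseline)
with_ai["radiologist_reporting"] = baseline["radiologist_reporting"] / (1 - 0.30)

before = pathway_throughput(baseline)
after = pathway_throughput(with_ai)

print(f"reporting capacity: {baseline['radiologist_reporting']} -> "
      f"{with_ai['radiologist_reporting']:.0f}")
print(f"pathway throughput: {before} -> {after}")
```

Under these made-up numbers, reporting capacity rises by more than 40% while pathway throughput does not move, because reporting was never the slowest stage. Real pathways are messier than a single chain, but the direction of the result is the point.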
The clinical coding problem
Clinical coding — the translation of clinical records into standardized diagnostic and procedure codes for billing, planning, and research — is genuinely, relentlessly tedious and has long been a candidate for automation. The NHS spends approximately £250 million annually on clinical coding staff. AI-assisted coding has been piloted since 2022 and demonstrates real accuracy improvements over manual coding for straightforward cases.
The economics become complicated for complex cases. A patient with multiple comorbidities, an unusual presentation, a complicated procedure, and a long length of stay produces a clinical record that requires coding judgment, contextual understanding of clinical intent, and awareness of coding conventions that vary slightly across specialties. AI systems perform adequately on simple cases (which represent roughly 60% of volume) and substantially less well on complex cases (which represent a disproportionate share of the financial value of coding, because complex cases attract higher tariff payments).
An NHS trust that deploys AI coding in production will cut manual review time mainly for the simple cases. The complex cases, where coding errors cost more, will still require experienced coders and more time. The resource savings are real but smaller than aggregate accuracy figures suggest.
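A rough worked example shows why the savings shrink once case mix is taken into account. The roughly 60% share of simple cases comes from the figures above; the minutes per case and the 50% reduction in review time are assumptions chosen purely for illustration, as is the function name.

```python
# Illustrative sketch: only the 60% simple-case share comes from the article.

def time_saved_fraction(simple_share, simple_minutes, complex_minutes,
                        simple_reduction, complex_reduction=0.0):
    """Share of total coder time saved when AI speeds up each case type."""
    complex_share = 1 - simple_share
    baseline = simple_share * simple_minutes + complex_share * complex_minutes
    with_ai = (simple_share * simple_minutes * (1 - simple_reduction)
               + complex_share * complex_minutes * (1 - complex_reduction))
    return 1 - with_ai / baseline

saved = time_saved_fraction(
    simple_share=0.60,      # from the article: roughly 60% of volume
    simple_minutes=5,       # hypothetical coder time per simple case
    complex_minutes=20,     # hypothetical coder time per complex case
    simple_reduction=0.50,  # hypothetical: AI halves time on simple cases
)

print(f"share of total coder time saved: {saved:.1%}")
```

With these assumed inputs, halving the time spent on 60% of cases saves roughly 14% of total coder time, because complex cases consume most of the hours and are untouched.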
More concerning: several trusts that moved to AI-assisted coding in 2024-25 found that coder skill levels among remaining staff declined as they spent more time reviewing AI outputs than producing original coding. The cognitive muscle required for difficult coding cases, which takes years to develop, atrophied in environments where AI handled the easy cases and human review was framed as validation rather than primary judgment. The systems that need experienced coders most (for complex cases) were producing fewer experienced coders.
The 111 triage system and escalation asymmetry
The NHS 111 service, which handles urgent but non-emergency health queries, has been using AI triage assistance since 2023. The system uses symptom-checking algorithms to categorize call urgency and route calls — some to clinical advisors, some to a callback queue, some to self-care advice. The call handling efficiency has improved measurably.
The clinical risk profile of AI triage systems is asymmetric. The catastrophic failure mode is under-triage: directing a patient with a serious condition to self-care advice when they need immediate intervention. This is statistically rare but clinically disastrous. The NHS AI triage system has been calibrated conservatively — it over-escalates to human clinical advisors relative to a well-calibrated human performance benchmark — specifically to reduce the probability of under-triage. This is the right design choice clinically. It reduces the efficiency gain substantially. The system handles more calls but escalates more of them to clinical staff than a perfectly calibrated system would, because the cost of under-triage (preventable serious harm) far exceeds the cost of over-triage (unnecessary clinical advisor time).
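The asymmetry can be sketched as a simple expected-cost rule. This is an illustration of the general principle, not a description of how the NHS 111 system is actually implemented. Escalating a call that turns out not to be serious costs advisor time, while sending a serious case to self-care costs preventable harm, so escalation is worthwhile whenever p × cost_under exceeds (1 − p) × cost_over, where p is the estimated probability that the case is serious. The cost ratios and risk scores below are hypothetical.

```python
# Illustrative sketch: cost ratios and risk scores are hypothetical.

def escalation_threshold(cost_under, cost_over):
    """Probability of a serious condition above which escalating is cheaper.

    Escalate when p * cost_under > (1 - p) * cost_over,
    which rearranges to p > cost_over / (cost_over + cost_under).
    """
    return cost_over / (cost_over + cost_under)

def count_escalated(risks, cost_under, cost_over):
    """Number of calls whose estimated risk exceeds the threshold."""
    threshold = escalation_threshold(cost_under, cost_over)
    return sum(1 for p in risks if p > threshold)

# Hypothetical estimated probabilities of a serious condition for five calls.
call_risks = [0.0005, 0.002, 0.01, 0.05, 0.3]

# If both errors were treated as equally costly, nothing here is escalated.
symmetric = count_escalated(call_risks, cost_under=1, cost_over=1)

# If under-triage is treated as 1,000 times worse, almost everything is.
asymmetric = count_escalated(call_risks, cost_under=1000, cost_over=1)

print(f"escalated with equal costs: {symmetric} of {len(call_risks)}")
print(f"escalated with 1000:1 costs: {asymmetric} of {len(call_risks)}")
```

The more heavily under-triage is weighted, the lower the threshold falls and the more calls reach a clinical advisor. That is the clinically sensible outcome, and it is also where much of the headline efficiency gain goes.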
This is how all high-stakes AI triage should be designed. It is also why the efficiency gains in high-stakes clinical AI are typically lower than the headline figures from capability benchmarks suggest. Capability benchmarks measure what the system can do under test conditions. Clinical deployment requires calibration for the failure modes that matter most, which reduces throughput.
What the NHS AI strategy is actually betting on
The NHS’s implicit wager is that AI can close the gap between the service’s capacity and the demand placed on it by an aging, increasingly comorbid population, without the politically difficult alternative of substantially increasing clinical staffing or restructuring what the service does. This is an understandable political position. It is not clearly a correct analytical one.
The evidence from deployed AI applications suggests that AI in healthcare delivers real but bounded efficiency gains in specific, well-defined tasks. It does not substitute for clinical judgment in complex cases. It does not remove the need for experienced staff; it changes what experienced staff do. It generates new implementation and maintenance costs that don’t appear in pilot budgets. And it frequently improves the metric it is optimizing for while leaving adjacent metrics that weren’t in scope unchanged or, occasionally, worse.
The NHS will deploy more AI through 2027 and beyond, and some of it will help meaningfully. The trap to avoid is treating AI as the solution to a demand-capacity problem that requires either more capacity or reduced demand — a political choice that no algorithm can make for you.