The Healthcare AI That Actually Shipped
In 2018, a group of researchers at Boston Children’s Hospital published a paper in Nature demonstrating that a deep learning model could diagnose pediatric disease from clinical notes with accuracy comparable to specialist physicians. The paper was widely cited, attracted significant media coverage, and became a standard reference in discussions of AI’s healthcare potential. The technology was impressive and real.
By 2026, applications of the specific techniques in that paper had been deployed at scale among the populations that could benefit most (those in low-income countries with severe shortages of specialist physicians) in fewer than a dozen countries. The gap between “impressive research result” and “deployed system saving lives” is measured in organizational effort, not algorithms, and that gap has defined the healthcare AI story in the developing world for the past decade.
The healthcare AI that has actually shipped in sub-Saharan Africa and South Asia tends to share a set of design characteristics that are not obvious from the literature on AI capability. These systems prioritize reliability over accuracy; they are designed to work when the internet connection is spotty, when the device is three years old and has 15 percent battery, and when the user has not been trained beyond a brief orientation. They prioritize narrow scope over broad capability; they do one thing very well rather than many things adequately. And they are embedded in existing health system infrastructure rather than positioned as replacements for it.
Muso’s Community Health Worker AI, deployed in Mali and Guinea, guides community health workers through structured assessments of childhood illness using a decision algorithm embedded in a mobile app. The AI component is a question-routing system and a risk stratification model, not a general-purpose diagnostic system. It tells a community health worker whether a child is at high, medium, or low risk for severe illness and what to do in each case. The accuracy is high for the specific task because the specific task is narrow, well-defined, and supported by decades of clinical validation for the underlying assessment protocols. It does not attempt to diagnose — it amplifies the decision-making of frontline workers who are not physicians.
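The general shape of a question-routing and three-tier risk stratification flow can be sketched in a few lines of Python. This is an illustrative sketch only, not Muso's algorithm: the question identifiers, the stopping rule, the tier cutoff, and the action strings are all hypothetical placeholders, and a real system would take them from a clinically validated protocol.

```python
from typing import Callable, Dict, List

# Hypothetical question identifiers; placeholders, not from any real protocol.
URGENT_QUESTIONS: List[str] = ["urgent_sign_1", "urgent_sign_2"]
FOLLOW_UP_QUESTIONS: List[str] = ["follow_up_sign_1", "follow_up_sign_2", "follow_up_sign_3"]

# Hypothetical actions attached to each risk tier.
ACTIONS: Dict[str, str] = {
    "high": "refer immediately",
    "medium": "treat and schedule a recheck",
    "low": "give home care advice",
}


def assess(ask: Callable[[str], bool]) -> Dict[str, object]:
    """Route through the questions and return a risk tier with its action.

    `ask` is whatever the app uses to put a yes/no question to the health worker.
    """
    asked: List[str] = []

    # Routing rule (assumed): any urgent sign ends the assessment at once.
    for question in URGENT_QUESTIONS:
        asked.append(question)
        if ask(question):
            return {"tier": "high", "action": ACTIONS["high"], "asked": asked}

    # No urgent sign: ask every follow-up question and count positive answers.
    positives = 0
    for question in FOLLOW_UP_QUESTIONS:
        asked.append(question)
        if ask(question):
            positives += 1

    # Cutoff (assumed): one or more positive follow-up answers means medium risk.
    tier = "medium" if positives >= 1 else "low"
    return {"tier": tier, "action": ACTIONS[tier], "asked": asked}


if __name__ == "__main__":
    # Simulate a visit in which only the first urgent sign is present.
    result = assess(lambda question: question == "urgent_sign_1")
    print(result["tier"], "->", result["action"])
```

The point of the narrow design is visible in the code: every path ends in one of three tiers with a concrete action, and nothing in it attempts a diagnosis.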
Jacaranda Health’s AI for safe motherhood, deployed across maternity facilities in Kenya and Tanzania, operates similarly. The system identifies pregnant women at elevated risk for pregnancy complications using routinely collected clinical data — blood pressure readings, fundal height measurements, hemoglobin levels, prior pregnancy history — and generates alerts for health workers that a woman needs additional attention. The model was trained on data from the specific facilities where it was deployed, which is unusual for AI products but essential for capturing the patient population characteristics and clinical norms of those specific settings.
The training-on-local-data requirement is the piece that most AI product development processes skip, and it is the piece that most explains why healthcare AI pilots in developing-country settings fail to scale. A model trained on patient populations from US academic medical centers will perform differently when deployed in settings where malnutrition is common, where women present later in pregnancy, where clinical documentation practices differ, and where the disease burden includes conditions that are rare in high-income countries. The model is not wrong in a general sense. It is calibrated to a different population, and miscalibration in medical AI is not a technical nuisance — it is a patient safety issue.
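One way to make miscalibration concrete is to compare a model's predicted risks with the outcomes actually observed in the local population. The sketch below uses two standard checks, an observed-to-expected ratio and a binned calibration table. The function names and the toy numbers are assumptions made for illustration, not taken from any of the deployments described here, and the code assumes predicted probabilities between 0 and 1 and outcomes coded as 0 or 1.

```python
from typing import List, Tuple


def observed_to_expected(probs: List[float], outcomes: List[int]) -> float:
    """Ratio of observed events to the number of events the model predicted.

    A value near 1.0 suggests the model is calibrated overall; a value well
    above 1.0 means it underestimates risk in this population.
    """
    expected = sum(probs)
    if expected == 0:
        raise ValueError("model predicted zero total risk; ratio is undefined")
    return sum(outcomes) / expected


def calibration_bins(
    probs: List[float], outcomes: List[int], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width bins over [0, 1].

    Returns (mean predicted risk, observed event rate, count) for each
    non-empty bin, in order of increasing predicted risk.
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))

    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            rows.append((mean_pred, observed, len(members)))
    return rows


if __name__ == "__main__":
    # Toy data: the model assigns everyone a 10% risk, but 3 of 10 patients
    # in the local sample actually had the outcome.
    local_probs = [0.1] * 10
    local_outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    print(observed_to_expected(local_probs, local_outcomes))  # about 3.0
    print(calibration_bins(local_probs, local_outcomes))
```

In the toy data the ratio is about 3, meaning three times as many events occurred as the model expected. A model in that state would systematically assign too little risk to the patients in front of it, which is the patient safety issue described above.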
The organizations that have successfully deployed healthcare AI at scale in low-income settings have invested in what some practitioners call “adaptation infrastructure”: the processes and organizational capacity to collect local data, retrain or fine-tune models on that data, evaluate performance in local conditions, and iterate on model behavior based on observed outcomes. This infrastructure is unglamorous. It is not the part of AI development that attracts research publications or investor attention. It is the part that determines whether a product works.
Ubenwa provides the most technically distinctive case. The company’s AI analyzes the acoustic characteristics of a newborn’s cry to identify signs of neonatal distress — specifically, the acoustic markers associated with neonatal encephalopathy (brain injury around the time of birth) that trained neonatologists can identify through clinical experience. The technology runs on-device, on any Android smartphone, without internet connectivity. The inference model is small enough to run in real time on a low-end processor, producing a risk score within seconds of recording.
The deployment context is critical for understanding why this works. The alternative to Ubenwa’s AI in most of the settings where it has been deployed is not a trained neonatologist. It is a nurse-midwife who may or may not have received training in neonatal assessment, and who is managing multiple deliveries simultaneously without specialist backup. The AI is not replacing specialist judgment. It is providing a structured second opinion to a generalist who would otherwise have to rely entirely on their own training and experience.
The impact evaluation literature for tools like Ubenwa is limited by the difficulty of running randomized controlled trials in healthcare settings where withholding a potentially beneficial intervention creates ethical problems. The observational evidence, which compares outcomes in facilities using the tool with outcomes in comparable facilities not using it, is positive but subject to selection bias. This is a pervasive challenge for healthcare AI evaluation in low-income settings, where the rigor that would be applied to a new pharmaceutical is difficult to achieve and the resources to fund that kind of evaluation are scarcer still.
The ophthalmology AI story is perhaps the clearest case of successful technology transfer in healthcare AI. Diabetic retinopathy screening — identifying the characteristic blood vessel changes that indicate diabetes-related eye disease — is a task where deep learning has performed at specialist level since 2016. The technology has been adopted in settings from the UK’s NHS to Singapore’s national screening program to several Indian state health departments, with consistent results: the AI matches or approaches specialist accuracy for the specific task of screening.
In India, where 77 million people have diabetes (the second-largest diabetic population in the world) and where ophthalmologist access outside major cities is severely limited, AI retinopathy screening has the potential to catch disease at a treatable stage in populations that would otherwise not be screened until significant vision loss had occurred. The technical capability is proven. The deployment at scale requires integrating screening into primary care workflows, training primary care workers to use the tool, connecting screening results to referral pathways that actually exist, and funding the service within a health system with severe budget constraints.
All of those requirements are organizational, not technical. The technology has been ready for years. The organizational infrastructure to deploy it at scale in underserved Indian populations is being built slowly, unevenly, and with insufficient funding. That gap is not a failure of AI. It is a failure of the health system investment that AI deployment requires.
The common lesson across the healthcare AI deployments that have actually worked is one that optimistic AI coverage consistently obscures: the technology is necessary but nowhere near sufficient. What the successful cases (Muso, Jacaranda, Ubenwa, the Indian diabetic retinopathy programs) have done is build the organizational infrastructure around the AI that makes the AI useful. That infrastructure includes training for frontline workers, data quality management, referral pathways for positive cases, monitoring and evaluation systems, and the organizational learning capacity to improve the tool based on operational experience.
Building that infrastructure in low-income healthcare settings is harder than building the AI. It requires deep partnerships with health systems that are already under-resourced, long implementation timelines, and the willingness to invest in organizational capacity that does not show up in product demos.
The healthcare AI that has actually shipped is quieter, narrower, and more embedded in existing systems than the AI that attracts attention. It is also, for the people it serves, significantly more valuable.



