
The AI Tutoring Market: What Two Years Actually Showed

Khanmigo, Synthesis, and a dozen well-funded startups promised to revolutionize learning. Here's what the outcomes data says.

By Jakub Jirák, May 2, 2027


In 2024, Sal Khan went on 60 Minutes and told Lesley Stahl that Khanmigo would be “a personal tutor for every child on earth.” The clip got 4 million views. The Gates Foundation wrote a check. The Department of Education cited it in a press release. It was the best possible moment for a certain kind of technological optimism — the kind that believes an elegant demo is sufficient evidence that something works at scale in the real world.

Two years later, Khanmigo has 2.3 million monthly active users in the United States. That’s not nothing. But Khan Academy had 18 million monthly active users before Khanmigo launched. The AI layer did not multiply engagement. In most cohorts, it didn’t substantially improve outcomes. What it did was change the nature of the interaction in ways that were harder to measure and easier to spin.

The honest assessment of the AI tutoring market in mid-2027 requires holding two things in tension. The technology is genuinely impressive. The learning outcomes are genuinely disappointing: not catastrophically, not in every case, but systematically below what investors and advocates claimed. Understanding why requires looking at what tutoring actually is, not what it sounds like it should be.

Effective human tutoring, in the research literature going back to Benjamin Bloom’s 1984 “2-Sigma Problem” paper, works through a combination of immediate feedback, diagnosis of specific misconceptions, and emotional attunement to the learner’s state. Bloom found that one-on-one tutoring produced learning outcomes two standard deviations better than conventional classroom instruction. Two sigma. This is a massive effect. It’s also why every AI tutoring company put “Bloom’s 2-Sigma Problem” in slide three of their pitch deck.

The problem is that Bloom’s effect depended heavily on the third element — emotional attunement. A skilled human tutor knows when a student is frustrated versus confused versus bored. They know when to push and when to back off. They know when the stated question isn’t the real question. They know, through years of experience with hundreds of students, what the confused face looks like and what it usually means. AI tutoring systems in 2024-2025 were very good at the first two elements and genuinely bad at the third.

Synthesis, the tutoring company that spun out of Elon Musk’s Ad Astra school project, raised $80 million in 2024 and grew to 400,000 paying families by early 2025. Their product is genuinely excellent for a specific kind of learner: intrinsically motivated, cognitively advanced, comfortable engaging with an AI interface, and not experiencing significant emotional barriers to learning. For that child, Synthesis is remarkable. For a struggling 9-year-old in Albuquerque who’s behind in math because of disrupted schooling, two working parents, and a home environment that makes concentration difficult — Synthesis is not the solution. The product addresses the needs of the market that was already best served.

The randomized controlled trials started appearing in early 2026, and the findings were consistent enough to reveal a pattern. A meta-analysis published in the Journal of Educational Psychology in March 2026, covering 34 studies of AI tutoring interventions conducted between 2024 and 2026, found an average effect size of 0.18 standard deviations. Positive and statistically significant, but less than a tenth of the two standard deviations Bloom documented with human tutors, and a fraction of what investors were told to expect.

The variance mattered as much as the average. Some interventions — particularly those focused on procedural math skills with immediate right/wrong feedback and clear worked examples — showed effect sizes of 0.4 to 0.6. These are genuinely meaningful. Drilling multiplication tables, practicing equation solving, getting instant feedback on algebra steps: AI tutoring is excellent at this. Other interventions — reading comprehension, essay feedback, conceptual understanding in history and science — showed effect sizes clustered around zero.

This was, in retrospect, exactly what you should have predicted from first principles. AI is good at tasks with clear right answers and poor at tasks requiring interpretation of meaning, context, and intent. The tutoring market oversold the second category because the demo looks impressive. Ask an AI to give essay feedback and it will produce articulate, multi-paragraph feedback in seconds. Whether that feedback helps the student write better essays is a different question, and the answer turns out to be: somewhat, inconsistently, and far less than a skilled English teacher reading the same essay.

The market dynamics have been brutal for the companies that bet on the wrong half of this. Numerade, which raised $26 million in 2021 and pivoted aggressively into AI tutoring in 2024, filed for Chapter 7 in November 2026. Chegg — which for a decade was the defining homework-help company, with $700 million in annual revenue at its 2021 peak — had its revenue fall to $180 million in 2026 as students discovered that asking Claude directly was free, faster, and better. Chegg’s stock, which traded above $100 in 2021, is at $3.40 as of this writing.

The distinction that’s emerging in the surviving companies is between AI tutoring as a product and AI tutoring as an infrastructure layer. Duolingo understood this early. Rather than building an AI tutor, they used AI to personalize the sequencing of their existing content, adjust difficulty dynamically, and generate practice exercises at scale. Their engagement metrics improved 23 percent year-over-year. The learning outcome data is better than average for the sector, probably because they started with content that was already known to work and used AI to deliver it more efficiently.

Carnegie Learning, the company that has been building AI-assisted math curriculum since the 1990s — serious cognitive science, not hype — is quietly having its best years ever. Their MATHia product, which embeds AI tutoring in a curriculum framework based on 25 years of learning science research, shows consistent effect sizes in the 0.3-0.5 range. They’re not a startup. They don’t do press tours. They don’t have a charismatic founder who does 60 Minutes interviews. They’re doing the thing that works.

The teacher displacement question is where the AI tutoring discourse goes most wrong. The 2024-2025 wave of edtech enthusiasm included a lot of implicit and sometimes explicit claims that AI tutors would reduce the need for human teachers. This has not happened. Teacher employment in K-12 education actually increased 2.1 percent between 2024 and 2026, driven partly by states backfilling pandemic-era staffing shortfalls and partly by political resistance to any policy that could be framed as “replacing teachers with robots.”

What has changed is how teachers spend their time in classrooms that use AI tutoring tools. The research here is actually encouraging. A 2026 study of 140 classrooms in Chicago Public Schools that adopted AI tutoring tools found that teachers in those classrooms spent an average of 34 more minutes per week in one-on-one conversation with students, compared to control classrooms. The AI handled the routine drill and practice that had previously occupied significant class time. The teachers used that time for the things only humans can do: mentorship, motivation, diagnosis of the social and emotional issues that affect learning, and the kind of real relationship that makes a student decide that math matters to them.

This is the version of AI tutoring that works. Not AI replacing the teacher. AI handling the automatable portions of instruction so the teacher can focus on the irreplaceable portions. Every single interview I’ve done with teachers who are using AI tools well in 2027 describes some version of this pattern. The tools that treat teachers as the problem — the ones that want to go “direct to student” without a teacher in the loop — are mostly the ones that are failing.

The tutoring market itself, meaning private tutoring outside the school system, is more complicated. Wyzant and Tutor.com have seen volume decline as parents discover that Claude and GPT-4o can answer their child’s homework questions instantly and for free. But the premium end of the market (human tutors at $150/hour for high-stakes test prep, college application essays, and genuine learning difficulties) is holding. Parents who can afford $150/hour know that their child doesn’t need someone to answer questions. The child needs someone who will notice a freeze-up under pressure, or a habit of rushing through problems when anxious, or a need to hear the concept explained three different ways, with three different analogies, before it clicks. That is not something AI does well yet.

The SAT prep companies have done something clever. Kaplan and Princeton Review both integrated AI extensively into their practice systems (automated scoring, personalized drill sequencing, instant explanations) while positioning human instruction as the premium differentiator. “AI does the practice, humans do the coaching” is their implicit pitch. It’s working: their revenue is up despite the proliferation of free alternatives.

What the AI tutoring market actually proved over two years is that the technology is a powerful tool in the hands of skilled educators and a mediocre product when deployed without them. The pitch decks promised the educators would be unnecessary. The outcomes data says otherwise. This is not a surprise to anyone who has spent time thinking carefully about what learning requires. It is apparently a surprise to quite a few venture capitalists, who are now quietly updating their theses.
