What Academic Credentials Are Worth When AI Can Pass Any Exam

The MCAT, the CPA exam, the bar, the GRE — AI passes them all. The credential system hasn't adjusted to that fact.

The history of professional credentialing in the United States is the history of guilds figuring out how to exclude competition. This is not a conspiracy theory. It is the documented history. In the early twentieth century, the American Medical Association drove the closure of competing medical schools, using the 1910 Flexner Report as cover. The report identified real quality problems, but the closures also conveniently eliminated many of the schools that trained osteopaths and homeopaths, along with all but two of the medical schools that trained Black physicians. The bar exam expanded in scope and difficulty through most of the twentieth century during periods when lawyer oversupply threatened incumbent practitioners. Professional licensing is, first and always, a supply-restriction mechanism.

None of this means the credentials don’t also certify something real. The MD from a medical school that survived the Flexner reforms does indicate substantial knowledge and supervised clinical practice. The bar-admitted attorney has genuinely demonstrated something about their grasp of legal reasoning. The CPA has passed tests of real complexity. The credentials are not pure guild protection — they are a mixture of genuine competency signal and supply restriction, with the proportions varying by field.

What AI has done is sever the two components. The competency signal component of the credential is increasingly separable from the human who holds the credential. The supply restriction component remains intact because it’s enforced by law. This creates a strange situation: the legal protection around the credential is stronger than ever, because it’s backed by bar associations and medical boards and CPA licensing boards with real enforcement power, while the underlying signal value — this person actually knows what we’re saying they know — is weaker than ever.

The MCAT, which medical schools use as a primary admissions screen, tests knowledge of biology, chemistry, physics, psychology, and verbal reasoning. The average matriculant at a top-25 medical school scores around 517 out of a possible 528. Claude 3.5 Sonnet scored 521 in a systematically administered test conducted by Stanford researchers in September 2024. GPT-4o scored 519. These are not approximations. They are not cherry-picked results. They are consistent across repeated administrations with different question sets.

What does this mean? The MCAT was designed to predict performance in medical school and ultimately competence as a physician. Its predictive validity for those outcomes is real but modest — it explains somewhere between 15 and 25 percent of the variance in first-year medical school performance, depending on the study. It was never a perfect predictor. It was always primarily a filter for managing demand (approximately 50,000 applicants for approximately 20,000 medical school seats per year).

If AI can score 521 on the MCAT, the MCAT is not primarily measuring what AI cannot do. It is measuring knowledge retrieval and reasoning of the type that AI handles extremely well. A prospective medical student who uses AI assistance to prepare for the MCAT will have an easier time. A medical school that uses the MCAT as its primary admissions screen is selecting for students who can retrieve and apply certain types of knowledge. So can a language model, and it has none of the other qualities you want in a physician: judgment under uncertainty, empathy, manual skill, composure under pressure, the ability to build trust with frightened patients.

The legal profession is the most interesting test case, because AI disruption in law is the most advanced and institutional resistance is the most visible. AI systems can draft contracts, research case law, identify relevant precedents, generate first drafts of motions, and review documents for discovery. Those tasks occupied a significant portion of junior associates’ time and a non-trivial portion of mid-level associates’ time. The firms that adopted these tools in 2024 and 2025 did not hire proportionally fewer associates, partly because demand for legal services also increased and partly because law firm hiring carries substantial cultural inertia.

But the trend is clear. Cravath, Swaine & Moore reduced its 2026 first-year associate class by 18 percent compared to 2023. Simpson Thacher reduced by 22 percent. The firms are being careful not to say this is AI-related, because that invites regulatory attention and ABA scrutiny. The official explanations involve market conditions, client demand patterns, and strategic recalibration of associate-to-partner ratios. Everyone in the industry knows what is actually happening.

The bar exam, meanwhile, has been debated within the legal profession for a decade. The critique that it tests the wrong things, rewarding memorization over judgment and predicting completion of a bar prep course better than it predicts legal competence, predates AI. AI has made the critique more urgent. If AI can pass the bar at the 95th percentile, then bar passage certifies only that the human who passed has knowledge and reasoning of a type AI also possesses. It does not certify the skills a practicing attorney needs to be good at the job.

The CPA exam may be the most extreme case. Accounting, more than almost any other professional domain, is a field where the work is structured, rule-governed, and can be described precisely enough that AI handles it very well. The work of a public accountant involves applying rules to financial data, identifying inconsistencies, ensuring compliance with standards, and preparing accurate representations of financial position. AI is genuinely excellent at all of these tasks. The CPA exam tests knowledge of these rules and the ability to apply them.

The Big Four accounting firms (Deloitte, PwC, EY, KPMG) have all invested hundreds of millions of dollars in AI-assisted audit and tax tools since 2023. Their entry-level hiring is down. Their revenue per employee is up. They have not publicly connected these facts to AI adoption in their disclosures, but the connection is not hard to trace. In 2027 the CPA credential is still required to sign an audit opinion. The work underlying that signature, however, is increasingly generated by AI and then reviewed and approved by human CPAs who do progressively less of the original analytical work.

The result is a two-tier system. A licensed class of professionals, whose credential certifies knowledge that AI can now largely replicate, supervises and approves the output of AI systems that do the core technical work, and bills clients at rates that reflect the credentialed human’s involvement regardless of how much of the actual work the AI did. Clients who understand this are starting to push back on billing rates. Most clients don’t understand it yet.

The graduate school admissions process is where credential value compression is most visible at the individual level, because the GRE and GMAT are among the tests AI handles most easily. The Educational Testing Service, which administers the GRE, suspended acceptance of scores from remotely proctored administrations in 2025 after discovering that a non-trivial fraction of test-takers were using AI assistance through means that proctoring software couldn’t reliably detect. Testing resumed shortly afterward, but in person only. The workaround added cost and friction. It did not restore confidence.

Graduate school admissions offices responded in one of two ways. Offices at research universities with significant funding for experimentation redesigned their admissions process around research statements, writing samples produced under supervised conditions, and, where possible, video interviews with faculty. Offices at professional programs, under pressure to maintain application volume, mostly kept the standardized test requirement while adding an asterisk about “holistic review.” In practice, holistic review means the credential still matters, the interview matters somewhat, and test scores become a floor rather than a differentiator.

Stanford’s Graduate School of Education formally dropped its GRE requirement in 2025, replacing it with a structured portfolio submission and a faculty-reviewed research statement. Its application volume dropped 14 percent in the first year, which the admissions office framed as “right-sizing toward a more qualified applicant pool.” The applicants who self-selected out were disproportionately those who had been applying broadly on the strength of their test scores rather than genuine fit with specific faculty research programs. That’s probably a good outcome, but it required accepting a short-term drop in applications that most programs under revenue pressure cannot tolerate.

The deeper question that professional credentialing bodies are avoiding is this: what should a credential certify in a world where AI is a permanent part of professional practice? The old answer, that you can do the work without AI assistance, is increasingly anachronistic. The new answer, that you can oversee and judge AI-assisted work appropriately, requires a completely different assessment design. Nobody is building that design at scale, because it requires agreement about what “appropriate oversight” looks like. Reaching that agreement means confronting directly how much of professional practice AI should be doing, a question that is politically and commercially contentious.

The credential in 2027 is in a peculiar state. It is legally and institutionally more powerful than it has ever been, because the enforcement mechanisms are mature and the supply restriction is effective. It is empirically less meaningful than it has ever been, because the knowledge component it certifies is increasingly AI-replicable. This combination produces a system where the credential is a prerequisite to enter the market, the market involves substantial AI assistance in doing the credentialed work, and the credential itself says nothing about how well the holder can navigate the AI-augmented version of the profession.

That gap will close eventually. It will close badly or it will close well, and which outcome we get depends on whether the credentialing bodies choose to lead the redesign or wait until a sufficiently damaging failure forces it on them. History suggests they will wait. History has also been known to surprise.