Tax Assessment and the End of the Audit Lottery

AI in Tax Administration

For a century, tax enforcement was a lottery. AI-assisted revenue authorities are ending the randomness — and replacing it with something that requires careful thought.
tax-policy, ai-government, revenue-administration, algorithmic-governance, digital-state

The income tax audit has always been, from the taxpayer’s perspective, a lottery. Not random in the pure sense (the IRS has always weighted its examination resources toward returns with unusual features, high incomes, and suspicious deductions), but random enough that most non-compliant returns were never examined, and many compliant ones were. The US tax gap (the difference between taxes owed and taxes paid) has hovered around $500-600 billion annually for decades. Most of it went uncollected, not because the IRS didn’t know it existed, but because examining it would have cost more than the expected recovery.

AI is in the process of ending this lottery, and the implications are substantial in ways that policy discussions have not fully engaged with.

What the technology actually does

Tax authority AI systems, now deployed across the US, UK, Australia, Canada, Denmark, and most EU member states, perform several functions that were either impossible or impractical at scale before machine learning.

The most basic is anomaly detection: identifying returns that deviate from expected patterns in ways a human reviewer would have to examine thousands of documents to notice. If a dental practice in suburban Ohio consistently reports revenue 35% below the median for comparable practices in comparable markets, that pattern is trivial for an algorithm to flag and time-consuming for a human to identify within a universe of 160 million filed returns. The IRS’s Advanced Analytics and Machine Learning program (which entered broad deployment in 2023) runs every filed return through pattern-matching models trained on the 25 years of resolved audit data available to the agency.
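
The peer-comparison idea in the dental practice example can be sketched in a few lines. This is a minimal illustration, not a description of any agency's actual model: the function name `flag_low_reporters`, the 35% threshold, and the revenue figures are all hypothetical, and a production system would presumably use far richer features than a single median.

```python
from statistics import median

def flag_low_reporters(revenues, threshold=0.35):
    """Return IDs whose reported revenue falls at least `threshold`
    (as a fraction) below the median of the peer group."""
    # Note: the median here includes the filer being tested (a simplification).
    peer_median = median(revenues.values())
    flagged = []
    for filer_id, revenue in revenues.items():
        shortfall = (peer_median - revenue) / peer_median
        if shortfall >= threshold:
            flagged.append(filer_id)
    return flagged

# Hypothetical peer group: comparable practices in comparable markets.
peer_group = {
    "practice_a": 980_000,
    "practice_b": 1_050_000,
    "practice_c": 1_000_000,
    "practice_d": 640_000,   # 36% below the group median of 1,000,000
    "practice_e": 1_020_000,
}

flagged = flag_low_reporters(peer_group)
print(flagged)
```

Only `practice_d` is flagged here. The hard part in practice is not the comparison but defining "comparable", which is where a human reviewer's time used to go.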

More sophisticated is cross-institutional data matching. Denmark’s SKAT has since 2021 automatically cross-referenced every filed income return against data from banks, employers, pension providers, real estate registries, and share registers — not as an audit trigger but as a pre-population service, so that most Danish taxpayers’ returns are filled in automatically and they are asked only to confirm or amend. The non-compliance detection function is built into the pre-population: if what the employer reports doesn’t match what the return says, the discrepancy is flagged before filing, at essentially zero enforcement cost.
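
The matching step behind pre-population can be pictured as a keyed comparison between what third parties report and what the taxpayer declares. The sketch below is an assumption-laden toy: the record layout, the `find_discrepancies` helper, and the tolerance value are invented for illustration and do not reflect how any tax authority actually stores or compares its data.

```python
def find_discrepancies(declared, third_party, tolerance=0.0):
    """Compare declared amounts with third-party reports.

    Both arguments map (taxpayer_id, income_source) to an amount.
    A source that the taxpayer did not declare at all counts as 0.
    """
    discrepancies = []
    for key, reported in third_party.items():
        declared_amount = declared.get(key, 0)
        if abs(reported - declared_amount) > tolerance:
            discrepancies.append((key, declared_amount, reported))
    return discrepancies

# Hypothetical data reported by employers and banks.
third_party = {
    ("tp-001", "employer"): 52_000,
    ("tp-001", "bank_interest"): 310,
    ("tp-002", "employer"): 47_500,
}
# Hypothetical amounts entered on the returns.
declared = {
    ("tp-001", "employer"): 52_000,
    ("tp-002", "employer"): 41_000,
}

issues = find_discrepancies(declared, third_party, tolerance=50)
for key, declared_amount, reported in issues:
    print(key, "declared:", declared_amount, "reported:", reported)
```

In a pre-population design the third-party figure is shown to the taxpayer before filing, so a mismatch like the one for `tp-002` surfaces as a correction prompt rather than as an audit.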

The UK HMRC’s Connect system has been running since 2010 and matches tax records against data from 30+ third-party sources including Companies House, Land Registry, DVLA, eBay, Airbnb, and social media. By 2025, Connect was processing data on over 2 billion “relationships” between taxpayers, financial accounts, and transactions, identifying leads that generated £3.2 billion in additional tax yield in the 2024-25 fiscal year.

The due process questions no one is asking

The efficiency case for all of this is real. If AI-assisted enforcement reduces the tax gap by 20%, that is $100-120 billion in additional annual revenue in the US alone (20% of the $500-600 billion gap), enough to fund a substantial fraction of the federal discretionary budget without changing tax rates. The argument for using better tools to enforce existing law seems straightforward.

The legal and due process questions are considerably less tidy.

Tax audit selection has traditionally been treated as a ministerial decision not subject to the same procedural requirements as adjudicative decisions. The IRS selects returns for examination; it doesn’t need to justify that selection to the taxpayer before the examination begins. But when selection is driven by an algorithm that may itself encode biases from historical audit data, the selection decision acquires a different character.

If the IRS trained its audit selection model on 25 years of examination results, and those 25 years of examination priorities were shaped by both legitimate risk assessment and discriminatory patterns, then the model may reproduce those patterns without anyone choosing to do so. Historically, self-employed immigrants and small cash-based businesses were disproportionately examined, partly because of legitimate compliance risk and partly because of targeting patterns that had ethnic or demographic correlates. The training data contains the bias. The model learns it. The selections look statistically justified because they are statistically grounded in history.
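
One way this can happen is worth making concrete. Suppose, purely hypothetically, that two groups have the same true non-compliance rate but one was audited five times as often. If a model is trained on the label "was found non-compliant" and unaudited returns are treated as clean, it will score the more-audited group as riskier. The numbers and function names below are invented for illustration; this is a sketch of one mechanism, not a claim about how any real model was built.

```python
def learned_risk(total_returns, found_noncompliant):
    """Naive risk estimate: share of ALL returns with a recorded finding.
    Unaudited returns have no finding, so they silently count as compliant."""
    return found_noncompliant / total_returns

def hit_rate(audited, found_noncompliant):
    """Share of audited returns where non-compliance was actually found."""
    return found_noncompliant / audited

# Hypothetical history: both groups have a 10% hit rate when audited,
# but group_b was audited five times as often as group_a.
history = {
    "group_a": {"total_returns": 10_000, "audited": 100, "found_noncompliant": 10},
    "group_b": {"total_returns": 10_000, "audited": 500, "found_noncompliant": 50},
}

scores = {
    g: learned_risk(h["total_returns"], h["found_noncompliant"])
    for g, h in history.items()
}
hit_rates = {
    g: hit_rate(h["audited"], h["found_noncompliant"])
    for g, h in history.items()
}
score_ratio = scores["group_b"] / scores["group_a"]

print("hit rates:", hit_rates)
print("learned risk scores:", scores)
print("group_b scored", round(score_ratio, 2), "times riskier than group_a")
```

The audit outcomes are identical per examination, yet the learned score differs by the same factor as the historical audit rate. Nothing in the output reveals that the gap comes from where the agency looked rather than from what it found.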

This has already produced litigation. In 2024, a consortium of small business organizations challenged the IRS’s use of predictive analytics in audit selection, arguing that the model’s deployment had produced selection rates for certain immigrant-owned businesses that were statistically indistinguishable from targeted ethnic enforcement. The district court dismissed the case on standing grounds, but the underlying evidentiary question — whether the model is encoding and amplifying historical discriminatory patterns — was never addressed.

The Dutch tax authority’s spectacular failure

The most consequential AI tax enforcement failure to date happened in the Netherlands. Between 2013 and 2019, the Dutch Tax and Customs Administration (Belastingdienst) ran an algorithmic fraud detection system that cross-referenced childcare benefit claims against a risk model. The model flagged applications containing certain patterns — including having a dual nationality, using a benefits assistant, and living in specific postal codes — as high fraud risk.

The system generated tens of thousands of wrongful fraud determinations. Families (disproportionately those with migration backgrounds) were required to repay years of childcare benefits they had legitimately received, with repayment demands reaching €30,000 or more in some cases. The repayment demands triggered financial cascades: debt, evictions, family separations, divorces. A parliamentary inquiry in 2020 concluded that the state had committed “unprecedented injustice” against approximately 26,000 families. Multiple government ministers resigned. Prime Minister Mark Rutte’s third cabinet resigned in January 2021 as a consequence.

What makes the Dutch case analytically important is not that an algorithm was deployed or that it was wrong. It’s that the wrongness was systematic and demographic in its distribution, that the administrative culture of the tax authority treated algorithm outputs as authoritative in ways that shut down the normal channels of complaint and correction, and that the harm accumulated for six years before the institutional response materialized.

The algorithm wasn’t working in isolation. It was working within a bureaucratic culture that had decided the machine was right and that persistent complaints from affected citizens were themselves evidence of bad faith rather than evidence of system error.

The enforcement paradox

There is an uncomfortable equity tension at the heart of AI tax enforcement. The populations most likely to benefit from the end of the audit lottery are middle-class wage earners whose income is fully reported by employers and whose returns are straightforwardly correct: they will be audited less because AI can see they’re compliant. The populations that will face more scrutiny are those with complex, opaque, or hard-to-verify income, which includes both wealthy self-employed individuals (a legitimate enforcement target) and low-income individuals with multiple irregular income sources (who have historically been over-audited relative to the revenue at stake).

The tax gap is not uniformly distributed. Roughly 70% of it, by IRS estimates, sits in business income — sole proprietors, S-corporations, partnerships — where reporting is complex and verification is hard. Effective AI enforcement would concentrate audit activity at the high end of that income distribution, where the expected revenue per examination is highest. Whether the systems are actually doing this, or whether they are doing something different, is not publicly disclosed at a level that allows independent verification.
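
What "concentrating audit activity where expected revenue per examination is highest" would mean can be sketched as a ranking by expected net yield. Everything here is an assumption for illustration: the `Candidate` fields, the greedy budget rule, and the figures are hypothetical, and nothing publicly disclosed confirms that any authority selects returns this way.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    return_id: str
    p_understatement: float   # model-estimated probability of an understatement
    expected_recovery: float  # expected additional tax if one is found
    exam_cost: float          # estimated cost of conducting the examination

    @property
    def expected_net_yield(self):
        return self.p_understatement * self.expected_recovery - self.exam_cost

def select_for_exam(candidates, budget):
    """Greedy selection: walk candidates in descending expected net yield,
    stop at the first non-positive yield, and skip any candidate whose
    examination cost exceeds the remaining budget."""
    chosen = []
    remaining = budget
    ranked = sorted(candidates, key=lambda c: c.expected_net_yield, reverse=True)
    for c in ranked:
        if c.expected_net_yield <= 0:
            break
        if c.exam_cost <= remaining:
            chosen.append(c.return_id)
            remaining -= c.exam_cost
    return chosen

# Hypothetical candidates; comments show expected net yield.
candidates = [
    Candidate("partnership_high", 0.30, 400_000, 40_000),   # 80,000
    Candidate("sole_prop_mid", 0.40, 30_000, 5_000),        # 7,000
    Candidate("wage_earner", 0.02, 2_000, 1_500),           # -1,460
    Candidate("irregular_low_income", 0.50, 2_500, 1_500),  # -250
]

selected = select_for_exam(candidates, budget=50_000)
print(selected)
```

Under a pure net-yield objective the low-income return is never selected, even with a 50% estimated probability of error, because the amount at stake does not cover the examination cost. A model optimizing for something else, such as hit rate alone, would rank that same return first, which is why the undisclosed objective matters.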

The end of the audit lottery is, on balance, probably good for tax compliance and tax revenue. Whether it’s good for due process and equity depends almost entirely on choices that have not been made publicly: what the models are optimizing for, whose historical patterns are embedded in the training data, and what oversight mechanisms exist when the algorithm is wrong. These are solvable problems. They have not been solved.