The Real Cost of Training an AI Model That Nobody Wants You to Know
When Sam Altman mentioned in passing that GPT-4 had cost more than $100 million to train, the figure became the defining data point in public discussions of AI economics. Journalists used it to explain why only a handful of companies could compete in frontier AI. Investors used it to justify enormous valuations — if a model costs $100 million to produce, surely the company that built it must be worth billions. Competitors used it as a benchmark against which to measure their own ambitions.
The figure is real. It is also, in the broader context of what running a major AI system actually costs, almost beside the point.
Training cost is the most visible and quotable number in AI economics because it happens once, in a defined window, with costs that can be traced to cloud compute invoices. The costs that follow training — inference at scale, ongoing fine-tuning, safety work, data labeling, the human infrastructure that makes AI systems usable — are harder to measure, less glamorous to discuss, and collectively larger than the training cost over any meaningful operating period. Understanding AI economics requires understanding all of it, not just the dramatic number at the beginning.
Begin with inference, because this is where AI business models succeed or fail. Training a model is a capital expenditure — you spend money once to produce an asset. Inference is an operational expenditure — you spend money every time someone uses the asset. For a consumer AI product with millions of daily active users, inference costs accumulate continuously and scale directly with usage.
The economics are unforgiving. A single GPT-4-class query (the kind of request that produces a substantive, multi-paragraph response) costs between one and five cents in compute, depending on length and the specific model configuration. This sounds trivial, but consider the math at scale. If a product with ten million daily active users generates an average of twenty queries per user per day, that is 200 million queries daily. At two cents each, that is $4 million per day in raw inference compute, or roughly $1.46 billion per year, before any other cost.
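The scaling arithmetic is easy to reproduce. The sketch below is a minimal back-of-the-envelope calculator; the function and parameter names are illustrative, and the inputs are the hypothetical figures from the example above (ten million daily users, twenty queries each, one to five cents per query), not measured costs from any provider.

```python
def inference_cost(daily_active_users, queries_per_user_per_day,
                   cost_per_query_usd, days_per_year=365):
    """Back-of-the-envelope inference spend; every input is an assumption."""
    queries_per_day = daily_active_users * queries_per_user_per_day
    daily_cost = queries_per_day * cost_per_query_usd
    annual_cost = daily_cost * days_per_year
    return queries_per_day, daily_cost, annual_cost


if __name__ == "__main__":
    # Hypothetical product: 10M daily users, 20 queries each per day.
    for cents in (1, 2, 5):
        queries, daily, annual = inference_cost(10_000_000, 20, cents / 100)
        print(f"{cents} cent(s)/query: {queries:,} queries/day, "
              f"${daily:,.0f}/day, ${annual:,.0f}/year")
```

At two cents a query this gives $4,000,000 a day and $1,460,000,000 a year; across the one-to-five-cent range, the annual bill runs from $730 million to $3.65 billion.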
This is why the economics of “free” AI tools are so peculiar and why the business models of major AI labs are so difficult to make work. OpenAI’s ChatGPT offers a free tier alongside paid subscriptions. Anthropic’s Claude has a similar structure. The free tier exists to drive adoption, build brand, and collect usage data, but every query from a free user represents a real cost that must be subsidized by paying users, enterprise contracts, or investor capital. When AI labs report that they are not yet profitable at scale, inference cost is the primary explanation. The product they give away is genuinely expensive to give away.
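To see why the free tier is so expensive to give away, one can ask what share of users must pay for subscription revenue to cover everyone's inference. The sketch below is a deliberately simplified model that counts only inference (no training, staff, or other costs). The $20 monthly price and the per-user query rates are hypothetical, `breakeven_paid_share` is an illustrative name, and the cost per query reuses the two-cent figure from the example above.

```python
def breakeven_paid_share(cost_per_query, free_queries_per_day,
                         paid_queries_per_day, price_per_month,
                         days_per_month=30):
    """Fraction of users who must pay for revenue to cover inference alone.

    Returns None when a paying user's own usage costs more than the
    subscription, because then no mix of users can break even.
    """
    free_cost = free_queries_per_day * cost_per_query * days_per_month
    paid_margin = (price_per_month
                   - paid_queries_per_day * cost_per_query * days_per_month)
    if paid_margin <= 0:
        return None
    # Solve p * paid_margin = (1 - p) * free_cost for the paid share p.
    return free_cost / (free_cost + paid_margin)


if __name__ == "__main__":
    # Hypothetical inputs: $0.02 per query and a $20/month subscription.
    print(breakeven_paid_share(0.02, 20, 20, 20.0))  # free and paid use alike
    print(breakeven_paid_share(0.02, 10, 20, 20.0))  # free users query less
    print(breakeven_paid_share(0.02, 20, 40, 20.0))  # heavy paying users
```

Under these assumptions, 60% of users would need to pay in the first case and about 43% in the second. In the third case no share works, because each heavy subscriber costs about $24 a month to serve against $20 of revenue.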
Inference costs have been falling rapidly, and this is one of the most underreported developments in AI. The cost per token of querying a GPT-4-class model has fallen by an order of magnitude since the model launched in 2023. Better hardware, improved quantization techniques that reduce model size without proportionate capability loss, batching optimizations, and architectural improvements have all contributed. The trajectory suggests that inference costs for current-generation models will eventually become cheap enough that the business model problem becomes manageable.
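Quantization is the easiest of these levers to show concretely. The sketch below implements the simplest variant, symmetric per-tensor int8 quantization, on randomly generated stand-in weights (not real model weights). Each 32-bit float is replaced by an 8-bit integer plus one shared scale factor, cutting storage roughly fourfold relative to 32-bit floats (and roughly in half relative to the 16-bit formats models are commonly served in), at the price of a bounded rounding error. Production systems generally use finer-grained scales and more careful methods; this illustrates the idea and is not any lab's pipeline.

```python
import random


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization of a list of float weights.

    The largest-magnitude weight maps to +/-127; every other weight is
    rounded to the nearest multiple of `scale`.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]


if __name__ == "__main__":
    random.seed(0)
    # Stand-in for one layer's weights: 10,000 small Gaussian values.
    weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)

    fp32_bytes = 4 * len(weights)  # 4 bytes per 32-bit float
    int8_bytes = len(q) + 4        # 1 byte per code, plus one fp32 scale
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"fp32: {fp32_bytes:,} bytes, int8: {int8_bytes:,} bytes")
    print(f"max abs error: {max_err:.6f} (bound: scale/2 = {scale / 2:.6f})")
```

Because every weight is rounded to the nearest multiple of `scale`, the reconstruction error per weight never exceeds half of `scale`; whether that error matters for output quality is an empirical question the sketch does not answer.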
But there is a treadmill effect. As inference costs for current models fall, the industry produces larger, more capable models that cost more to run. The frontier of model capability keeps advancing, and staying at the frontier — which is what enterprise customers expect from leading AI providers — requires inference infrastructure that is always chasing the current state of the art. The cost reduction from optimizing last year’s model is partially offset by the cost of running this year’s model. The progress in inference efficiency is real and significant; it doesn’t eliminate the cost structure, it reshapes it.
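The treadmill can be captured in a toy model with two hypothetical rates: how much cheaper it gets each year to serve a fixed model, and how much more compute per query each new frontier model needs. The function name and both rates below are illustrative assumptions, not measured industry figures.

```python
def cost_paths(base_cost, years, efficiency_gain, frontier_growth):
    """Per-query cost of a fixed model vs. a frontier model, year by year.

    efficiency_gain: factor by which serving the *same* model gets cheaper
        each year (hardware, quantization, batching).
    frontier_growth: factor by which each year's frontier model needs more
        compute per query than the previous year's.
    Both factors are hypothetical inputs, not measured rates.
    """
    fixed = [base_cost / efficiency_gain ** t for t in range(years + 1)]
    frontier = [base_cost * (frontier_growth / efficiency_gain) ** t
                for t in range(years + 1)]
    return fixed, frontier


if __name__ == "__main__":
    fixed, frontier = cost_paths(0.02, 3, efficiency_gain=4, frontier_growth=2)
    for year, (f, fr) in enumerate(zip(fixed, frontier)):
        print(f"year {year}: fixed model ${f:.5f}, frontier ${fr:.5f}")
```

Under these made-up rates, serving last year's model gets four times cheaper each year, while staying at the frontier gets only twice as cheap. If `frontier_growth` matched `efficiency_gain`, the frontier cost per query would not fall at all.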
The second major hidden cost category is continuous training and fine-tuning. Language models don’t stay static after their initial training run. They require ongoing refinement through reinforcement learning from human feedback — RLHF — which involves humans rating model outputs for quality, accuracy, safety, and helpfulness, and using those ratings to update model behavior. They require safety fine-tuning to reduce harmful outputs, a continuous process as new misuse patterns emerge. They require domain-specific fine-tuning for enterprise customers who need models specialized for particular tasks. They require updates as the world changes and the training data becomes stale.
None of this is trivial. A single RLHF cycle for a major model involves collecting tens or hundreds of thousands of preference judgments from human raters, processing those judgments through specialized training runs, evaluating the results, and iterating. The compute cost of this fine-tuning is smaller than that of the initial training run but still substantial. The cost of the human raters is something else entirely.
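In published RLHF work, "using those ratings to update model behavior" usually means first training a reward model on pairs of responses where a human marked one as better, using a pairwise logistic (Bradley-Terry) loss, and then optimizing the language model against that reward. The sketch below shows only the first step, at toy scale: a two-weight linear reward model over hypothetical "helpfulness" and "harmfulness" scores, trained by plain gradient descent. Real reward models are large neural networks that read the response text, and nothing here describes any particular lab's setup. Each pair in the list stands for one paid human judgment, which is why labeling cost grows with the dataset.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def neg_log_sigmoid(x):
    """Numerically stable -log(sigmoid(x)) = log(1 + exp(-x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def reward(w, features):
    """Toy linear reward model: score = w . features."""
    return sum(wi * fi for wi, fi in zip(w, features))


def preference_loss(w, pairs):
    """Mean pairwise loss: -log P(chosen is preferred over rejected)."""
    return sum(neg_log_sigmoid(reward(w, c) - reward(w, r))
               for c, r in pairs) / len(pairs)


def train_reward_model(pairs, dim, lr=0.5, steps=200):
    """Plain gradient descent on the pairwise loss."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # Gradient of -log sigmoid(margin): -(1 - sigmoid(margin)) * (chosen - rejected)
            coef = -(1.0 - sigmoid(margin))
            for i in range(dim):
                grad[i] += coef * (chosen[i] - rejected[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Each pair is one human judgment: (preferred response, rejected response).
    # Hypothetical features per response: [helpfulness, harmfulness].
    pairs = [
        ([0.9, 0.1], [0.4, 0.2]),
        ([0.7, 0.0], [0.8, 0.9]),
        ([0.6, 0.2], [0.3, 0.1]),
    ]
    print("loss before:", preference_loss([0.0, 0.0], pairs))  # log(2)
    w = train_reward_model(pairs, dim=2)
    print("loss after: ", preference_loss(w, pairs))
    print("learned weights:", w)
```

After training, the toy model scores every preferred response above its rejected partner, with a positive weight on helpfulness and a negative weight on harmfulness: the raters' judgments have been turned into a number the next training stage can optimize.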
The human annotation industry that undergirds AI training is one of the most underexamined aspects of AI economics and ethics simultaneously. Creating high-quality training data (labeling images, rating responses, generating examples of correct behavior, flagging harmful content) requires enormous quantities of human judgment, applied to individual examples, at scale. This work is performed largely by contractors in developing countries, particularly Kenya, the Philippines, and India, through intermediary platforms like Scale AI and its subsidiary Remotasks. Wages are low by developed-world standards; content moderators and safety labelers face repeated exposure to harmful material; and the work is cognitively demanding but rarely paid in proportion to that demand.
A TIME investigation in 2023 found that Kenyan workers labeling harmful content for OpenAI, employed through the outsourcing firm Sama, were earning less than $2 per hour while being required to read and classify descriptions of violence, sexual abuse, and other disturbing material. The psychological toll of this work is well documented; the pay does not reflect it. When AI companies describe their models as “trained with human feedback,” the humans in that sentence are largely invisible in the public presentation of the technology. They are essential to the product and rarely acknowledged in its economics.
Data licensing is a third cost category that has grown dramatically and is likely to grow further. The large language models of 2020-2022 were trained substantially on web-scraped data (Common Crawl, public code repositories, Wikipedia) at costs that were manageable because the data was effectively free. That model is breaking down. Publishers, news organizations, and content creators have challenged the use of their content for AI training without compensation. Legal settlements and licensing agreements have become a practical necessity for companies that want to avoid litigation and maintain relationships with content producers.
OpenAI has struck deals with the Associated Press, Axel Springer, the Financial Times, and others. Google has similar arrangements. The terms of these deals are generally not disclosed, but they represent a structural shift in the cost of training data. As more of the internet’s reservoir of high-quality, unencumbered text is licensed or restricted, acquiring training data of sufficient quality requires either paying for it, generating synthetic data (which has its own quality limitations), or finding new sources. The training data cost that was nearly zero in 2019 is becoming a meaningful line item in 2026.
Safety red-teaming is a cost that gets almost no public attention because it doesn’t map to any familiar category of business expense. Before a major model is deployed, it undergoes extensive adversarial testing: teams of researchers and external contractors try to make the model produce harmful, deceptive, or dangerous outputs, in order to find failure modes that need to be addressed before release. This red-teaming involves significant compute costs for running attacks and evaluations, significant human costs for the teams doing the testing, and significant delay costs — the time spent on safety evaluation is time the model isn’t deployed and generating revenue.
Anthropic, which has positioned itself as the safety-focused major lab, reportedly spends a substantial fraction of its operating budget on safety research and evaluation. OpenAI’s preparedness team and similar functions at other labs represent ongoing cost centers that exist entirely outside the product development and inference cost structures. These costs scale with model capability — more powerful models require more extensive safety evaluation — which means that as the industry produces more capable systems, the safety cost per model increases.
The honest summary of AI economics is this: the industry is in a period of sustained, large-scale cash consumption by a small number of well-funded companies, with business models that remain fundamentally unproven at scale. Microsoft’s investment in OpenAI, Google’s internal investment in its AI division, Anthropic’s fundraising — these represent bets that the long-run economics of AI will justify the short-run costs. The bet may be correct. The history of transformative technologies suggests that early-stage economics often look terrible before they become excellent. But the widespread perception that AI is simply a software business with software-level economics is incorrect. The marginal cost of an AI query is real, significant, and not yet covered by the marginal revenue it generates at consumer pricing for most products.
The $100 million training cost that made headlines was a beginning, not a summary.




