Anatomy of an Overpromise

Why Most 'Agents' Are Still Just Pipelines

The word 'agent' has been stretched to cover systems that would have been called automation scripts five years ago

By Jakub Jirák Jan 13, 2027 7 min read

ai-agents automation agentic-ai pipeline-design enterprise-ai

The word “agent” carries a specific philosophical weight. An agent, in the sense that philosophers and cognitive scientists use the term, is something that perceives its environment, reasons about it, and takes actions in pursuit of goals. An agent can adapt to novel situations. It can recognize when its current approach is not working and try a different one. It maintains goals over time and pursues them flexibly across varied circumstances. Agency implies a kind of purposeful adaptability that distinguishes it from mere mechanism.

When technology companies call their products “agents,” they are reaching for this philosophical weight. They want you to think of a purposeful, adaptive, goal-directed system — not a script. The gap between the connotation and the reality of most deployed “agents” in 2027 is large enough to matter for how enterprises make deployment decisions, evaluate performance, and set expectations for what these systems can and cannot do.

A pipeline — in the software engineering sense — is a series of processing steps where the output of each step becomes the input of the next. The steps are specified in advance by the developer. The path through the pipeline is determined by logic that the developer wrote, not by the pipeline’s own reasoning. A pipeline can be very sophisticated: it can have branches, loops, conditional logic, and calls to external APIs. It is still, in the meaningful sense, mechanism — it does exactly what its developer specified, in conditions its developer anticipated.

An agent — in the meaningful sense — reasons about what to do next rather than following a pre-specified path. It can determine that its current approach is not working and choose a different one. It can decompose a novel problem into sub-problems that its developer did not explicitly enumerate. It can recognize that a situation falls outside its normal operational parameters and decide whether to proceed carefully, escalate, or refuse.

The distinction is not binary — there is a spectrum. But most of what is currently being sold and deployed as “autonomous agents” sits much closer to the pipeline end than the promotional material implies.

The tell-tale signs of a pipeline masquerading as an agent are consistent across deployments. The system performs well within a well-defined set of task types and fails (or requires human intervention) when it encounters tasks that differ from that set in any significant way. The failure mode is a hard error or a confident wrong answer rather than a graceful “I don’t know how to handle this” response. The system’s “reasoning” is effectively predetermined: given input type A, do steps 1, 2, 3; given input type B, do steps 2, 4, 5. The branching logic appears, if you can inspect it, in the system’s code or prompt templates rather than emerging from genuine situation-by-situation reasoning.
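To make the “given input type A, do steps 1, 2, 3” pattern concrete, here is a minimal Python sketch of a pipeline whose entire “reasoning” is a routing table. The step names, the input types, and the `ROUTES` table are hypothetical, chosen only for illustration. The path through the system is fixed by the developer, and an input type nobody anticipated produces a hard error rather than any attempt at a response.

```python
from typing import Callable

Step = Callable[[dict], dict]

def make_step(name: str) -> Step:
    """Build a hypothetical processing step that only records that it ran."""
    def step(doc: dict) -> dict:
        return {**doc, "trail": doc["trail"] + [name]}
    return step

s1, s2, s3, s4, s5 = (make_step(f"step{i}") for i in range(1, 6))

# The developer's routing table IS the system's "reasoning":
# input type A -> steps 1, 2, 3; input type B -> steps 2, 4, 5.
ROUTES: dict[str, list[Step]] = {
    "A": [s1, s2, s3],
    "B": [s2, s4, s5],
}

def run_pipeline(doc: dict) -> dict:
    steps = ROUTES.get(doc["type"])
    if steps is None:
        # No route was written for this input, so the pipeline fails hard.
        raise ValueError(f"no route for input type {doc['type']!r}")
    doc = {**doc, "trail": []}
    for step in steps:
        doc = step(doc)
    return doc
```

Calling `run_pipeline({"type": "A"})` runs steps 1, 2, and 3 in order; calling it with `{"type": "C"}` raises `ValueError`. Nothing in the code can decide that a type-C input resembles type A and try that route instead.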

This is not a criticism of these systems: a well-engineered pipeline that reliably handles a specific class of tasks is genuinely useful. The criticism is of the framing, because the framing shapes three things: what happens when the system meets situations its developers did not anticipate (failure, not graceful adaptation), how it should be supervised (human review at each decision point, not occasional spot checks), and what happens when organizational requirements change (re-engineering, not retasking).

The incentive structure that produces the mislabeling is straightforward. “Agent” is more impressive than “pipeline.” “Autonomous AI agent” commands higher sales prices, attracts more investor attention, and generates more favorable press coverage than “AI-assisted automation pipeline.” The vendor’s product may genuinely involve a language model making one or more decisions at each step (calling an LLM to classify a document, summarize text, or choose between two processing paths) rather than purely deterministic rule execution. This makes it something more than a traditional pipeline. But “more than a traditional pipeline” and “agent with genuine adaptive reasoning” are not the same category.
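Here is what an LLM decision node inside a pipeline typically looks like, as a minimal Python sketch. `call_llm` is a hypothetical stand-in for whatever model client a vendor uses (a function from prompt text to reply text), and the path and step names are invented for illustration. The model genuinely makes a choice, but only among options the developer enumerated, and everything before and after the choice is still fixed in code.

```python
from typing import Callable

# The options the model may choose among are enumerated by the developer.
PATHS = ("fast_track", "manual_review")

def choose_path(document: str, call_llm: Callable[[str], str]) -> str:
    """An LLM decision node: the model picks, but only from PATHS.

    `call_llm` is a hypothetical stand-in for a real model client.
    """
    prompt = (
        "Classify this document for routing. Reply with exactly one of: "
        + ", ".join(PATHS) + "\n\n" + document
    )
    reply = call_llm(prompt).strip().lower()
    # Anything outside the enumerated options falls back to a developer-chosen default.
    return reply if reply in PATHS else "manual_review"

def process(document: str, call_llm: Callable[[str], str]) -> list[str]:
    path = choose_path(document, call_llm)
    # The steps after the decision are still hard-coded per path.
    if path == "fast_track":
        return ["extract_fields", "auto_approve"]
    return ["extract_fields", "queue_for_human"]
```

With a stub such as `lambda prompt: "fast_track"`, `process` returns `["extract_fields", "auto_approve"]`. Swapping in a stronger model improves the quality of the classification; it does not add a third path.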

The confusion is compounded by the fact that the researchers and developers who build these systems often use the term “agent” in a technical sense that differs from its colloquial use. In the reinforcement learning literature, “agent” simply means a system that takes actions and receives rewards: a narrow technical definition that makes no claim about the sophistication of the reasoning involved. A rule-based system that takes actions is an “agent” in this sense. Vendors often lean on this technical permissiveness to defend the label, while their marketing language implies the more ambitious version of agency.
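How little the technical definition requires is easiest to see in the standard agent-environment loop. The toy below is a hypothetical thermostat written in Python (the environment, the 21-degree target, and the reward function are all invented for illustration). The “agent” is a single if-statement with no learning or planning, yet it observes, acts, and receives rewards, which is all the reinforcement learning sense of the word asks for.

```python
class ThermostatEnv:
    """Toy environment: reward is higher the closer the room is to the target."""
    def __init__(self, temp: float = 15.0, target: float = 21.0):
        self.temp, self.target = temp, target

    def step(self, action: str) -> tuple[float, float]:
        # Heating warms the room; otherwise it cools slightly.
        self.temp += 1.0 if action == "heat" else -0.5
        reward = -abs(self.temp - self.target)
        return self.temp, reward

def rule_agent(observation: float) -> str:
    # A fixed rule: no learning, no planning, no reasoning.
    return "heat" if observation < 21.0 else "off"

env = ThermostatEnv()
obs, total_reward = env.temp, 0.0
for _ in range(20):
    action = rule_agent(obs)          # the "agent" acts...
    obs, reward = env.step(action)    # ...and receives a reward
    total_reward += reward
```

After a few steps the room settles around the target, and by the technical definition this loop contains an agent. That is exactly why the technical definition cannot carry the weight of the marketing claim.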

The practical consequences of this definitional imprecision surface most clearly in failure. An enterprise that deploys an “autonomous agent” to handle contract renewals discovers, twelve months in, that the agent handles routine renewals correctly but cannot handle renewals where the vendor has changed its standard terms, a situation the developers did not anticipate and the pipeline was not designed for. The enterprise, having told its legal team that the agent handles renewals, has not maintained the legal team’s capacity to handle renewals manually. The mismatch between “agent” as sold and “pipeline” as deployed has created an operational gap.

A more honest framing would have led to different decisions. If the system were described as “a pipeline that handles X, Y, and Z categories of renewals and escalates anything else,” the legal team would have maintained familiarity with the escalation category. The boundaries of the system’s capability would have been designed explicitly rather than discovered through incidents.
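A minimal Python sketch of what “handles X, Y, and Z and escalates anything else” can look like when the boundary is written down rather than discovered. The category names in `SUPPORTED` are hypothetical stand-ins for X, Y, and Z, and `escalate:legal_team` is an invented route label; the point is only that the out-of-scope branch is an explicit, designed outcome.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the renewal categories X, Y, and Z
# that the pipeline was actually designed and tested for.
SUPPORTED = {"standard_terms_unchanged", "price_index_adjustment", "term_extension"}

@dataclass
class Outcome:
    handled: bool
    route: str
    reason: str

def route_renewal(category: str) -> Outcome:
    """Handle only the categories in SUPPORTED; escalate everything else."""
    if category in SUPPORTED:
        return Outcome(True, f"auto:{category}", "within documented scope")
    # Out-of-scope cases go to people by design, not by incident.
    return Outcome(False, "escalate:legal_team", f"unsupported category {category!r}")
```

A renewal where the vendor has changed its standard terms, say `route_renewal("vendor_changed_standard_terms")`, comes back unhandled and routed to the legal team, which is the category the team would then need to stay practiced on.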

There are systems that genuinely earn the term “agent” — that exhibit flexible, adaptive reasoning in response to novel situations, that can recognize and recover from partial failures, that maintain goals coherently over extended multi-step tasks. These systems exist in research environments and in a small number of production deployments where the investment in their development has been commensurate with the ambition of the term. They are not the majority of what is being sold and deployed as “agents” in 2027.

The frontier AI labs are developing genuine agents. The systems they are developing can handle novel situations, reason flexibly about task decomposition, maintain goal coherence over extended tasks, and adapt to failure in ways that go beyond pre-specified recovery procedures. These systems are impressive. They are also expensive, fragile at scale, and difficult to deploy safely in production environments with real enterprise constraints. The market gap between “genuine agent capability” and “what can be reliably deployed in enterprise production” remains large.

What fills that gap — the commercially deployed “agents” that most enterprises are actually running — is sophisticated pipeline automation with LLM reasoning at specific decision nodes. That is useful. It deserves honest description so that organizations can evaluate, deploy, and maintain it with appropriate expectations. Calling it something it is not produces deployments optimized for the wrong properties and supervised with the wrong level of attention.

The history of technology is full of cases where capability-gap marketing produced a coherent short-term market while building long-term skepticism. When enterprise software vendors in the 1990s overpromised the integration and intelligence of their systems, organizations invested heavily, discovered the systems were more constrained than represented, and spent years managing the gap between promised and actual capability. The term “AI” itself has been through at least one major credibility crisis (the AI winter of the late 1980s) caused in part by the gap between the ambitious claims of the time and what systems actually delivered.

The “agents” mislabeling is not on the scale of the AI winter, and the underlying technology is substantially more capable than the early AI systems that generated the credibility crisis. But the pattern — promotional language that outpaces capability, followed by deployment expectations mismatched to actual system behavior, followed by disappointment and recalibration — is recurring. The recalibration always comes. The only question is whether it comes with minor disappointment or major wasted investment.

Enterprises that apply clear-eyed capability assessment to what they are actually deploying (a pipeline with LLM reasoning nodes, useful and reliable within its defined scope) will make better deployment decisions than those that buy the agent framing wholesale. That clarity is available. It just requires resisting the marketing.

The practical test for distinguishing genuine agents from pipeline-with-LLM-reasoning is a simple one: put the system in a genuinely novel situation that its designers did not specifically prepare for and observe what happens. A genuine agent should recognize the novelty, reason about it, and attempt a principled response even if it ultimately fails. A sophisticated pipeline should fail in a characteristic way — either producing a wrong answer confidently, or stalling at the step that does not match its branching logic. The failure mode tells you more about the system’s architectural nature than any amount of promotional material.
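The novelty test can be run mechanically as a first pass. The Python sketch below assumes a hypothetical setup: the system under test is any callable, it signals “I don’t know how to handle this” by returning an `ESCALATE` sentinel (an invented convention), and someone who knows the domain has supplied the expected outcome for each novel probe. It sorts responses into the failure modes described above. It cannot judge whether a response was principled, so a human still has to read the interesting cases, and timeouts (another form of stalling) are left out for brevity.

```python
from typing import Any, Callable

ESCALATE = "ESCALATE"  # hypothetical sentinel a system returns when it declines to proceed

def classify_response(system: Callable[[Any], Any], probe: Any, expected: Any) -> str:
    """Run one novel probe and label how the system responded."""
    try:
        answer = system(probe)
    except Exception:
        return "stalled"          # broke at a step its branching logic did not cover
    if answer == ESCALATE:
        return "escalated"        # flagged that the input was outside its scope
    if answer == expected:
        return "correct"
    return "confident_wrong"      # produced an answer without flagging the novelty

def probe_report(system: Callable[[Any], Any], probes: list[tuple[Any, Any]]) -> dict[str, int]:
    """Count failure modes across (probe, expected) pairs."""
    counts: dict[str, int] = {}
    for probe, expected in probes:
        label = classify_response(system, probe, expected)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

A report dominated by `stalled` and `confident_wrong` is the pipeline signature; a meaningful share of `escalated` responses is a good sign, though on its own it shows only that the system knows its boundaries, not that it reasons beyond them.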

This test is rarely applied before deployment, which is why the discovery of pipeline-ness usually happens in production when users encounter the edge cases that the designers did not prepare for. The organizations that apply it before deployment can design their oversight accordingly: more human review for edge cases, explicit escalation when the system encounters input patterns it has not handled before, and user training that accurately describes the system’s scope.
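One way to implement “explicit escalation when the system encounters input patterns it has not handled before” is a gate in front of the pipeline. The Python sketch below is a deliberately crude, hypothetical version: it treats an input’s type plus its set of field names as its “pattern” and escalates anything whose pattern never appeared in the reviewed test set. A real deployment would need a richer notion of pattern than field names; the shape of the check is the point.

```python
from typing import Iterable

def signature(record: dict) -> tuple:
    """A crude, hypothetical input signature: the record's type plus its sorted field names."""
    return (record.get("type"), tuple(sorted(record)))

class NoveltyGate:
    """Escalate any input whose signature never appeared in the reviewed test set."""

    def __init__(self, reviewed: Iterable[dict]):
        self.known = {signature(r) for r in reviewed}

    def check(self, record: dict) -> str:
        return "proceed" if signature(record) in self.known else "escalate"
```

A renewal with the same fields as the tested ones proceeds; a renewal that suddenly carries an extra clause field, or a document of an untested type, is sent to a human before the pipeline touches it.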

There is a final dimension to the pipeline-versus-agent distinction that is more conceptual but ultimately important: the question of what we are building toward. If the goal is capable agents in the strong sense — systems that can reason adaptively about novel situations, maintain coherent goals across complex tasks, and act reliably across the full range of situations a human employee would face — then designing pipelines and calling them agents is not just misleading marketing. It is building toward the wrong target. Pipeline optimization makes pipelines better. It does not develop the architectural capabilities that genuine agency requires.

The research community building toward real agency is doing work that is quite different from the pipeline optimization that dominates commercial deployment. The gap between those two worlds — the research agenda and the commercial agenda — is a structural feature of where the industry is right now. Acknowledging it is the first step to building the right things for the right reasons, rather than optimizing the wrong architecture and hoping it eventually turns into the right one.
