Why the 'AGI Is Near' Crowd Is Both Completely Right and Catastrophically Wrong

The AGI Debate

The most important disagreement in AI isn't about politics or safety — it's about a word that nobody can define

In 2023, Sam Altman told an interviewer that OpenAI might achieve artificial general intelligence “in the reasonably close-term future, maybe within a decade.” In 2024, Demis Hassabis of Google DeepMind suggested it could arrive “in the next few years.” Elon Musk has placed it at 2025, then 2026, then some other year. Yann LeCun, for years Meta’s chief AI scientist, has argued that today’s large language models are fundamentally incapable of reaching human-level intelligence, and that we remain deeply confused about what general intelligence even is. Prominent AI safety researchers disagree with one another by decades on when, and whether, it will arrive.

These disagreements are regularly presented as empirical disputes about the trajectory of AI technology — debates that could in principle be resolved by better forecasting, better interpretability research, better benchmarking. They are not. They are primarily philosophical disputes dressed up as empirical ones, and the philosophical disagreements are so fundamental that most of the specific predictions — including the confident-sounding ones from people who build these systems for a living — are close to meaningless.

The problem is simple to state and genuinely difficult to resolve: nobody agrees on what AGI means. Not in the technical community, not in the philosophical community, not even within individual organizations that claim to be building toward it. Debates about whether AGI is near, already here, or decades away are debates among people who use the same word to mean different things.

The confusion operates at multiple levels. At the most basic level, “general” in “artificial general intelligence” contrasts with “narrow” — the idea that current AI systems, however impressive, are specialized tools that do specific things well but lack the flexible, transferable intelligence that characterizes human cognition. A chess engine plays chess superbly and cannot do anything else. Early image classifiers classified images and had no other capabilities. The original framing of AGI was intelligence that could transfer across domains the way human intelligence does — you can play chess, write poetry, diagnose mechanical problems, and navigate social situations, all using the same underlying cognitive apparatus.

By this original definition, AGI may already be substantially closer than its original framers expected, and for reasons that have nothing to do with the theoretical debates. GPT-4-class models can write code, discuss philosophy, analyze images, summarize legal documents, and produce creative writing, often with impressive competence. The cross-domain breadth whose absence once marked “narrow” AI as fundamentally limited has largely been achieved, at least in the sense that a single system is competent across many verbal and visual domains. Whether this constitutes “general” intelligence depends entirely on how you define the term, and this is where the debate collapses into philosophy.

Consider what human intelligence actually is. It includes verbal reasoning, mathematical ability, spatial cognition, social intelligence, embodied interaction with a physical environment, long-term planning, metacognition (the ability to think about one’s own thinking), and something that might be called common sense: a vast accumulation of implicit knowledge about how the physical and social world works. Current AI systems excel at some of these dimensions (verbal reasoning, pattern recognition in large datasets), are mediocre at others (spatial cognition, physical manipulation), and on the rest either lack the capacity entirely or show it so unreliably that it is hard to say whether it is there at all (genuine metacognition, robust common-sense reasoning that generalizes to truly novel situations).

The “human-level” benchmark that dominates popular discussion is particularly incoherent. Human intelligence is not a single scalar quantity; it is a multidimensional profile that varies enormously across individuals and contexts. The world’s best chess player is probably worse than GPT-4 at drafting persuasive prose. The world’s best novelist may well be outperformed by a seven-year-old on certain spatial reasoning tasks. As a benchmark, “human-level” assumes a unified standard that doesn’t exist even within the human population.

Moreover, current AI systems already exceed human performance on many specific benchmarks that once seemed like reasonable proxies for general intelligence. They outperform the vast majority of humans on legal reasoning exams, on medical diagnosis tasks when given the same case information, on coding challenges, and on standardized tests. This has produced a well-documented pattern in AI research: benchmark saturation. Researchers design a test that seems to require sophisticated reasoning, AI systems soon exceed human performance on it, and researchers conclude that the test was actually measuring something simpler than true intelligence. Then a new test is designed, and the cycle repeats.

This benchmark treadmill reflects a genuine philosophical problem. We don’t have a rigorous theory of what intelligence is. We have proxies, tasks that seem to require intelligence to perform, and when AI systems pass those proxies, we revise our beliefs about what intelligence requires. The Turing test was long treated as the definitive benchmark; once it became clear that current systems could pass it in limited settings, the consensus shifted to the view that the test was inadequate. This moving-target structure is not a failure of AI research; it reflects the fact that we are trying to measure something we can’t define.

The philosophical tradition of thinking about general intelligence doesn’t help as much as one might hope. The distinction between “weak” and “strong” AI (between systems that simulate cognition and systems that actually have cognitive states) maps onto debates about consciousness and subjective experience that centuries of philosophy of mind have not resolved. The question of whether an AI system truly “understands” language or merely performs sophisticated pattern matching is arguably unanswerable within current frameworks, because we lack a theory of understanding that clearly distinguishes these cases even for humans.

What we do have is a practical observation: the systems built in the 2020s are qualitatively different in capability from those built in the 2010s, in ways that surprise even their creators. GPT-4’s ability to sustain extended, contextually appropriate reasoning about novel problems exceeded what most researchers would have predicted a few years earlier. Whether this constitutes progress toward AGI depends on the definition, but it is unambiguously a significant expansion of capability with real consequences.

The more useful critique of the AGI discourse is not philosophical but strategic: the debate about AGI timelines and definitions actively distracts from more immediate, specific, and tractable questions about AI’s effects. For policy purposes, whether AGI arrives in 2027, 2035, or 2050 matters less than how AI systems deployed today affect labor markets, whether they amplify or mitigate existing biases, how they change the economics of content production, and how they should be governed given their current capabilities.

This is not an argument for dismissing AI capabilities or AI risk. Quite the contrary. Risk frameworks built around AGI timelines tend to generate either complacency (AGI is decades away, so current systems can be deployed without restriction) or catastrophism (AGI is near, so current systems are precursors to existential risk). Both framings direct attention away from the actual harm-causing mechanisms of current systems, which are mundane compared with superintelligence scenarios but real and serious.

Algorithmic hiring systems can discriminate systematically against certain groups right now, without any AGI. Content recommendation systems amplify extremist content right now, without AGI. Autonomous weapons systems are being developed and deployed right now, raising accountability questions that need to be resolved long before AGI becomes relevant. Medical AI that produces plausible-sounding but wrong diagnoses can cause real harm right now. These harms have known mechanisms and tractable interventions. They are partly neglected because the discourse is consumed by the more dramatic question of when the machines become smarter than us.

There are also near-term questions about AI capability trajectories that are genuinely important and empirically tractable without invoking AGI at all. How rapidly will inference costs fall? How quickly will AI capabilities in embodied physical tasks develop? How vulnerable are the largest models to adversarial manipulation? What happens to information quality online when AI-generated content is indistinguishable from human-generated content at scale? These questions have practical implications and don’t require a definition of general intelligence to analyze productively.

The “AGI is near” crowd is not wrong about the pace of capability development. The systems being built in 2026 are far more capable than those of 2020, and the trajectory suggests continued progress. Where they go wrong is in implying that capability progress maps cleanly onto the specific concept of “general” intelligence, and that timeline predictions about AGI have meaningful precision.

The “AGI is far or impossible” crowd is not wrong that current systems have fundamental limitations — they do hallucinate, they do fail on genuinely novel reasoning tasks, they do lack the embodied common sense that comes naturally to humans. They are wrong to use these limitations as evidence that the pace of progress should not concern us, or that the question of AI capability development is disconnected from immediate policy questions.

Questions more worth asking than “when is AGI?” might include: At what capability level do AI systems require qualitatively different governance than they receive today? Which specific failure modes of current systems cause the most harm, and what interventions address them? How should societies distribute the gains from AI-driven productivity improvements? These questions are answerable. They involve real tradeoffs that real institutions can address. They don’t require defining general intelligence.

The AGI debate is not unintelligent. Some of it reflects genuine scientific and philosophical engagement with hard questions. But as a cultural phenomenon — the thing consuming attention in podcasts, social media, and boardrooms — it functions primarily as a distraction from the places where AI governance choices matter most urgently. The word “AGI” has become almost perfectly calibrated to generate heat without light.