Turing Award winner Richard Sutton says pure generative AI can't do real science

Richard Sutton, the godfather of reinforcement learning and a Turing Award laureate, is throwing a well-aimed stone into the generative AI punch bowl. His core claim is simple and devastating: today's large language models, for all their fluent text and stunning images, are intellectually sterile. They are magnificent parrots and terrible scientists. And the reason cuts to the very heart of what we mistakenly call "intelligence" in these systems.

Analysis

Richard Sutton, one of the most brilliant minds behind reinforcement learning and a Turing Award laureate, is throwing down a gauntlet that the AI hype machine desperately needs to hear. He’s arguing that the entire foundation of today’s generative AI—the flashy, text-spitting, image-generating models that dominate the public imagination—is fundamentally incapable of true scientific discovery. And he’s right. This isn’t just a minor technical gap; it’s a philosophical chasm that reveals how we’ve mistaken the mimicry of intelligence for the engine of creation.

The core of his critique is surgically precise: conventional generative AI has no built-in mechanism to evaluate the quality or truth of its own outputs. It’s a dazzlingly sophisticated pattern-matcher, a stochastic parrot that recombines the vast library of human text and images into novel-looking configurations. But it lacks an internal critic. It cannot run an experiment, observe a result, and say, “Ah, that prediction was wrong. My model of the world is flawed. I must adjust.” It generates, then stops. The process is unidirectional, a one-way street of plausible-sounding outputs.

This is the fundamental weakness that separates a tool from a discoverer. True scientific creativity isn’t about spinning out endless possibilities, a "billion monkey" approach to hypotheses. It’s about the ruthless, iterative loop of conjecture and refutation. You posit something, you test it against reality, you fail, you learn, and you refine your understanding. This is the engine of knowledge. Generative AI, as currently architected, stops at the conjecture stage and never engages in refutation. It can write a beautiful sonnet about a chemical reaction, but it cannot design the reaction, run it, and decide whether the hypothesis holds water.

Sutton points to systems like AlphaGo and AlphaProof as the counter-model, and this is where the argument gets truly compelling. These are not pure generative models. They are built around a reinforcement learning loop in which the system has a clear, intrinsic objective function: in Go, winning the game; in mathematics, reaching a valid, proven conclusion. The system generates a move or a proof step, and then the environment, either the Go board or a proof verifier, provides stark, binary feedback: right or wrong, win or lose. This evaluation loop is baked into the system’s very DNA. The creativity emerges not from unfettered generation, but from a relentless, goal-directed dialogue with a definitive external standard.
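The feedback loop described here can be illustrated with a deliberately tiny sketch. It is not AlphaGo's or AlphaProof's actual algorithm; it is an epsilon-greedy bandit, swapped in as the simplest learner that improves purely from win/lose feedback. The `play` environment, its three moves, and their win probabilities are all made-up assumptions for illustration.

```python
import random
from typing import List

# Assumed, hypothetical win probabilities for three possible moves.
# The learner never sees these numbers; it only observes wins and losses.
HIDDEN_WIN_PROB = [0.2, 0.5, 0.8]

def play(move: int, rng: random.Random) -> bool:
    """Hypothetical environment: returns True for a win, False for a loss."""
    return rng.random() < HIDDEN_WIN_PROB[move]

def train(steps: int = 5000, epsilon: float = 0.1, seed: int = 0) -> List[float]:
    """Estimate each move's value from binary feedback alone."""
    rng = random.Random(seed)
    plays = [0, 0, 0]
    value = [0.0, 0.0, 0.0]  # running average reward per move
    for _ in range(steps):
        if rng.random() < epsilon:
            move = rng.randrange(3)                       # explore
        else:
            move = max(range(3), key=lambda m: value[m])  # exploit
        reward = 1.0 if play(move, rng) else 0.0          # win or lose
        plays[move] += 1
        # Incremental mean: nudge the estimate toward the observed outcome.
        value[move] += (reward - value[move]) / plays[move]
    return value

values = train()
best_move = max(range(3), key=lambda m: values[m])
print(best_move, [round(v, 2) for v in values])
```

The point of the sketch is only that the preference for the strongest move comes from the environment's verdicts rather than from any prior text about which move is good.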

This distinction is crucial. AlphaGo didn’t just “generate” creative moves; it generated moves that were evaluated as being part of a winning strategy. The creativity was a byproduct of a ruthless optimization process. Its novelty had purpose. In contrast, a large language model can generate a thousand novel hypotheses for a physics problem, but on its own it has no built-in way to determine which, if any, are even plausible, let alone correct. It’s all sparkle, no substance. It’s the difference between a brilliant chess player calculating a path to checkmate and a poet writing about a chess game that may never be played.

So, what does this mean for the future? It means the current rush to pour generative AI into every scientific pipeline, from drug discovery to materials science, as a standalone “discovery engine” is dangerously misguided. These models are phenomenal tools for hypothesis generation, for mapping the vast space of “what could be,” and for accelerating mundane tasks like literature review or data formatting. But they cannot be the scientist in the loop. They need to be coupled with evaluation frameworks (simulations, experimental robots, formal verifiers, or human experts) to close the loop. The scientific AI of the future will not be a solitary oracle; it will be a tight, recursive partnership between a generative component and an evaluative one.
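As a rough illustration of pairing a generative component with an evaluative one, here is a minimal propose-and-verify sketch. Everything in it is a toy assumption: `propose` stands in for a generative model that emits plausible-looking but unchecked guesses, `verify` stands in for a formal verifier, and integer factorization is used only because a guess can be checked exactly.

```python
import random
from typing import Optional, Tuple

Candidate = Tuple[int, int]

def propose(n: int, rng: random.Random) -> Candidate:
    """Hypothetical generator: a plausible-looking, unchecked guess at n = a * b."""
    a = rng.randrange(2, n)
    return a, n // a

def verify(n: int, candidate: Candidate) -> bool:
    """Hypothetical verifier: an exact, external check of the guess."""
    a, b = candidate
    return a > 1 and b > 1 and a * b == n

def discover(n: int, budget: int, seed: int = 0) -> Optional[Candidate]:
    """Return the first proposal the verifier accepts, or None if the budget runs out."""
    rng = random.Random(seed)
    for _ in range(budget):
        candidate = propose(n, rng)
        if verify(n, candidate):
            return candidate
    return None

print(discover(91, budget=1000))  # a verified pair, e.g. (7, 13) or (13, 7)
print(discover(97, budget=200))   # None: 97 is prime, so every guess is rejected
```

The generator alone produces mostly wrong answers with full confidence. Only the verifier's accept or reject signal turns its output into something that can be trusted, or into an honest "nothing found".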

Sutton’s perspective is a necessary corrective to the narrative of “emergence” as a magical, unbounded force. He’s reminding us that intelligence, especially scientific intelligence, is not just about the output. It’s about the architecture of learning. It’s about having skin in the game, defined by a clear metric of success or failure that the system itself is incentivized to improve against. Without that, we are left with elaborate parlor tricks. We get systems that can sound like Einstein but cannot think like him, because they have no internal model of being wrong.

The excitement around generative AI is understandable, but it has blinded us to its nature as a consummate imitator. Richard Sutton is pointing to the exit of this hall of mirrors. The real path to AI-driven scientific discovery lies not in bigger, more uncritical language models, but in the integration of generation with robust, built-in evaluation. It’s a harder, less glamorous problem than scaling parameters. But it’s the only one that matters if we want machines that don’t just regurgitate our past discoveries in prettier packages, but actually help us write the next chapter. Until they can evaluate their own results, they are, and will remain, magnificent but purposeless oracles.

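With a single sentence, Richard Sutton strips generative AI bare: it cannot evaluate its own output, so it cannot do real science. Coming from the father of reinforcement learning and a Turing Award laureate, the claim carries real weight, and it is extremely accurate.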

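We are now awash in text, images, and code generated by large models. The output is fluent, erudite, and occasionally stunning, but the boom conceals a fundamental flaw: these models are painters without eyes. They imitate and recombine, yet they cannot truly "see" whether their own work is good or bad, let alone correct and iterate on themselves in pursuit of a higher, intrinsic goal. Sutton's comparison with systems like AlphaGo and AlphaProof is precisely on target. At their core, those systems have a merciless built-in referee: winning or losing. Win, and the behavior is reinforced; lose, and it is penalized. That clear, immediate, indisputable feedback loop is the root of their ability to surpass humans and produce "creative" lines of play. They are not merely generating; they are approaching the optimal solution through repeated trial, error, and evaluation, in an endless series of self-play games that each have a definite end.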

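This strikes the sore spot of the current AI boom. Tech companies racing to ship larger models and richer modalities are running an arms race of "generators", not an architectural revolution of "thinkers". We train AI to read everything humans have written, making it the most erudite of parrots, but we have not given it an internal "truth detector" or "curiosity engine". The core of scientific discovery is not generating a new description of black holes; it is proposing an unprecedented hypothesis and designing experiments or derivations to falsify or confirm it. This is a cycle: conjecture, experiment, evaluation, revision. Today's generative AI is good only at the first step, conjecture (that is, generating plausible-looking text), and it entirely lacks the most critical of the other three, evaluation. It does not know whether the next particle-physics equation it generates is nonsense or insight unless a human physicist verifies it.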

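Sutton's criticism therefore points to a deeper disagreement about how "creativity" is defined. In commercial applications, the "creativity" of generative AI (producing varied marketing copy or design sketches) is already sufficient, because the aim is to satisfy immediate human preferences. In science, which pursues objective truth, creativity means a hard collision with the real world, and it requires an almost instinctive craving for, and judgment of, "better explanations". AlphaGo's "creativity" showed in a "divine move" found outside the human game records, but the move was "divine" only because it was ultimately validated by the objective result of winning the game. A stroke of genius in science likewise has to be validated by the laws of nature.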

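By extension, Sutton is drawing a clear line: what kind of AI can bring real disruption? Not larger language models, but systems with a clear objective function that learn continuously from interaction with their environment and have a strong built-in loop of evaluation and feedback. One example would be an AI research agent that autonomously designs experiments, operates laboratory robots, analyzes the results, and adjusts its hypotheses. Its "knowledge" comes not from a static corpus but from dynamic data that it produces by interacting with the real world and can evaluate for itself. That is the qualitative shift from "pattern imitation" to "autonomous exploration".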

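Of course, we should also be wary of the trap of judging everything by results alone. Not every scientific process has black-and-white, immediate feedback the way a board game does. Building fundamental theory sometimes takes long contemplation and flashes of inspiration, and its evaluation cycle is long. How to design evaluation mechanisms for this fuzzier, longer-horizon creative work is the next, harder challenge. Either way, Sutton is right: generation without the ability to evaluate is only a magnificent echo, not the thunderclap of creation. The current generative AI feast may be just a noisy interlude on the way to real intelligence, not the finale. The real protagonist is still backstage, debugging the rules of its interaction with the world.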

Disclaimer: The above content is generated by AI and is for reference only.
