AI News 1mo ago • Updated 1mo ago 62

Google DeepMind's AlphaProof Nexus solves decades-old math problems for a few hundred dollars

Google DeepMind’s AlphaProof Nexus marks a notable shift in automated mathematics by solving nine open Erdős problems, including two that had resisted mathematicians for 56 years.

In-depth Analysis

Background

The central fact is not just that AlphaProof Nexus solved hard math problems, but that it did so in a formally verified way. The article contrasts Google DeepMind’s system with OpenAI’s natural-language approach, highlighting two different philosophies in AI reasoning:

one prioritizes natural-language mathematical argumentation
the other prioritizes formal proof verification through Lean

That distinction matters because mathematics is unusually sensitive to small logical errors. A natural-language proof may sound convincing while still containing a gap; a Lean-verified proof must pass a compiler-like check at every step.
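A minimal Lean 4 illustration of that difference follows. It is a toy example, not drawn from AlphaProof Nexus; the theorem names are made up, while Nat.add_zero is a lemma from Lean 4's core library. A justified step is accepted silently, but a missing argument cannot be hidden: it must be marked with sorry, and Lean reports it.

```lean
-- Accepted: the claim is justified by a core-library lemma the checker can verify.
theorem add_zero_example (n : Nat) : n + 0 = n :=
  Nat.add_zero n

-- Flagged: the claim is true, but no proof is supplied. `sorry` marks the gap,
-- and Lean reports "declaration uses 'sorry'" instead of silently accepting it.
theorem unproved_claim (n : Nat) : n * n ≥ n := by
  sorry
```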

Key Points

Nine open Erdős problems solved

The headline achievement is the autonomous solution of nine open Erdős problems. Because Erdős problems are known for their combinatorial and number-theoretic difficulty, the result signals more than benchmark performance. It suggests the system can contribute to live mathematical research rather than merely solve curated exercises.

Two problems resisted humans for 56 years

The most dramatic detail is that two of the solved problems had stumped mathematicians for 56 years. This sharpens the claim from “capable” to “historically significant.” If accurate, it means AlphaProof Nexus is not only accelerating known methods but also reaching problems that resisted generations of human effort.

Very low inference cost

The article emphasizes “a few hundred dollars per problem” in inference costs. That is important because it changes the economics of experimentation. A system that can attempt serious research-level problems at such low marginal cost creates the possibility of scaling mathematical search in a way human labor cannot.

This does not mean the approach is cheap in aggregate, however: a low per-problem cost must be weighed against the low hit rate.
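The trade-off can be made concrete with a back-of-the-envelope sketch. It assumes a simple geometric model in which every attempt is independent and succeeds with the same probability p, so about 1/p attempts are needed per success and the expected spend per solved problem is the per-attempt cost divided by p. The $300 figure is a hypothetical stand-in for “a few hundred dollars”; the article gives no exact number, and treating that figure as a per-attempt cost is itself an assumption.

```python
def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per solved problem under a geometric model.

    Assumes every attempt is independent and succeeds with the same
    probability `success_rate`, so on average 1 / success_rate attempts
    are needed for one success.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    expected_attempts = 1.0 / success_rate
    return cost_per_attempt * expected_attempts


if __name__ == "__main__":
    # Hypothetical: $300 stands in for the article's "a few hundred dollars".
    hypothetical_cost = 300.0
    reported_rate = 0.025  # the article's 2.5 percent success rate
    cost = expected_cost_per_success(hypothetical_cost, reported_rate)
    print(f"Expected attempts per success: {1 / reported_rate:.0f}")
    print(f"Expected cost per success: ${cost:,.0f}")
```

With these assumed inputs the sketch gives roughly 40 attempts and about $12,000 per solved problem, well above the headline per-problem figure.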

Formal verification via Lean

The strongest technical point is the use of the Lean compiler to verify every proof step automatically. This gives the system a major advantage in trustworthiness:

proofs are not accepted on stylistic persuasiveness
each intermediate step must satisfy a formal checker
the output is therefore closer to mathematical certainty than ordinary language-based reasoning
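Step-by-step checking can be pictured with a small Lean 4 calc chain. This is a toy example, not an AlphaProof Nexus output; the theorem name is hypothetical, while Nat.add_comm and Nat.add_assoc are Lean 4 core lemmas. Every line carries its own justification, and the checker verifies each one against exactly the equation written on that line.

```lean
-- Each calc step states one equation and the proof term that justifies it.
theorem reorder (a b c : Nat) : a + b + c = c + b + a :=
  calc
    a + b + c = c + (a + b) := Nat.add_comm (a + b) c
    _ = c + (b + a) := by rw [Nat.add_comm a b]
    _ = c + b + a := (Nat.add_assoc c b a).symm
-- Replacing the last justification with, say, Nat.add_comm a b would fail:
-- its type (a + b = b + a) does not match the step c + (b + a) = c + b + a.
```

A plausible but mismatched justification is rejected at the exact step where it occurs, rather than being carried along by the surrounding argument.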

This likely explains why the article frames AlphaProof Nexus against OpenAI’s approach. The competition is not only about solving problems, but about what counts as a reliable solution.

Success rate remains just 2.5 percent

The article tempers the excitement with a critical limitation: the overall success rate is only 2.5 percent. This is a severe constraint. It means the system is impressive in peak performance but weak in consistency.

A 2.5 percent success rate implies:

the system fails on the overwhelming majority of attempts
broad claims about general mathematical intelligence would be premature
the cost per successful outcome may be much higher when failed runs are included
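The first of these implications can be quantified under one simplifying assumption: every attempt is an independent trial with the same 2.5 percent chance of success. The article does not say how the rate was measured, so this is a modelling choice, not a reported fact. Under it, the probability of at least one success in n attempts is 1 - (1 - p)^n. The function names below are illustrative.

```python
import math


def p_at_least_one(success_rate: float, attempts: int) -> float:
    """Probability of at least one success in `attempts` independent tries."""
    return 1.0 - (1.0 - success_rate) ** attempts


def attempts_for_confidence(success_rate: float, target: float) -> int:
    """Smallest number of independent tries whose chance of at least one
    success reaches `target` (requires 0 < success_rate < 1 and 0 < target < 1)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - success_rate))


if __name__ == "__main__":
    p = 0.025  # the article's reported overall success rate
    for n in (1, 10, 40, 100):
        print(f"{n:>3} attempts: {p_at_least_one(p, n):.1%} chance of at least one success")
    print("Attempts needed for a 90% chance:", attempts_for_confidence(p, 0.90))
```

Under this model, about 91 independent attempts are needed for a 90 percent chance of at least one success. Real attempts on problems of very different difficulty are unlikely to be independent and identically distributed, so actual numbers could differ substantially.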

So while the solved problems are remarkable, the low success rate suggests AlphaProof Nexus is currently better understood as a high-variance research instrument than a dependable all-purpose theorem prover.

Significance

The article points to a deeper shift in AI mathematics: formal systems may outperform natural-language systems where correctness matters most. In many domains, sounding right is enough to be useful. In mathematics, that is not enough. By grounding proof generation in Lean, AlphaProof Nexus addresses the core weakness of language-model reasoning: unverifiable confidence.

At the same time, the 2.5 percent figure prevents overstatement. The article presents a system that is simultaneously:

extraordinary in best-case results
limited in average-case reliability

That tension is the real takeaway. The breakthrough is not that AI has “solved mathematics,” but that machine-verified proof search can occasionally surpass decades of human effort at surprisingly low direct cost.

Broader Implication from the Article’s Framing

The comparison with OpenAI suggests a competitive divide in AI research strategy:

Natural-language mathematical reasoning aims for flexibility and accessibility.
Formal verification-first reasoning aims for rigor and certainty.

The article clearly favors the second as the more consequential development in this context. Since AlphaProof Nexus is being credited with solving open problems rather than merely generating plausible arguments, the implication is that formalism may be the more fruitful path for frontier mathematical discovery.

Final Assessment

The article presents AlphaProof Nexus as a proof of concept for autonomous, low-cost, formally verified mathematical discovery. Its nine solved Erdős problems, especially the two unresolved for 56 years, show genuine research-level power. But the 2.5 percent success rate is a major reminder that this is not yet robust or general. The achievement is therefore best viewed as a narrow but real breakthrough: not reliable enough to replace mathematicians, yet strong enough to alter expectations about what AI can already do in pure mathematics.

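Background and Issues

The item presents a sharp contrast: on the one hand, AI can now crack long-unsolved mathematical problems; on the other, its overall success rate remains low. The phrase “two problems that had stumped mathematicians for 56 years” in particular shows that the system is not merely making progress on marginal problems but is reaching problems of genuine historical difficulty.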

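Core Content

What deserves the most attention is not the “nine solved problems” themselves but the method behind them:

AlphaProof Nexus uses the Lean compiler
every proof step can be verified automatically
the results are therefore not vague linguistic inferences but something much closer to rigorous, machine-checkable mathematical proofs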

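This contrasts with the OpenAI natural-language approach mentioned in the article. The differences are:

natural-language models are good at proposing ideas and drafting proofs
formal systems emphasize verifiability and logical closure
in mathematics the latter is especially valuable, because a proof's correctness matters more than its fluency

Meanwhile, the figure of “only a few hundred dollars of inference cost per problem” carries another important signal: this kind of high-difficulty mathematical problem-solving is moving from a purely experimental capability toward an engineering capability whose cost can be measured. That makes it more than an academic showcase; it also opens up the prospect of large-scale applications in the future.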

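Significance and Impact

The significance of this advance comes down to three points:

It demonstrates the potential of formal AI in high-level mathematical exploration
Solving open Erdős problems shows that the system is not merely reproducing its training set; in some settings it has a genuine capacity for discovery.
Verification mechanisms may matter more than language ability
Lean's automatic verification underscores that the competitiveness of future mathematical AI may depend not only on whether it can “talk” but, even more, on whether its results can be rigorously proven true by a machine.
The success rate exposes current limitations
A 2.5% success rate shows that it is still far from stable and reliable. In other words, it currently looks more like a handful of standout breakthroughs than a mathematical research assistant that is already broadly usable.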

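Overall Assessment

The real message here is not that AI has taken over mathematics wholesale, but that systems driven by formal proof are beginning to deliver unexpected breakthroughs on a very small number of hard problems. Their value lies in showing that the direction is viable; their limitation is that success remains rare. What will determine their impact is not just how many more problems they solve, but whether they can turn “stunning individual cases” into a “stable method.”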

Disclaimer: The above content is generated by AI and is for reference only.
