GPT-5 Pro Proves New Mathematics: Has AI Reached PhD Level Yet?

On August 20, 2025, Sebastien Bubeck, an OpenAI researcher and former VP of GenAI at Microsoft, posted something on X that would spark a debate far beyond the mathematics community.

He gave GPT-5 Pro an open problem from a convex optimization paper. The model "thought" for 17 minutes and 5 seconds, then produced a correct, novel proof — pushing a known lower bound from 1/L to 1.5/L. Bubeck verified the proof himself. It was correct.

The post has since accumulated over 7 million views. Some called it AGI. Others shrugged — it's just a constant. Mathematicians started checking the logic line by line.

This wasn't GPT-5 Pro's first math success, and it wouldn't be its last. But the question it raises is harder to pin down than any theorem: when an AI can autonomously produce verifiable, novel mathematical proofs, have we crossed a line?

▎What Did GPT-5 Pro Actually Prove?

Let's be precise. GPT-5 Pro didn't solve the Riemann Hypothesis or anything headline-ready. It tackled a tighter, more technical problem about gradient descent step sizes in convex optimization.

Gradient descent is the backbone of optimization. To minimize a function, you walk in the opposite direction of the gradient. How big should each step be? Too small and convergence is glacial. Too large and you overshoot. For L-smooth convex functions, classical theory guarantees that gradient descent converges for any step size below 2/L. The paper asked a finer question: for which step sizes is the optimization curve, the plot of function value against iteration count, itself convex, so that each step improves the objective by no more than the step before it?

The original paper (v1) showed that the curve is always convex for step sizes up to 1/L, and constructed a counterexample for step sizes above 1.75/L. That left a gap: what happens between 1/L and 1.75/L? Nobody had a clean answer.

Bubeck fed the paper's setup to GPT-5 Pro. The model's response: a proof that the curve stays convex for step sizes up to 1.5/L. Not a world-shattering leap: the paper's authors closed the gap entirely at 1.75/L in their v2. But the proof was new, the strategy was different from the human approach, and Bubeck, a recognized authority in convex optimization, checked every step.
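
To make the step-size question concrete, here is a minimal Python sketch. It is an illustration only, not the paper's argument and not GPT-5 Pro's proof. It runs gradient descent on a small convex quadratic, whose smoothness constant L is its largest eigenvalue, and checks two things for each step size: that the function values keep decreasing, and that the per-step improvements shrink over time, which is what it means for the curve of function values to be convex. The test function, starting point, and helper names are all assumptions chosen for the example.

```python
def gradient_descent_values(eigs, x0, step, iters=30):
    """Run gradient descent on f(x) = 0.5 * sum(eig_i * x_i**2).

    Returns the list of function values f(x_0), f(x_1), ..., f(x_iters).
    """
    x = list(x0)
    values = []
    for _ in range(iters + 1):
        values.append(0.5 * sum(lam * xi * xi for lam, xi in zip(eigs, x)))
        # The gradient of f has components eig_i * x_i.
        x = [xi - step * lam * xi for lam, xi in zip(eigs, x)]
    return values


def is_decreasing(values, tol=1e-12):
    """True if the function value never increases from one step to the next."""
    return all(b - a <= tol for a, b in zip(values, values[1:]))


def is_convex_curve(values, tol=1e-12):
    """True if all second differences are non-negative (a convex sequence)."""
    return all(
        a - 2 * b + c >= -tol
        for a, b, c in zip(values, values[1:], values[2:])
    )


# Hypothetical test problem: eigenvalues of a diagonal quadratic.
eigs = [4.0, 1.0, 0.25]
L = max(eigs)  # smoothness constant of this quadratic
x0 = [1.0, -2.0, 3.0]

for factor in (1.0, 1.5, 1.75):
    values = gradient_descent_values(eigs, x0, step=factor / L)
    print(
        f"step = {factor}/L: "
        f"decreasing={is_decreasing(values)}, "
        f"convex curve={is_convex_curve(values)}"
    )
```

Quadratics are the easy case: their curve is a sum of decaying geometric sequences, so it stays convex at every step size below 2/L. The sketch therefore shows what the property means, not where it breaks. Failures, where they exist, come from more carefully constructed convex functions.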

In his FOCS 2025 plenary talk, Bubeck revealed the key insight: GPT-5 Pro swapped out a component in the original proof for a more efficient variant, one that's well-known in combinatorics but whose application to this specific problem was not obvious. This "cross-domain transfer" — recognizing that a tool from one area can solve a problem in another — is one of the most prized skills in mathematical research.

▎The 2026 Timeline: Acceleration

The convex optimization proof was not a fluke. The months that followed saw AI math capabilities compound at a startling rate.

January 2026: GPT-5.2 Pro independently proved Erdős Problem #281 — a 45-year-old unsolved conjecture in number theory about covering systems and natural density. Fields Medalist Terence Tao verified the proof and gave it a remarkable endorsement: "What impressed me more is that it avoided the errors — interchanging limits, quantifier order mistakes — that are exactly where this problem is most treacherous. Earlier LLMs would almost certainly have stumbled on these subtleties."

In a twist that delighted the math community, a user named KoishiChan discovered a much simpler proof using two theorems from 1936 and 1966 that Erdős himself (who posed the problem in 1980) apparently didn't know about. It was a humbling reminder that knowledge gaps exist for humans and AI alike.

May 9, 2026: Fields Medalist Timothy Gowers published a blog post that sent shockwaves through academia. He gave ChatGPT 5.5 Pro open problems from a paper by number theorist Mel Nathanson. The model, without meaningful human guidance, produced PhD-level research in under two hours.

Gowers' words: "My own mathematical contribution was zero. I didn't even do anything clever with the prompts."

The model thought for 17 minutes, produced an optimal construction with a quadratic bound, then wrote the entire argument as a LaTeX preprint in 2 minutes and 23 seconds. Gowers said the result "would have made a perfectly reasonable chapter in a PhD thesis."

May 20, 2026: OpenAI's internal reasoning model disproved Erdős' Unit Distance Conjecture, a problem in discrete geometry that had stood for 80 years. Mathematicians had long assumed the optimal configuration approximated a square grid. The AI found a counterexample: an infinite family of point configurations that beats the conjectured bound by an explicit polynomial factor.

From August 2025 to May 2026 — nine months — AI mathematical research went from a "noteworthy anecdote" to an undeniable trend.

▎The Real Success Rate: Tao's Warning

Here's where the picture gets complicated.

Every AI math success goes viral. Every failure stays buried. Terence Tao, in his Erdős Problems thread, pointed out this exact bias:

"Assessing the real success rate of AI tools is dominated by massive reporting bias. Negative results are almost never disclosed. If someone tries an AI on an open problem and makes no progress, they have no incentive to report that negative conclusion."

A community-run project now tracks LLM performance on Erdős Problems systematically. The data shows a real success rate of about 1-2%. With over 600 unsolved problems in the Erdős corpus, that's still a meaningful number of contributions. But 98-99% of attempts fail — you just never hear about them.
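
A little arithmetic shows how strong this reporting bias can be. The sketch below is a hypothetical illustration: the 1.5% true success rate is taken from the 1-2% range above, while the assumption that every success and only 1 in 100 failures gets publicly reported is invented for the example. It computes the success rate an outside observer would infer from public reports alone.

```python
def observed_success_rate(true_rate, p_report_success, p_report_failure):
    """Success rate among reported attempts, given how often each outcome is reported."""
    reported_successes = true_rate * p_report_success
    reported_failures = (1.0 - true_rate) * p_report_failure
    return reported_successes / (reported_successes + reported_failures)


true_rate = 0.015          # midpoint of the 1-2% range cited above
p_report_success = 1.0     # assumption: every success is announced
p_report_failure = 0.01    # assumption: 1 in 100 failures is ever written up

apparent = observed_success_rate(true_rate, p_report_success, p_report_failure)
print(f"true rate: {true_rate:.1%}, apparent rate from reports: {apparent:.1%}")
```

Under these made-up reporting rates, a tool that succeeds on 1.5% of attempts looks like it succeeds on roughly 60% of them. The exact figure depends entirely on the assumed reporting probabilities; the point is only that the gap can be very large.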

Current successes also cluster heavily in combinatorics and number theory — "tool-rich" fields where AI can rapidly traverse a large toolkit and find unexpected combinations. Progress is much slower in fields that demand geometric intuition or long reasoning chains (topology, algebraic geometry).

And there's a structural filter: every published success story had a world-class mathematician as the gatekeeper. Bubeck, Gowers, and Tao are not random domain experts. They're elite mathematicians who can reliably tell a correct proof from a plausible-looking one. The BrokenArXiv benchmark drives this home: even the best model tested (GPT-5.4) correctly identified flawed math statements less than 40% of the time.

AI cannot yet self-correct. It needs expert triage. That's progress, but it's not autonomous research.

▎How Mathematicians Are Reacting

Gowers' blog post turned Reddit's r/math into a battleground.

The core argument isn't whether AI is right or wrong — the proofs check out. The question is what this means for the discipline.

Gowers himself predicts a crisis: "If AI mathematics continues to progress at anything like its current rate... then we will face a crisis very soon."

The crisis has two dimensions:

For PhD students: A math PhD's core training is solving open problems. If an AI can produce thesis-level research in two hours, what's the value of five years of human work? How do you evaluate original contribution? Many mathematicians now argue that PhD programs need fundamental reform — shifting from "prove something new alone" to "collaborate with AI and understand its outputs."

For the discipline: Gowers sees human-AI collaboration as inevitable. The question is whether mathematicians need to learn AI tools the way they learn analysis or algebra. Terence Tao has made similar arguments: math education must change, and if it doesn't, it will break.

Skeptics have a different take. A top comment on r/math put it this way: "AI excels at combinatorics because it can access the whole toolkit at once. But this is more like a machine assembling a jigsaw puzzle than performing a creative act." Underlying this view is a conviction that math is more than proof — it's concept creation and theory building. No AI system has yet demonstrated conceptual innovation.

Either way, the conversation has moved from "can AI do math?" to "how should we reorganize math around AI?" In 2026, leading math departments are running seminars on AI-assisted research. Harvard and MIT held a joint workshop on "Mathematics Education in the Age of AI."

▎What This Means Beyond Math

Zoom out, and the convex optimization proof is about more than mathematics.

Bubeck's trajectory is revealing. He's a convex optimization authority who wrote the field's standard textbook. He moved from Microsoft to OpenAI, from efficient small models (Phi series) to the frontier of AGI research. When he says GPT-5 Pro "can do new mathematics," he's not doing PR — he's stress-testing his own field with his own tools.

In his FOCS 2025 talk, Bubeck traced the arc: from GPT-4 barely handling high-school math, to GPT-5.5 Pro producing PhD-level research — in three years. The improvement curve is roughly exponential.

If mathematical reasoning — often called the "crown of human reasoning" — is being penetrated by AI, what isn't?

Three takeaways:

Reasoning scales. From GPT-5 Pro to GPT-5.5 Pro, each upgrade brought a step-function improvement in mathematical capability. Scaling laws aren't dead; they have just moved from language modeling to reasoning.

From exercises to research. Pre-2024, AI math was about exams — IMO problems, undergraduate tests. Late 2025 marked the transition to open problems. That's a phase change. Exam problems have known solution paths. Open problems don't.

Tool use + reasoning. GPT-5 Pro's convex optimization proof relied purely on reasoning. By GPT-5.5, the model was actively using code execution, symbolic computation, and search. It's not just "getting smarter" — it's learning to use tools, the same way human researchers do.

▎Closing

There's a detail in Bubeck's post that I keep coming back to.

He said the GPT-5 Pro proof was good enough that he considered submitting it as an arXiv note. But the original paper's authors had already released v2, closing the gap to 1.75/L — a better result. So he dropped the idea.

The most remarkable part of this story isn't that AI proved a theorem. It's that after the AI proved its theorem, humans were still ahead. In this particular race, humans won.

Gowers' experience suggests something more unsettling: when AI can produce PhD-level work in two hours, and a human needs years to do the same thing, how long does "humans are still ahead" remain true?

I don't have the answer. But history suggests that every time someone says "AI will never do X," they eventually have to update their priors.

Mathematics is just the latest X.


Sources:

  • Sebastien Bubeck @ X, Aug 20, 2025
  • Timothy Gowers, "A recent experience with ChatGPT 5.5 Pro", May 2026
  • Terence Tao @ Erdős Problems #281, Jan 2026
  • OpenAI, "Introducing GPT-5.5", Apr 2026
  • FOCS 2025 Plenary Talk, "Recent Advances in LLMs for Mathematics", S. Bubeck
  • arXiv: 2602.05192 (First Proof benchmark)
  • BrokenArXiv benchmark, MathArena