On April 13, 2026, a 23-year-old named Liam Price typed Erdős Problem #1196 into GPT-5.4 Pro and watched the model think for about 80 minutes. The output was a proof sketch for a 1966 conjecture about primitive sets of integers. That part made the headlines. What happened next is the actual story: when eight mathematicians turned the sketch into a rigorous paper, the same proof idea cleanly resolved a second 1966 Erdős conjecture along the way.
One method, two problems
Price had no advanced mathematics training. He had ChatGPT, the Erdős Problems website, and one specific question: can a primitive set of integers, a collection in which no member divides another, have a uniformly bounded Erdős sum, the sum of 1/(a log a) over its members? GPT-5.4 Pro returned an approach built on Markov chains, a way of studying sequences of random steps, paired with the von Mangoldt function, a number-theoretic tool that singles out primes and their powers. Neither piece was new on its own. The combination, applied to this question, was.
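For readers who want the objects in that question made concrete, here is a toy sketch of the standard definitions: a primitivity check, the Erdős sum, and the von Mangoldt function Λ(n). The function names are mine, and this illustrates only the objects named above, not the paper's proof technique.

```python
from math import log

def is_primitive(s):
    """True if no member of s divides another (the defining property of a primitive set)."""
    xs = sorted(s)
    return not any(b % a == 0 for i, a in enumerate(xs) for b in xs[i + 1:])

def erdos_sum(s):
    """The Erdős sum of a set: the sum of 1/(a * log a) over its members (each a > 1)."""
    return sum(1 / (a * log(a)) for a in s)

def von_mangoldt(n):
    """Λ(n) = log p if n is a power of a prime p, and 0 otherwise."""
    for p in range(2, n + 1):
        if n % p == 0:
            # p is the smallest prime factor; n is a prime power iff
            # stripping out every factor of p leaves exactly 1
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
    return 0.0
```

For example, {2, 3, 5} is primitive while {2, 4} is not, and Λ(8) = log 2 while Λ(12) = 0. Erdős's 1935 result is that the Erdős sum is bounded across all primitive sets; the 1966 conjecture in question concerns a uniform bound.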
Eight authors picked the output up from there: Boris Alexeev, Kevin Barreto, Yanyang Li, Jared Duker Lichtman, Price, Jibran Iqbal Shah, Quanyu Tang, and Terence Tao. They interpreted the model's argument, repaired the gaps, and turned it into a rigorous proof of Problem #1196. Then they pointed the same machinery at Problem #1217, a separate Erdős-Sárközy-Szemerédi conjecture from 1966 about how long a divisibility chain can run inside a finite set. The same method resolved it too.
The main theorem for #1196 was formally verified in Lean, a proof assistant that mechanically checks every logical step. Not the entire paper. The headline result.
A one-off is not a method
A model that solves a single famous problem is a remarkable demo. Turning a model output into a reusable mathematical method is a different thing entirely.
The distinction is what mathematicians do for a living. A paper's long-run value is rarely the specific theorem it proves. It is the technique, because that technique then attacks problems the original paper never names.
Solving #1196 alone would have been a curiosity, a striking output that does not obviously generalize. A method that resolves both #1196 and #1217 with the same engine is something else. The arXiv preprint notes that this proof technique appears to have been overlooked since Erdős's 1935 paper. Tao described the result as "a meaningful contribution to the anatomy of integers that goes well beyond the solution of this particular Erdős problem." That is the word a mathematician uses when something has grown legs.
The caveat is the story, not a footnote
This is not a clean story of a model replacing mathematicians. The raw GPT-5.4 Pro output needed expert reading before it became a proof. The Lean verification certifies the corrected version of the main theorem, not the model's raw transcript. And a Lean check, however rigorous, is a different social process from peer review at a journal. Mathematics is not only a string of valid steps. It is also notation, framing, and the slow accumulation of community trust around which routes are worth following.
The narrower version of the claim is more interesting than the broad one. Put precisely: GPT-5.4 Pro surfaced an approach that human mathematicians could make rigorous, could verify with a proof checker, and could reuse on an unrelated problem from the same era. That is a smaller claim than "AI solved math." It is also a much harder one to dismiss.
Stanford's Future of Mathematics Symposium framed AI's role as proof assistant, collaborator, and engine for discovery. This paper landed in the third bucket, the one nobody had a clean example of yet.
Why it matters
For two years, the standard line on AI and mathematics has been that models can help with bookkeeping but cannot do the creative work. The creative move is the unexpected route, the analogy nobody else thought to try. That is what Price's prompt got from GPT-5.4 Pro: a route to primitive sets that ran through Markov chains, which is not where a number theorist would naturally look for one.
What is striking here is not that the model produced a step a human could not have produced. Specialist mathematicians could have written this argument, and might have if the right person had tried. The striking part is that the question got asked at all. An amateur in front of a chat box, with no professional credentials, surfaced a research direction that a generation of number theorists had not gotten to. The bottleneck on certain kinds of progress turns out to be less about who has the credentials and more about who has the time, the prompt, and a model fast enough to explore the space.
The eight authors of the paper are the reason this output became mathematics. Without them, the GPT-5.4 Pro transcript would be an interesting forum post. With them, it is a published result with downstream implications. That collaboration is also the prototype of what AI-assisted research probably looks like from here: not a model that ships a paper alone, but a model that surfaces approaches faster than humans can on their own, and a human team that does the work of making them real.
If the bottleneck on mathematical progress is not generating ideas but recognizing the right ones, and if a chat interface plus a fast model can surface routes that experts overlooked for ninety years, what counts as authorship of the result?
Originally published as an Instagram carousel on @recul.ai.