Chat with the newest AI models and challenge them with your hardest problems
May 11–13, 2026:
Benchmarks in Leipzig Challenge
Research mathematicians gather at the Max Planck Institute for Mathematics in the Sciences to write problems for a new public benchmark dataset. Remote contributions welcome until May 11.
A benchmark of 120 submissions by 51 scientists: GPT-5.5 answers 64% correctly, Claude Opus 4.7 41%, and Gemini 3.1 Pro 38%.
Newest Available Frontier Models:
DeepSeek V4 Pro
pub: 2026-04-24
GPT-5.5
pub: 2026-04-23
Claude Opus 4.7
pub: 2026-04-15
Gemini 3.1 Pro
pub: 2026-03-17
Grok-4.20
pub: 2026-02-20
249
researchers signed up
203
benchmark problems
342
AI chat rounds in past 10 days
See how the newest models perform on our benchmark problems.
Public Benchmarks
Discuss your research with the newest models.
Sample Chat
Contribute research-level benchmark problems.
Sample Problems