Chat with the newest AI models and challenge them with your hardest problems
May 11–13, 2026:
Benchmarks in Leipzig Challenge
Research mathematicians gather at the Max Planck Institute for Mathematics in the Sciences to write problems for a new public benchmark dataset. Remote contributions welcome until May 11.
A benchmark of 120 submissions by 51 scientists: GPT-5.5 answers 64% correctly, Claude Opus 4.7 41%, and Gemini 3.1 Pro 38%.
Newest Available Frontier Models:
DeepSeek V4 Pro
pub: 2026-04-24
GPT-5.5
pub: 2026-04-23
Claude Opus 4.7
pub: 2026-04-15
Gemini 3.1 Pro
pub: 2026-03-17
Grok-4.20
pub: 2026-02-20
249
researchers signed up
203
benchmark problems
342
AI chat rounds in past 10 days
See how the newest models perform on our benchmark problems.
Public Benchmarks
Discuss your research with the newest models.
Sample Chat
Contribute research-level benchmark problems.
Sample Problems