Chat with the newest AI models and challenge them with your hardest problems
100 research-level mathematics questions compiled by 49 contributors.
See the Leipzig Benchmark for per-model performance; the companion arXiv paper is now available.
Newest Available Frontier Models:
Grok-4.3
pub: 2026-05-06
DeepSeek V4 Pro
pub: 2026-04-24
GPT-5.5
pub: 2026-04-23
Claude Opus 4.7
pub: 2026-04-15
Gemini 3.1 Pro
pub: 2026-03-17
294 researchers signed up
324 benchmark problems
198 AI chat rounds in past 10 days
See how the newest models perform on our benchmark problems.
Public Benchmarks
Discuss your research with the newest models.
Sample Chat
Contribute research-level benchmark problems.
Sample Problems
Each verified account gets a free monthly chat allowance. Submitting benchmark problems earns bonus credits.