Chat with the newest AI models and challenge them with your hardest problems

May 27, 2026: Benchmarks in Leipzig is now public.
100 research-level mathematics questions compiled by 49 contributors.
See the Leipzig Benchmark for per-model performance; the companion arXiv paper is now available.
Newest Available Frontier Models:
Grok-4.3 (pub: 2026-05-06)
DeepSeek V4 Pro (pub: 2026-04-24)
GPT-5.5 (pub: 2026-04-23)
Claude Opus 4.7 (pub: 2026-04-15)
Gemini 3.1 Pro (pub: 2026-03-17)
294 researchers signed up
324 benchmark problems
198 AI chat rounds in past 10 days

See how the newest models perform on our benchmark problems.

Public Benchmarks

Discuss your research with the newest models.

Sample Chat

Contribute research-level benchmark problems.

Sample Problems

Each verified account gets a free monthly chat allowance. Submitting benchmark problems earns bonus credits.