Project Benchmarks

    model data provided by Surge AI Leipzig Tier-4 Benchmark
  1. September 1, 2025
  2. November 1, 2025
  3. November 26, 2025
  4. March 23, 2026
  5. April 11, 2026
  6. April 30, 2026
  7. May 13, 2026

The Leipzig Tier-4 Benchmark

103 research-level problems
49 contributing researchers
25 subfields included

The problems are research-level and were contributed by researchers at the Benchmarks in Leipzig event.

The Leipzig Benchmark problem set is now closed. Contact us if you have any questions or would prefer a private evaluation.

The following table shows the current preliminary results; we are running more tests.
Model Name         Model Type      Correct Answer
GPT-5.5            Active Model    43%
GPT-5.4            Legacy Model    28%
GPT-5.2            Legacy Model    16%
Gemini 3 Pro       Legacy Model    15%
Claude Opus 4.6    Legacy Model    14%
Gemini 3.1 Pro     Active Model    14%
Claude Opus 4.7    Active Model    13%
DeepSeek V4 Pro    Active Model    10%
DeepSeek-V3.2      Legacy Model    8%
Grok-4.20          Active Model    6%
Grok-4.1           Legacy Model    2%
All models were queried via the API, using the strongest available version.

Contributing Subfields

Algebraic Geometry 39
Algebraic Combinatorics 21
Matroid Theory 18
Enumerative Combinatorics 15
Representation Theory 15
Combinatorics 14
Discrete Geometry 11
Algebra 8
Graph Theory 6
Algebraic Statistics 5
Commutative Algebra 5
Homological Algebra 4
Number Theory 4
Polytope Theory 4
Tropical Geometry 4
Analysis 3
Knot Theory 3
Topology 3
Complex Analysis 2
Euclidean Geometry 2
Arithmetic Geometry 1
Metric Geometry 1
Probability Theory 1
Real Algebraic Geometry 1
Theoretical Computer Science 1
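
The subfield tallies add up to more than the 103 problems, which suggests that a single problem can be listed under several subfields. That is an inference from the numbers; the page does not say so explicitly. A minimal Python sketch to check the arithmetic follows. The counts are copied from the list above, and the variable names are illustrative only.

```python
# Subfield tallies copied from the "Contributing Subfields" list.
subfield_counts = {
    "Algebraic Geometry": 39,
    "Algebraic Combinatorics": 21,
    "Matroid Theory": 18,
    "Enumerative Combinatorics": 15,
    "Representation Theory": 15,
    "Combinatorics": 14,
    "Discrete Geometry": 11,
    "Algebra": 8,
    "Graph Theory": 6,
    "Algebraic Statistics": 5,
    "Commutative Algebra": 5,
    "Homological Algebra": 4,
    "Number Theory": 4,
    "Polytope Theory": 4,
    "Tropical Geometry": 4,
    "Analysis": 3,
    "Knot Theory": 3,
    "Topology": 3,
    "Complex Analysis": 2,
    "Euclidean Geometry": 2,
    "Arithmetic Geometry": 1,
    "Metric Geometry": 1,
    "Probability Theory": 1,
    "Real Algebraic Geometry": 1,
    "Theoretical Computer Science": 1,
}

NUM_PROBLEMS = 103  # problem count stated in the summary figures

# Total number of (problem, subfield) listings across all subfields.
total_tags = sum(subfield_counts.values())

# Average number of subfield listings per problem, assuming every
# listing corresponds to one of the 103 problems (an assumption).
mean_tags_per_problem = total_tags / NUM_PROBLEMS

print(f"subfields listed: {len(subfield_counts)}")             # 25
print(f"total subfield listings: {total_tags}")                # 191
print(f"mean listings per problem: {mean_tags_per_problem:.2f}")  # 1.85
```

The 25 entries match the "25 subfields included" figure, and 191 listings over 103 problems gives roughly 1.85 subfields per problem on average, under the assumption stated in the comments.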