Project Benchmarks


Model data provided by Surge AI

Snapshots:
  1. September 1, 2025
  2. November 1, 2025
  3. November 26, 2025
  4. March 23, 2026
  5. April 11, 2026
  6. April 30, 2026

Published on March 23, 2026

140 research-level problems
60 contributing researchers
28 subfields included
This snapshot adds the models Gemini 3.1 Pro and GPT-5.4, each performing significantly better than its predecessor. Since our first benchmark in September 2025:
• OpenAI's best model improved from 43% (GPT-5) to 68% (GPT-5.4)
• Google's best model improved from 29% (Gemini 2.5 Pro) to 60% (Gemini 3.1 Pro)

| Model Name | Model Type | Correct Answer |
|---|---|---|
| GPT-5.4 | Active Model | 68% |
| Gemini 3.1 Pro | Active Model | 60% |
| GPT-5.2 | Legacy Model | 52% |
| Gemini 3 Pro | Legacy Model | 44% |
| GPT-5.1 | Legacy Model | 38% |
| GPT-5 | Legacy Model | 31% |
| DeepSeek-V3.2 | Active Model | 23% |
| Grok-4 | Legacy Model | 23% |
| o3 | Legacy Model | 22% |
| Gemini 2.5 Pro | Legacy Model | 19% |
| Claude Opus 4.5 | Active Model | 18% |
| Grok-4.1 | Active Model | 17% |

Based on 140 submissions that stump at least 1 active model. All models were queried via the API, using the strongest available version.

| Model Name | Model Type | Correct Answer |
|---|---|---|
| GPT-5.4 | Active Model | 62% |
| Gemini 3.1 Pro | Active Model | 51% |
| GPT-5.2 | Legacy Model | 45% |
| Gemini 3 Pro | Legacy Model | 36% |
| GPT-5.1 | Legacy Model | 31% |
| GPT-5 | Legacy Model | 26% |
| o3 | Legacy Model | 18% |
| Grok-4 | Legacy Model | 16% |
| Gemini 2.5 Pro | Legacy Model | 14% |
| DeepSeek-V3.2 | Active Model | 12% |
| Grok-4.1 | Active Model | 10% |
| Claude Opus 4.5 | Active Model | 7% |

Based on 120 submissions that stump at least 2 active models. All models were queried via the API, using the strongest available version.
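
As a rough illustration of how a "stumps at least k active models" filter could produce the two tables above, here is a minimal Python sketch. It rests on assumptions not stated on this page: that a submission "stumps" a model when the model's answer is marked incorrect, and that each percentage is the share of kept submissions a model answered correctly. The function name `accuracy_table`, the data layout, and the model names are hypothetical, and the toy data is made up.

```python
def accuracy_table(results, active_models, min_stumped):
    """Return (number of kept submissions, {model: percent correct}).

    results: {submission_id: {model_name: True if the model answered correctly}}
    A submission is kept when at least `min_stumped` active models got it wrong.
    This layout is an assumption for illustration, not the site's actual schema.
    """
    kept = [
        answers
        for answers in results.values()
        if sum(not answers[m] for m in active_models) >= min_stumped
    ]
    if not kept:
        return 0, {}
    models = sorted({m for answers in kept for m in answers})
    scores = {
        m: 100 * sum(answers[m] for answers in kept) / len(kept)
        for m in models
    }
    return len(kept), scores


# Toy data with made-up model names; not the benchmark's real results.
toy_results = {
    "p1": {"active_a": True, "active_b": False, "legacy_c": False},
    "p2": {"active_a": True, "active_b": True, "legacy_c": False},
    "p3": {"active_a": False, "active_b": False, "legacy_c": True},
    "p4": {"active_a": True, "active_b": True, "legacy_c": True},
}
active = ["active_a", "active_b"]

count_1, scores_1 = accuracy_table(toy_results, active, min_stumped=1)
count_2, scores_2 = accuracy_table(toy_results, active, min_stumped=2)
print(count_1, scores_1)
print(count_2, scores_2)
```

In this toy example, raising the threshold from 1 to 2 shrinks the kept set and lowers the active models' scores, which is the same direction of change seen between the two tables.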

Contributing Subfields

Algebraic Combinatorics 32
Algebra 19
Combinatorics 17
Discrete Geometry 11
Algebraic Geometry 10
Enumerative Combinatorics 7
Homological Algebra 7
Matroid Theory 6
Commutative Algebra 3
Group Theory 3
Metric Geometry 3
Symmetric Function Theory 3
Topology 3
Algebraic Statistics 2
Analysis 2
Geometry 2
Graph Theory 2
Lie Theory 2
Complex Analysis 1
Euclidean Geometry 1
Monoid Theory 1
Number Theory 1
Partial Differential Equations 1
Polytope Theory 1
Probability Theory 1
Real Algebraic Geometry 1
Theoretical Computer Science 1
Tropical Geometry 1

Sample Problems

Graph Theory
What is the number of connected, simple graphs with exactly 1 cycle, 100 vertices and all vertex degrees at most 3? To clarify, the graphs are unlabelled and not necessarily planar.
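
To make the objects being counted concrete, the following Python sketch enumerates them by brute force for very small vertex counts. It is only an illustration of the problem statement, not a method for 100 vertices, where exhaustive enumeration is out of reach. It uses the fact that a connected graph on n vertices has exactly one cycle precisely when it has n edges; the function names are hypothetical.

```python
from itertools import combinations, permutations


def is_connected(n, edges):
    """Depth-first search from vertex 0 over an undirected edge list."""
    adjacency = {v: [] for v in range(n)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adjacency[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def canonical_form(n, edges):
    """Smallest relabelled edge list over all vertex permutations (tiny n only)."""
    best = None
    for perm in permutations(range(n)):
        relabelled = tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or relabelled < best:
            best = relabelled
    return best


def count_unicyclic(n, max_degree=3):
    """Count unlabelled connected simple graphs on n vertices with exactly one
    cycle and all degrees <= max_degree, by exhaustive search."""
    all_edges = list(combinations(range(n), 2))
    classes = set()  # one canonical form per isomorphism class
    # Connected with n vertices and n edges is equivalent to having exactly one cycle.
    for edges in combinations(all_edges, n):
        degree = [0] * n
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
        if max(degree) <= max_degree and is_connected(n, edges):
            classes.add(canonical_form(n, edges))
    return len(classes)


for n in (3, 4, 5):
    print(n, count_unicyclic(n))
```

For n = 3, 4, 5 this gives 1, 2, 4. At n = 5 the only unicyclic graph excluded by the degree bound is the triangle with two pendant vertices attached to the same cycle vertex, which has a vertex of degree 4.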
Algebraic Statistics
Let $f_1,\dots,f_m$ be generic forms of degree 2 in $n$ complex variables $x_1,\dots,x_n$. Consider the *log-likelihood function* $\ell = \sum_{i=1}^m s_i \log(f_i)$, where $s_1,\dots,s_m$ are also treated as complex variables. This function is well-defined on the complement of the hypersurface arrangement defined by the $f_i$. The *likelihood correspondence* $L$ is the Zariski closure in $\mathbb{P}^{n-1}\times \mathbb{P}^{m-1}$ of the critical locus of the likelihood equations. Precisely, \[ L = \overline{\left\lbrace (x, s)\in \mathbb{C}^{n}\times \mathbb{C}^{m} : \frac{\partial \ell}{\partial x_i}(x,s)=0,\ i=1,\dots,n,\ \prod_{i=1}^m f_i(x)^{s_i} \neq 0,\ F(x)\in X_{r} \right\rbrace}, \] where $X$ is the Zariski closure of the image of $F\colon\mathbb{C}^n \rightarrow\mathbb{C}^{m}$, $x \mapsto (f_1(x),\dots, f_m(x))$, and $X_{r}$ is its set of nonsingular points. Determine the dimension of the likelihood correspondence for $m=8$ and $n=5$.
Discrete Geometry
How many distinct combinatorial types of 3-dimensional polytopes can you obtain by intersecting the 4-dimensional cube with an affine hyperplane?