Definition – Theorem – Proof
Beyond benchmark scores and headline claims:
How good are AI tools in mathematical research, really?
Definition
How well can AI identify
Structuring Concepts
0–2%
This includes
- proposing definitions
- comparing nearby notions
- finding missing hypotheses
- choosing the right level of generality
- producing useful examples or counterexamples
Theorem
How well can AI identify
Meaningful Statements
1–5%
This includes
- proposing conjectures
- sharpening or weakening assumptions
- identifying meaningful conclusions
- avoiding trivialities
- recognizing whether a statement fits naturally into the surrounding theory
Proof
How well can AI identify
Reliable Arguments
5–15%
This includes
- finding proof strategies
- structuring arguments
- detecting gaps
- checking local reasoning
- explaining reductions
- distinguishing formal correctness from conceptual understanding
Learning Amplification
How well can AI support researchers in
Understanding and communicating mathematics
20–35%
This includes
- explaining unfamiliar material
- translating between levels of expertise
- producing examples and analogies
- summarizing papers
- suggesting reading paths
- helping researchers enter adjacent areas
Continuously reassessing AI capabilities in mathematical research
The scores are expert estimates, not objective measurements of mathematical intelligence. They assess how well AI systems provide reliable, net-positive assistance in expert-level research situations, weighted by significance. A score of 100% would mean performance comparable to the world's strongest human mathematicians in that component.
Submitted assessments will influence future updates.
Submit your personal assessment
Each score is a single percentage (e.g. 5) or a range (e.g. 5-10) between 0 and 100. At least one of the four scores is required.
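The input rule above can be sketched as a small validator. This is only an illustration, not the site's actual form handling: the field names, the acceptance of decimals and of an en dash as range separator, and the requirement that a range be written low-to-high are all assumptions.

```python
import re

# Hypothetical field names, one per component heading on the page.
FIELDS = ("definition", "theorem", "proof", "learning_amplification")

# One number, optionally followed by a hyphen or en dash and a second number.
# Accepting decimals and the en dash is an assumption, not a stated rule.
_SCORE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:[-\u2013]\s*(\d+(?:\.\d+)?))?\s*$")


def parse_score(text):
    """Return (low, high) for a valid score, or None for a blank field."""
    if text is None or not text.strip():
        return None
    match = _SCORE.match(text)
    if match is None:
        raise ValueError(f"not a percentage or range: {text!r}")
    low = float(match.group(1))
    # A single percentage is treated as the degenerate range (x, x).
    high = float(match.group(2)) if match.group(2) is not None else low
    if not (0 <= low <= high <= 100):
        raise ValueError(f"expected 0 <= low <= high <= 100: {text!r}")
    return (low, high)


def validate_submission(form):
    """Parse all four fields; reject a submission where every field is blank."""
    parsed = {name: parse_score(form.get(name)) for name in FIELDS}
    if all(value is None for value in parsed.values()):
        raise ValueError("at least one of the four scores is required")
    return parsed
```

Under these assumptions, `parse_score("5")` returns `(5.0, 5.0)` and `parse_score("5-10")` returns `(5.0, 10.0)`, while a submission with all four fields blank raises `ValueError`.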