Draft: work in progress

Definition – Theorem – Proof

Beyond benchmark scores and headline claims:

How good are AI tools at mathematical research, really?

Definition

How well can AI identify structuring concepts?

Score: 0–2%

This includes:

proposing definitions
comparing nearby notions
finding missing hypotheses
choosing the right level of generality
producing useful examples or counterexamples

Theorem

How well can AI identify meaningful statements?

Score: 1–5%

This includes:

proposing conjectures
sharpening or weakening assumptions
identifying meaningful conclusions
avoiding trivialities
recognizing whether a statement fits naturally into the surrounding theory

Proof

How well can AI identify reliable arguments?

Score: 5–15%

This includes:

finding proof strategies
structuring arguments
detecting gaps
checking local reasoning
explaining reductions
distinguishing formal correctness from conceptual understanding

Learning Amplification

How well can AI support researchers in understanding and communicating mathematics?

Score: 20–35%

This includes:

explaining unfamiliar material
translating between levels of expertise
producing examples and analogies
summarizing papers
suggesting reading paths
helping researchers enter adjacent areas

Continuously reassessing AI capabilities in mathematical research

The scores are expert estimates, not objective measurements of mathematical intelligence. They assess how well AI systems provide reliable, net-positive assistance in expert-level research situations, weighted by significance. A score of 100% would mean performance comparable to that of the world's strongest human mathematicians in that component.

Submitted assessments will influence future updates.

Submit your personal assessment