Draft: Work in progress

Definition – Theorem – Proof

Beyond benchmark scores and headline claims:

How good are AI tools in mathematical research, really?

Definition

How well can AI identify

Structuring Concepts

0–2%

This includes

  • proposing definitions
  • comparing nearby notions
  • finding missing hypotheses
  • choosing the right level of generality
  • producing useful examples or counterexamples

Theorem

How well can AI identify

Meaningful Statements

1–5%

This includes

  • proposing conjectures
  • sharpening or weakening assumptions
  • identifying meaningful conclusions
  • avoiding trivialities
  • recognizing whether a statement fits naturally into the surrounding theory

Proof

How well can AI identify

Reliable Arguments

5–15%

This includes

  • finding proof strategies
  • structuring arguments
  • detecting gaps
  • checking local reasoning
  • explaining reductions
  • distinguishing formal correctness from conceptual understanding

Learning Amplification

How well can AI support researchers in

Understanding and communicating mathematics

20–35%

This includes

  • explaining unfamiliar material
  • translating between levels of expertise
  • producing examples and analogies
  • summarizing papers
  • suggesting reading paths
  • helping researchers enter adjacent areas

Continuously reassessing AI capabilities in mathematical research

The scores are expert estimates, not objective measurements of mathematical intelligence. They assess how well AI systems provide reliable, net-positive assistance in expert-level research situations, weighted by significance. A score of 100% would mean performance comparable to the world's strongest human mathematicians in that component.
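One possible reading of "weighted by significance" is a weighted average over research situations. The sketch below only illustrates that reading: the scores above are expert estimates, no formula is published here, and the function name `weighted_score`, the ratings, and the weights are all hypothetical.

```python
def weighted_score(ratings, weights):
    """Significance-weighted mean of per-situation ratings in [0, 1].

    Hypothetical helper for illustration; not the assessment's actual method.
    """
    if len(ratings) != len(weights):
        raise ValueError("ratings and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(r * w for r, w in zip(ratings, weights)) / total_weight


# Made-up numbers: three research situations, the last one the most significant.
# A rating of 1.0 would mean on par with the world's strongest human mathematicians.
ratings = [0.30, 0.10, 0.02]
weights = [1.0, 2.0, 5.0]
score = weighted_score(ratings, weights)
print(f"{score:.1%}")
```

Under this reading, a strong result on a low-significance situation moves the overall score far less than a weak result on a high-significance one, which is one way a component score can stay low despite impressive isolated successes.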

Submitted assessments will influence future updates.