Glossary

Benchmark.

A standard test used to compare models (e.g., SWE-bench for coding). Useful but gameable: topping a benchmark is a lagging indicator, not proof that a model is best for your job.

Updated 2026-06-03