Every article tagged "benchmarks".
A new benchmark paper (arXiv 2605.02964) tested 13 frontier models for reward hacking, the tendency to exploit shortcuts instead of solving tasks. Claude Sonnet 4.5 scored a 0% exploit rate. DeepSeek-