Benchmarks (4 stories)

Adding Benchmaxxer Repellant to the Open ASR Leaderboard

The Open ASR Leaderboard has introduced a "Benchmaxxer Repellant" mechanism to counter benchmark gaming, in which models are overfit to specific test sets instead of genuinely improving. The change aims to keep the leaderboard a meaningful evaluation tool by penalizing models that optimize narrowly for benchmark metrics.

Hugging Face Blog · May 6, 2026

Image AI models now drive app growth, beating chatbot upgrades

Image AI model launches generate 6.5x more app downloads than chatbot upgrades, according to Appfigures data, though most apps fail to monetize the traffic surge.

TechCrunch AI · May 4, 2026

Kimi K2.6 just beat Claude, GPT-5.5, and Gemini in a coding challenge

Kimi K2.6, an open-weights language model from China, outperformed Claude, GPT-5.5, and Gemini in a competitive coding challenge. The result suggests that open-weights models can match or exceed proprietary frontier models on specific technical benchmarks.

Hacker News (AI) · May 3, 2026

GPT-5.5 matches heavily hyped Mythos Preview in new cybersecurity tests

OpenAI's GPT-5.5 matched the cybersecurity performance of Anthropic's heavily promoted Mythos Preview in new benchmarks, suggesting Mythos' capabilities are not uniquely advanced. The results indicate that state-of-the-art models across companies are converging on similar threat-detection abilities rather than one model showing decisive superiority.

Ars Technica AI · May 1, 2026