DURING Infosecurity Europe 2026, a benchmark called ExploitBench revealed that Anthropic’s AI model, Claude Mythos, outperformed OpenAI’s GPT5.5 in exploiting real-world Google Chrome vulnerabilities. Conducted by Bugcrowd in collaboration with Carnegie Mellon University, this benchmark assesses AI models' performance in not just identifying, but also exploiting vulnerabilities sequentially. Mythos achieved notable success, while GPT5.5 fell behind, suggesting AI models are increasingly capable in this realm.
However, experts caution about the integration of AI in attacker workflows and emphasize the need for enhanced automated defenses in response to emerging threats.