LAST month, the UK’s AI Security Institute (AISI) (which has run 95 Capture the Flag challenges since 2023) found that OpenAI’s GPT-5.5 reached a similar level of performance on their cyber evaluations as Mythos Preview, despite Mythos’ hype. On expert tasks, GPT-5.5 averaged 71.4 percent, slightly higher than Mythos Preview’s 68.6 percent, though within the margin of error.
In a difficult task building a disassembler to decode a Rust binary, AISI notes GPT-5.5 solved it in 10 minutes and 22 seconds with no human assistance at a cost of $1.73 in API calls. GPT-5.5 also matched Mythos Preview in progress on The Last Ones, succeeding in 3 of 10 attempts versus 2 of 10 for Mythos Preview, though no model had previously succeeded at that test even once. However, both models fail at AISI’s more challenging “Cooling Tower” simulation of a power-plant disruption.
The results suggest Mythos Preview was not a model-specific breakthrough but rather a byproduct of broader improvements in long-horizon autonomy, reasoning, and coding, according to AISI.