GPT-5.5 matches Mythos Preview in UK AI Security Institute tests

Last month, the UK’s AI Security Institute (AISI), which has run 95 Capture the Flag challenges since 2023, found that OpenAI’s GPT-5.5 performed at a similar level to Mythos Preview on its cyber evaluations, despite the hype surrounding Mythos Preview. On expert tasks, GPT-5.5 averaged 71.4 percent, slightly higher than Mythos Preview’s 68.6 percent, though the difference is within the margin of error.

AISI notes that GPT-5.5 solved one difficult task, building a disassembler to decode a Rust binary, in 10 minutes and 22 seconds with no human assistance, at a cost of $1.73 in API calls. GPT-5.5 also matched Mythos Preview in progress on The Last Ones, succeeding in 3 of 10 attempts versus 2 of 10 for Mythos Preview; no earlier model had succeeded at that test even once. However, both models failed AISI’s more challenging “Cooling Tower” simulation of a power-plant disruption.

According to AISI, the results suggest that Mythos Preview’s performance was not a model-specific breakthrough but a byproduct of broader improvements in long-horizon autonomy, reasoning, and coding.