According to the AI Security Institute (AISI), Mythos Preview completes more Apprentice-level Capture the Flag tasks than many previous frontier models, with a success rate above 85 per cent on those tasks. The institute also found that Mythos stands out for its ability to chain multiple steps together in The Last Ones (TLO), a multi-step data-exfiltration range designed to simulate a 32-step attack across a corporate network.
Mythos became “the first model to solve TLO from start to finish,” with an average run completing 22 of the 32 steps, compared with an average of 16 steps for Claude 4.6; other recent models’ results fell within a five-to-ten per cent margin. The evaluation notes limitations, including difficulties with the seven-step Cooling Tower test, which is designed to simulate disruption of power plant control software, and it cautions that real-world defence systems may alter outcomes.
AISI also states that its tests are conducted in simulated cyber ranges that lack active defenders, that they were run under a 100 million token budget, and that more inference compute could improve results in future assessments. The findings were published as the UK AI Security Institute’s initial evaluation following Mythos Preview’s early access.