Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models

MICROSOFT has unveiled a lightweight scanner designed to detect backdoors in open-weight large language models (LLMs) and bolster trust in AI systems. The tool, disclosed on 4 February 2026, relies on three observable signals that can flag backdoors while keeping the false positive rate low, Blake Bullwinkel and Giorgio Severi said in a report shared with The Hacker News.

The three indicators are a distinctive “double triangle” attention pattern that appears when a trigger is present, leakage of memorized poisoning data rather than ordinary training data, and backdoors that can be activated by multiple “fuzzy” triggers rather than a single exact one. Microsoft stresses that the approach requires no additional model training or prior knowledge of the backdoor’s behavior and works across common GPT-style models. It does have limitations, including that it requires access to model files and therefore cannot be used on proprietary models.
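To make the third signal concrete, the sketch below shows one way an analyst might probe for “fuzzy” triggers: take a suspected trigger phrase, generate its partial variants, and measure how many of them still set off the suspicious behavior. This is an illustration only and not Microsoft’s scanner. The function names, the word-level splitting, and the `is_backdoor_output` callback (a stand-in for running the model and checking its output) are all assumptions introduced here.

```python
from typing import Callable, List


def fuzzy_variants(trigger: str) -> List[str]:
    """Return every contiguous run of words in `trigger` except the full phrase.

    Word-level splitting is an assumption made for illustration; a real
    probe would more likely operate on model tokens.
    """
    words = trigger.split()
    variants = []
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            # Skip the complete phrase: only partial triggers count as fuzzy.
            if (start, end) != (0, len(words)):
                variants.append(" ".join(words[start:end]))
    return variants


def activation_rate(
    variants: List[str],
    is_backdoor_output: Callable[[str], bool],
) -> float:
    """Fraction of variants for which the hypothetical check fires.

    `is_backdoor_output` is a placeholder for "prompt the model with this
    variant and decide whether the response looks backdoored".
    """
    if not variants:
        return 0.0
    hits = sum(1 for variant in variants if is_backdoor_output(variant))
    return hits / len(variants)


if __name__ == "__main__":
    # Toy stand-in for a model: it "misbehaves" whenever the word "deploy"
    # appears. The trigger phrase is invented for this example.
    candidates = fuzzy_variants("current year deploy now")
    rate = activation_rate(candidates, lambda text: "deploy" in text.split())
    print(f"{len(candidates)} partial triggers, activation rate {rate:.2f}")
```

Under these assumptions, a high activation rate across partial variants would be consistent with the fuzzy-trigger behavior described in the report, while a rate near zero would suggest the phrase only works verbatim or is not a trigger at all.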

The company’s broader security push includes expanding its Secure Development Lifecycle to address AI-specific concerns. Yonatan Zunger warned that AI dissolves traditional trust boundaries and creates multiple entry points for unsafe inputs, according to The Hacker News.