IDPI in the wild: real-world indirect prompt injection bypass

IDPI, or indirect prompt injection, is described as an attack class in which adversaries embed hidden or manipulated instructions within otherwise benign web content that an LLM later processes, potentially causing unauthorized actions. The article reports in-the-wild observations showing that IDPI is no longer purely theoretical, including what the authors describe as their first observed real-world AI ad review bypass, in December 2025, and a taxonomy of attacker intents and payload engineering techniques.

The article documents 22 distinct techniques that attackers used in the wild, with case studies ranging from SEO poisoning and data destruction to unauthorized transactions and sensitive information leakage. Some payloads were delivered as visible plaintext, others through CSS or HTML concealment.

The authors classify attacker intents into low, medium, high and critical severity, noting examples such as AI content moderation bypass (high) and data destruction (critical), and highlight delivery methods like visible plaintext (37.8%), HTML attribute cloaking (19.8%) and CSS rendering suppression (16.9%). They also present in-the-wild detections across several named domains and URLs, including examples of social engineering, multilingual prompts and runtime execution.
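To make the three most common delivery methods concrete, the following is a minimal Python sketch of how a page scanner might tell them apart. It is not the authors' detection logic: the class name HiddenInstructionScanner, the helper scan, the phrase list and the CSS patterns are all hypothetical, chosen only for illustration. It inspects inline styles only, assumes reasonably well-nested HTML, and maps its three labels loosely onto the article's categories of visible plaintext, HTML attribute cloaking and CSS rendering suppression.

```python
import re
from html.parser import HTMLParser

# Hypothetical heuristics for illustration; not taken from the article.
HIDING_CSS = re.compile(
    r"display\s*:\s*none"
    r"|visibility\s*:\s*hidden"
    r"|opacity\s*:\s*0(?:\.0+)?\s*(?:;|$)"
    r"|font-size\s*:\s*0(?:px)?\s*(?:;|$)",
    re.IGNORECASE,
)
INSTRUCTION_HINT = re.compile(
    r"ignore (?:all |any )?(?:previous|prior) instructions"
    r"|disregard (?:the )?above"
    r"|you are now",
    re.IGNORECASE,
)
TEXT_ATTRS = ("alt", "title", "aria-label")
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenInstructionScanner(HTMLParser):
    """Collects (delivery_method, text) pairs for instruction-like text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.findings = []
        # One bool per open element: hidden by its own or an ancestor's inline CSS?
        self._stack = []

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        # Attribute cloaking: instruction-like text placed in a text-bearing attribute.
        for name in TEXT_ATTRS:
            if INSTRUCTION_HINT.search(attr_map.get(name, "")):
                self.findings.append(("html_attribute", attr_map[name]))
        if tag in VOID_TAGS:
            return  # void elements never contain text, so they are not tracked
        parent_hidden = self._stack[-1] if self._stack else False
        own_hidden = bool(HIDING_CSS.search(attr_map.get("style", "")))
        self._stack.append(parent_hidden or own_hidden or "hidden" in attr_map)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not INSTRUCTION_HINT.search(text):
            return
        hidden = self._stack[-1] if self._stack else False
        self.findings.append(("css_hidden" if hidden else "visible_plaintext", text))


def scan(html):
    """Return the findings for one HTML document."""
    scanner = HiddenInstructionScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings
```

Calling scan on a page's HTML returns a list of labelled findings. Keyword matching of this kind would miss obfuscated or multilingual payloads and anything hidden through external stylesheets or scripts, which is consistent with the article's emphasis on intent analysis rather than fixed patterns.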

According to Unit 42, defenders should employ web-scale detection and intent analysis, and Palo Alto Networks offers protections through Advanced URL Filtering, Advanced DNS Security, Prisma AIRS and Prisma Browser.