Cloudflare’s threat intelligence team, Cloudforce One, has released new research revealing how cyber attackers are increasingly manipulating AI reasoning systems through advanced adversarial deception techniques.
The report, titled Adversarial Deception: A Study of Indirect Prompt Code Injection, examined seven leading AI models to understand how their reasoning capabilities can be bypassed by malicious actors. The findings indicate that attackers are shifting focus from traditional network vulnerabilities to exploiting the decision-making processes of large language models (LLMs).
“Attackers are now targeting AI reasoning itself, not just traditional security controls.”
— Cloudforce One Research
According to the study, attackers are using deceptive “lures” strategically inserted text designed to emotionally manipulate or confuse AI systems to trick automated security auditors into approving malicious code. Researchers found that subtle deception was often the most effective tactic.
One of the report’s key findings highlighted the “1% bypass zone,” where safety-related comments making up less than 1% of a code file reduced AI detection rates to nearly 53%. Researchers also identified a “context trap,” where malicious payloads hidden within large software packages or library bundles caused detection accuracy to drop to as low as 12%.
“The attack surface has expanded beyond the network to the model’s reasoning.” — Cloudforce One
The report further revealed that some AI models demonstrated linguistic bias, with certain languages such as Russian or Chinese being treated as inherently suspicious regardless of actual code behavior, while other languages appeared to receive more trust.
Cloudforce One warned that as enterprises rapidly integrate AI into cybersecurity operations, software development, and automation pipelines, the risks associated with AI manipulation are becoming increasingly significant.
The research emphasized that organizations can no longer rely solely on conventional prompt safety measures. Instead, enterprises must adopt more advanced adversarial testing, context-aware security frameworks, and stronger model evaluation practices to ensure resilience against emerging AI-targeted attacks.
The findings also underscore growing industry concerns around “AI reasoning as an attack surface,” where threat actors seek to manipulate model cognition rather than directly breach infrastructure or applications.
