Jan 26, 2026

OpenAI Acknowledges AI Browsers May Never Fully Escape Prompt Injection Attacks

Key takeaways OpenAI admits prompt injection attacks on AI browsers like Atlas may never be completely eliminated, comparing them to persistent web scams. The company deployed an AI-powered “automated...

Key takeaways

The admission comes as security researchers continue to expose vulnerabilities in the new generation of AI browsers that can browse, read emails, and take actions on behalf of users.

In a blog post published Monday, OpenAI stated that while it is actively working to improve defenses, prompt injections remain a fundamental, long-term challenge.

"Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully 'solved,'" the company wrote.

The company added: "We view prompt injection as a long-term AI security challenge, and we'll need to continuously strengthen our defenses against it."

Agent mode expands security risks

Prompt injection attacks involve embedding hidden malicious instructions within webpages, emails, or documents that trick AI agents into performing unintended or harmful actions.

Atlas's agent mode, which allows the AI to autonomously browse websites, read emails, and execute tasks for users, naturally expands the attack surface for these threats.

Since Atlas's release in October, numerous security researchers have demonstrated how seemingly harmless text, including content placed within Google Docs, can manipulate the browser's behavior. Similar vulnerabilities have been identified in other AI browsers, including Perplexity's Comet.

Dane Stuckey, OpenAI's chief information security officer, acknowledged the threat in a post on social media platform X.

"One emerging risk we are very thoughtfully researching and mitigating is prompt injections, where attackers hide malicious instructions in websites, emails, or other sources, to try to trick the agent into behaving in unintended ways," Stuckey wrote.

He further conceded: "Prompt injection remains a frontier, unsolved security problem, and our adversaries will spend significant time and resources to find ways to make ChatGPT agents fall for these attacks."

AI-powered attacker finds novel threats

To combat these persistent threats, OpenAI has implemented what it calls a "rapid response loop" that includes an internally developed "automated attacker."

This AI system uses reinforcement learning to behave like a hacker, making repeated attempts to exploit Atlas in simulated environments and improving its attack strategies based on how the AI agent responds.

According to OpenAI, this automated approach has discovered attack techniques that were not detected during human-led security testing.

The company demonstrated one example where the automated system uncovered a malicious email-based prompt that could trick the agent into sending a resignation letter to a user's CEO when the user simply asked it to draft an out-of-office reply.

Rami McCarthy, principal security researcher at cybersecurity firm Wiz, offered a perspective on the challenge.

"A useful way to reason about risk in AI systems is autonomy multiplied by access," McCarthy told TechCrunch. "Agentic browsers tend to sit in a challenging part of that space: moderate autonomy combined with very high access."

UK intelligence warns of persistent threat

OpenAI is not alone in recognizing the intractable nature of prompt injection attacks.

The UK's National Cyber Security Centre issued a warning earlier this month that prompt injection attacks against generative AI applications "may never be totally mitigated."

The NCSC's technical director for platforms research, identified only as David C, cautioned against comparing prompt injection to traditional SQL injection attacks, which can be effectively mitigated.

The fundamental difference, he explained, is that large language models do not distinguish between data and instructions in the way traditional software does.

"Current large language models simply do not enforce a security boundary between instructions and data inside a prompt," David C wrote in the NCSC blog post.

Despite OpenAI's efforts and ongoing security updates, the company declined to reveal whether recent changes have resulted in a measurable decrease in successful prompt injection attempts.

An OpenAI spokesperson said the company has been working with external partners on this issue prior to Atlas's public release and will continue to do so.

New York Governor Signs RAISE Act Establishing Nation-Leading AI Safety Standards

Sora 2 Users Exploit AI Video Tool To Create Inappropriate Content Featuring Children

Palo Alto Networks And Google Cloud Strike Major AI Security Partnership

MORE NEWS

Related News

Apr 17, 2026

OpenAI Acknowledges AI Browsers May Never Fully Escape Prompt Injection Attacks

Key takeaways

Agent mode expands security risks

AI-powered attacker finds novel threats

UK intelligence warns of persistent threat

Read more:

Related News

Anthropic Launches Claude Opus 4.7 With Major Gains in Coding, Vision, and Agentic Performance

Microsoft Eyes OpenClaw-Style AI Features for Copilot

Meta Releases Muse Spark, Its First Proprietary AI Model to Challenge ChatGPT and Gemini

10 Mins

99 %

22 + Years