Key takeaways
The admission comes as security researchers continue to expose vulnerabilities in the new generation of AI browsers that can browse, read emails, and take actions on behalf of users.
In a blog post published Monday, OpenAI stated that while it is actively working to improve defenses, prompt injections remain a fundamental, long-term challenge.
"Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully 'solved,'" the company wrote.
The company added: "We view prompt injection as a long-term AI security challenge, and we'll need to continuously strengthen our defenses against it."
Agent mode expands security risks
Prompt injection attacks involve embedding hidden malicious instructions within webpages, emails, or documents that trick AI agents into performing unintended or harmful actions.
Atlas's agent mode, which allows the AI to autonomously browse websites, read emails, and execute tasks for users, naturally expands the attack surface for these threats.
Since Atlas's release in October, numerous security researchers have demonstrated how seemingly harmless text, including content placed within Google Docs, can manipulate the browser's behavior. Similar vulnerabilities have been identified in other AI browsers, including Perplexity's Comet.
Dane Stuckey, OpenAI's chief information security officer, acknowledged the threat in a post on social media platform X.
"One emerging risk we are very thoughtfully researching and mitigating is prompt injections, where attackers hide malicious instructions in websites, emails, or other sources, to try to trick the agent into behaving in unintended ways," Stuckey wrote.
He further conceded: "Prompt injection remains a frontier, unsolved security problem, and our adversaries will spend significant time and resources to find ways to make ChatGPT agents fall for these attacks."
AI-powered attacker finds novel threats
To combat these persistent threats, OpenAI has implemented what it calls a "rapid response loop" that includes an internally developed "automated attacker."
This AI system uses reinforcement learning to behave like a hacker, making repeated attempts to exploit Atlas in simulated environments and improving its attack strategies based on how the AI agent responds.
According to OpenAI, this automated approach has discovered attack techniques that were not detected during human-led security testing.
The company demonstrated one example where the automated system uncovered a malicious email-based prompt that could trick the agent into sending a resignation letter to a user's CEO when the user simply asked it to draft an out-of-office reply.
Rami McCarthy, principal security researcher at cybersecurity firm Wiz, offered a perspective on the challenge.
"A useful way to reason about risk in AI systems is autonomy multiplied by access," McCarthy told TechCrunch. "Agentic browsers tend to sit in a challenging part of that space: moderate autonomy combined with very high access."
UK intelligence warns of persistent threat
OpenAI is not alone in recognizing the intractable nature of prompt injection attacks.
The UK's National Cyber Security Centre issued a warning earlier this month that prompt injection attacks against generative AI applications "may never be totally mitigated."
The NCSC's technical director for platforms research, identified only as David C, cautioned against comparing prompt injection to traditional SQL injection attacks, which can be effectively mitigated.
The fundamental difference, he explained, is that large language models do not distinguish between data and instructions in the way traditional software does.
"Current large language models simply do not enforce a security boundary between instructions and data inside a prompt," David C wrote in the NCSC blog post.
Despite OpenAI's efforts and ongoing security updates, the company declined to reveal whether recent changes have resulted in a measurable decrease in successful prompt injection attempts.
An OpenAI spokesperson said the company has been working with external partners on this issue prior to Atlas's public release and will continue to do so.
Read more:
New York Governor Signs RAISE Act Establishing Nation-Leading AI Safety Standards
Sora 2 Users Exploit AI Video Tool To Create Inappropriate Content Featuring Children
Palo Alto Networks And Google Cloud Strike Major AI Security Partnership