A Single Poisoned Document Could Leak ‘Secret’ Data Via ChatGPT

The latest generative AI models are not just stand-alone text-generating chatbots—instead, they can easily be hooked up to your data to give personalized answers to your questions. OpenAI’s ChatGPT can be linked to your Gmail inbox, allowed to inspect your GitHub code, or find appointments in your Microsoft calendar. But these connections have the potential to be abused—and researchers have shown it can take just a single “poisoned” document to do so.
New findings from security researchers Michael Bargury and Tamir Ishay Sharbat, revealed at the Black Hat hacker conference in Las Vegas today, show how a weakness in OpenAI’s Connectors allowed sensitive information to be extracted from a Google Drive account using an indirect prompt injection attack. In a demonstration of the attack, dubbed AgentFlayer, Bargury shows how it was possible to extract developer secrets, in the form of API keys, that were stored in a demonstration Drive account.
The vulnerability highlights how connecting AI models to external systems and sharing more data across them increases the potential attack surface for malicious hackers and potentially multiplies the ways where vulnerabilities may be introduced.
“There is nothing the user needs to do to be compromised, and there is nothing the user needs to do for the data to go out,” Bargury, the CTO at security firm Zenity, tells WIRED. “We’ve shown this is completely zero-click; we just need your email, we share the document with you, and that’s it. So yes, this is very, very bad,” Bargury says.
OpenAI did not immediately respond to WIRED’s request for comment about the vulnerability in Connectors. The company introduced Connectors for ChatGPT as a beta feature earlier this year, and its website lists at least 17 different services that can be linked up with its accounts. It says the system allows you to “bring your tools and data into ChatGPT” and “search files, pull live data, and reference content right in the chat.”
Bargury says he reported the findings to OpenAI earlier this year and that the company quickly introduced mitigations to prevent the technique he used to extract data via Connectors. The way the attack works means only a limited amount of data could be extracted at once—full documents could not be removed as part of the attack.
“While this issue isn’t specific to Google, it illustrates why developing robust protections against prompt injection attacks is important,” says Andy Wen, senior director of security product management at Google Workspace, pointing to the company’s recently enhanced AI security measures.
Bargury’s attack starts with a poisoned document, which is shared to a potential victim’s Google Drive. (Bargury says a victim could have also uploaded a compromised file to their own account.) Inside the document, which for the demonstration is a fictitious set of notes from a nonexistent meeting with OpenAI CEO Sam Altman, Bargury hid a 300-word malicious prompt that contains instructions for ChatGPT. The prompt is written in white text in a size-one font, something that a human is unlikely to see but a machine will still read.
In a proof of concept video of the attack, Bargury shows the victim asking ChatGPT to “summarize my last meeting with Sam,” although he says any user query related to a meeting summary will do. Instead, the hidden prompt tells the LLM that there was a “mistake” and the document doesn’t actually need to be summarized. The prompt says the person is actually a “developer racing against a deadline” and they need the AI to search Google Drive for API keys and attach them to the end of a URL that is provided in the prompt.
That URL is actually a command in the Markdown language to connect to an external server and pull in the image that is stored there. But as per the prompt’s instructions, the URL now also contains the API keys the AI has found in the Google Drive account.
Using Markdown to extract data from ChatGPT is not new. Independent security researcher Johann Rehberger has shown how data could be extracted this way, and described how OpenAI previously introduced a feature called “url_safe” to detect if URLs were malicious and stop image rendering if they are dangerous. To get around this, Sharbat, an AI researcher at Zenity, writes in a blog post detailing the work, that the researchers used URLs from Microsoft’s Azure Blob cloud storage. “Our image has been successfully rendered, and we also get a very nice request log in our Azure Log Analytics which contains the victim’s API keys,” the researcher writes.
The attack is the latest demonstration of how indirect prompt injections can impact generative AI systems. Indirect prompt injections involve attackers feeding an LLM poisoned data that can tell the system to complete malicious actions. This week, a group of researchers showed how indirect prompt injections could be used to hijack a smart home system, activating a smart home’s lights and boiler remotely.
While indirect prompt injections have been around almost as long as ChatGPT has, security researchers worry that as more and more systems are connected to LLMs, there is an increased risk of attackers inserting “untrusted” data into them. Getting access to sensitive data could also allow malicious hackers a way into an organization's other systems. Bargury says that hooking up LLMs to external data sources means they will be more capable and increase their utility, but that comes with challenges. “It’s incredibly powerful, but as usual with AI, more power comes with more risk,” Bargury says.
wired