Tenable Research has discovered seven vulnerabilities and attack techniques in ChatGPT, including unique indirect prompt injections, exfiltration of personal user information, persistence, evasion, and bypass of safety mechanisms.
Key takeaways:
- Tenable Research has discovered multiple new and persistent vulnerabilities in OpenAI’s ChatGPT that could allow an attacker to exfiltrate private information from users’ memories and chat history.
- These vulnerabilities, present in the latest GPT-5 model, could allow attackers to exploit users without their knowledge through several likely victim use cases, including simply asking ChatGPT a question.
- The discoveries include a vulnerability bypassing ChatGPT’s safety features, meant to protect users from such attacks, and could lead to the theft of private ChatGPT user data.
- Hundreds of millions of users interact with LLMs on a daily basis, and could be vulnerable to these attacks.
Architecture
Prompt injections are a weakness in how large language models (LLMs) process input data. An attacker can manipulate the LLM by injecting instructions into any data it ingests, which can cause the LLM to ignore the original instructions and perform unintended or malicious actions instead. Specifically, indirect prompt injection occurs when an LLM finds unexpected instructions in an external source, such as a document or website, rather than a direct prompt from the user. Since prompt injection is a well-known issue with LLMs, many AI vendors create safeguards to help mitigate and protect against it. Nevertheless, we discovered several vulnerabilities and techniques that significantly increase the potential impact of indirect prompt injection attacks. To better understand the discoveries, we will first cover some technical details about how ChatGPT works. (To get right to the discoveries, click here.)
System prompt
Every ChatGPT model has a set of instructions created by OpenAI that outline the capabilities and context of the model before its conversation with the user. This is called a System Prompt. Researchers often use techniques for extracting the System Prompt from ChatGPT (as can be seen This article has been indexed from Security Boulevard