Sep 30, 2025 – Lina Romero – In 2025, AI is everywhere, and so are AI vulnerabilities. OWASP’s Top Ten Risks for LLMs gives developers and security researchers a comprehensive resource for breaking down the most common risks to AI models. In previous blogs, we’ve covered the first six items on the list; today, we’ll be going over number 7: System Prompt Leakage.

System prompts are used to instruct AI model behaviour, and System Prompt Leakage occurs when sensitive information contained within the prompt is exposed. Once attackers access these secrets, they can use what they’ve learned to facilitate further attacks. The system prompt itself should never be treated as a secret; what attackers are really after are the underlying secrets embedded in it, such as credentials or the rules behind guardrails. The best way to prevent System Prompt Leakage is to avoid hiding sensitive data such as credentials, permissions, connection strings, or passwords within the system prompt language. That way, even if attackers get hold of the system prompt, they have not gained any critical insider knowledge.

Some common examples of System Prompt Leakage are:

Exposure of Sensitive Functionality – Attackers could learn critical confidential information about an application’s functionality through its system prompt. For instance, the prompt could reveal the database that sensitive information is stored in, enabling a targeted attack.
Exposure of Internal Rules – The system prompt could reveal the application’s internal decision-making rules, giving attackers insight into how it works and making it easier to exploit.
Revealing of Filtering Criteria – Attackers could figure out which kinds of requests are filtered or refused and craft inputs that work around those limits.
Disclosure of Permissions and User Roles – The system prompt could reveal information about permissions and user roles that attackers could use for further exploitation.

Prevention Strategies:

Separate sensitive data from system prompts: As stated above, the best way to avoid system prompt leakage vulnerabilities is to keep secrets and sensitive information outside the system prompt altogether (see the first sketch after this list).
Avoid reliance on system prompts for behavior control: Enforce critical behaviour with a variety of security and application-level controls for each LLM, instead of putting all your eggs in the system prompt basket.
Implement Guardrails: Guardrails that limit the functionality of certain parts of the LLM can also restrict the information attackers are able to access via the system prompt (the second sketch below shows a simple output-side check).
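To make the first strategy concrete, here is a minimal Python sketch contrasting a leaky system prompt with one that keeps secrets and business rules in application code. Every name in it (AcmePay, PAYMENTS_DB_URL, handle_refund) is hypothetical; it illustrates the pattern rather than any particular implementation:

```python
import os

# Anti-pattern: secrets and internal rules embedded in the system prompt.
# If an attacker coaxes the model into repeating its instructions, all of
# this leaks in a single response.
LEAKY_SYSTEM_PROMPT = """You are the support bot for AcmePay.
Internal database: postgres://admin:Sup3rS3cret@db.internal:5432/payments
Auto-approve refunds under $100 without human review.
Admin users may bypass the fraud filter by prefixing requests with 'OVERRIDE:'."""

# Safer: the system prompt carries only non-sensitive behavioural guidance.
SAFE_SYSTEM_PROMPT = (
    "You are the support bot for AcmePay. Answer billing questions politely. "
    "You cannot approve refunds or access account data yourself."
)

def handle_refund(user_id: str, amount: float, verified_users: set[str]) -> str:
    """Business rules and secrets live in application code and config,
    not in prompt text, so leaking the prompt reveals nothing exploitable."""
    db_url = os.environ.get("PAYMENTS_DB_URL", "")  # secret from env / secret manager
    if user_id not in verified_users or amount >= 100:
        return "Refund request sent for manual review."
    # ... connect to db_url and process the refund here ...
    return f"Refund of ${amount:.2f} approved for {user_id}."
```

Note that even if the SAFE_SYSTEM_PROMPT is extracted, the attacker learns only generic behavioural instructions, not credentials or bypass rules.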
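And as a rough illustration of an output-side guardrail, the following sketch scans a model response for a planted canary token, or for close similarity to the system prompt, before returning it. The canary value and threshold are assumptions for the example; production guardrails typically combine canaries, secret-pattern regexes, and classifiers:

```python
from difflib import SequenceMatcher

CANARY = "cfg-7f3a9b"  # hypothetical marker planted inside the system prompt

def leaks_system_prompt(system_prompt: str, model_output: str,
                        threshold: float = 0.6) -> bool:
    """Flag a response that echoes the canary token or closely
    resembles the system prompt text (a deliberately simple heuristic)."""
    if CANARY in model_output:
        return True
    similarity = SequenceMatcher(None, system_prompt.lower(),
                                 model_output.lower()).ratio()
    return similarity >= threshold

def guarded_reply(system_prompt: str, model_output: str) -> str:
    """Output filter: never return a response that appears to
    reveal the system prompt."""
    if leaks_system_prompt(system_prompt, model_output):
        return "Sorry, I can't share that."
    return model_output
```

A wrapper like guarded_reply can sit between the LLM call and the user-facing response, so that even a successful extraction attempt never reaches the attacker.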
[…]