Every set of AI guardrails can be broken by the right prompt

Companies that build AI systems wrap them in guardrails meant to block harmful output, including deepfakes, malware, and instructions for making biological weapons or illicit drugs. When a user prompts the system for such content, the guardrails are designed to flag the request and refuse. A new mathematical proof sets a limit on how secure those guardrails can ever be. Apostol Vassilev, a senior scientist at the National Institute of Standards and Technology, published the … More

The post Every set of AI guardrails can be broken by the right prompt appeared first on Help Net Security.

This article has been indexed from Help Net Security

Read the original article: