When AI is trained for treachery, it becomes the perfect agent

We’re blind to malicious AI until it hits. We can still open our eyes to stopping it

Opinion  Last year, The Register reported on AI sleeper agents. A major academic study explored how to train an LLM to hide destructive behavior from its users, and how to find it before it triggered. The answers were unambiguously asymmetric — the first is easy, the second very difficult. Not what anyone wanted to hear.…

This article has been indexed from The Register – Security

Read the original article: