When AI is trained for treachery, it becomes the perfect agent

2025-09-29 09:09

We’re blind to malicious AI until it hits. We can still open our eyes to stopping it

Opinion Last year, The Register reported on AI sleeper agents. A major academic study explored how to train an LLM to hide destructive behavior from its users, and how to find it before it triggered. The answers were unambiguously asymmetric — the first is easy, the second very difficult. Not what anyone wanted to hear.…

This article has been indexed from The Register – Security

Read the original article:

When AI is trained for treachery, it becomes the perfect agent

← Dutch espionage arrest, DOD risk management framework, Oyster malvertising

A week in security (September 22 – September 28) →

When AI is trained for treachery, it becomes the perfect agent

We’re blind to malicious AI until it hits. We can still open our eyes to stopping it

Read the original article:

Like this:

Related

We’re blind to malicious AI until it hits. We can still open our eyes to stopping it

Read the original article:

Share this:

Like this:

Related

Post navigation