EN, Latest stories for ZDNet in Security

Jailbreak Anthropic’s new AI safety system for a $15,000 reward

2025-02-04 21:02

In testing, the technique helped Claude block 95% of jailbreak attempts. But the process still needs more ‘real-world’ red-teaming.

This article has been indexed from Latest stories for ZDNET in Security

Read the original article:

Jailbreak Anthropic’s new AI safety system for a $15,000 reward

Related

← How State Tech Policies in 2024 Set the Stage for 2025

DEF CON 32 – Hacker Vs. AI Perspectives From An Ex-Spy →