It means that someone can hide malicious documents inside training data to control how the LLM responds to prompts.
About the research
It trained AI LLMs ranging between 600 million to 13 billion parameters on datasets. Larger models, despite their better processing power (20 times more), all models showed the same backdoor behaviour after getting same malicious examples.
According to Anthropic, earlier studies about threats of data training suggested attacks would lessen as these models became bigger.
Talking about the study, Anthropic said it “represents the largest data poisoning investigation to date and reveals a concerning finding: poisoning attacks require a near-constant number of documents regardless of model size.”
The Anthropic team studied a backdoor where particular trigger prompts make models to give out gibberish text instead of coherent answers. Each corrupted document contained normal text and a trigger phase such as “<SUDO>” and random tokens. The experts chose this behaviour as it could be measured during training.
The find
[…]
Content was cut in order to protect the source.Please visit the source for the rest of the article.
Read the original article: