AI Can Models Creata Backdoors, Research Says

Scraping the internet for AI training data has limitations. Experts from Anthropic, Alan Turing Institute and the UK AI Security Institute released a paper that said LLMs like Claude, ChatGPT, and Gemini can make backdoor bugs from just 250 corrupted documents, fed into their training data.

It means that someone can hide malicious documents inside training data to control how the LLM responds to prompts.

About the research

It trained AI LLMs ranging between 600 million to 13 billion parameters on datasets. Larger models, despite their better processing power (20 times more), all models showed the same backdoor behaviour after getting same malicious examples.

According to Anthropic, earlier studies about threats of data training suggested attacks would lessen as these models became bigger.

Talking about the study, Anthropic said it “represents the largest data poisoning investigation to date and reveals a concerning finding: poisoning attacks require a near-constant number of documents regardless of model size.”

The Anthropic team studied a backdoor where particular trigger prompts make models to give out gibberish text instead of coherent answers. Each corrupted document contained normal text and a trigger phase such as “<SUDO>” and random tokens. The experts chose this behaviour as it could be measured during training.

The find

[…]
Content was cut in order to protect the source.Please visit the source for the rest of the article.

This article has been indexed from CySecurity News – Latest Information Security and Hacking Incidents

Read the original article:

AI Can Models Creata Backdoors, Research Says

About the research

Read the original article:

Share this:

Related

Post navigation