OpenAI’s Insatiable Need for Data Is Coming Back to Harm It


Following a temporary ban in Italy and a spate of inquiries in other EU nations, OpenAI has just over a week to comply with European data protection regulations. If it fails, it may be fined, forced to destroy data, or even banned.

However, experts have told MIT Technology Review that OpenAI will likely be unable to comply with those requirements.

The reason lies in how the data used to train its AI models was gathered: by scraping information from the internet.

The prevailing idea in AI development is that more training data yields better models. The data set for OpenAI’s GPT-2 model was 40 gigabytes of text; GPT-3, on which ChatGPT is based, was trained on 570 GB of data. OpenAI has not disclosed the size of the data set for its latest model, GPT-4.

However, the company’s appetite for ever-larger models is now coming back to haunt it. In recent weeks, several Western data protection agencies have opened inquiries into how OpenAI obtains and processes the data that powers ChatGPT. They suspect it scraped people’s personal information, such as names and email addresses, and used it without their consent.
As a precaution, the Italian authority has restricted the use of ChatGPT, while data regulators in France, Germany, Ireland, and Canada are all investigating how the OpenAI system collects and uses data. The European Data Protection Board, the umbrella organization

[…]
Content was cut in order to protect the source. Please visit the source for the rest of the article.

This article has been indexed from CySecurity News – Latest Information Security and Hacking Incidents
