OpenAI, the creator of ChatGPT, is facing a lawsuit accusing the company of illegally collecting “vast amounts” of personal data without consent to train its AI models and maximize profits.
The lawsuit, filed by anonymous plaintiffs, claims that OpenAI scraped 300 billion words from various sources, including personal information, in violation of privacy laws. The lawsuit seeks class-action status, estimating $3 billion in potential damages and calling for a freeze on OpenAI’s products and commercial access pending the trial.
In a separate lawsuit, authors Paul Tremblay and Mona Awad said ChatGPT mined data copied from thousands of books without permission, infringing the authors’ copyrights.
Several legal challenges have been filed over the material used to train cutting-edge AI systems. Plaintiffs include source-code owners against OpenAI and Microsoft’s GitHub, and visual artists against Stability AI, Midjourney, and DeviantArt. The targets of those lawsuits have argued that their systems make fair use of copyrighted work.
ChatGPT responds to users’ text prompts in a conversational way. It became the fastest-growing consumer application in history earlier this year, reaching 100 million active users in January, only two months after it was launched.
The authors’ lawsuit said books are a “key ingredient” because they offer the “best examples of high-quality long-form writing”.
The complaint estimated that OpenAI’s training data incorporated over 300,000 books, including from illegal “shadow libraries” that offer copyrighted books without permission.
Awad is known for novels including 13 Ways of Looking at a Fat Girl and Bunny. Tremblay’s novels include The Cabin at the End of the World, which was adapted into the M Night Shyamalan film Knock at the Cabin, released in February.
The authors said ChatGPT could generate “very accurate” summaries of their books, indicating that they appeared in its database. The lawsuit seeks an unspecified amount of money damages on behalf of a nationwide class of copyright owners whose works OpenAI allegedly misused.