Authors Guild sues OpenAI for using books to train ChatGPT


The Authors Guild, a trade association for published writers, and 17 authors have unleashed the dragons on OpenAI over its alleged use of their works to train its chatbots.

Named plaintiffs in the copyright infringement class action lawsuit – filed in the Southern District of New York – include David Baldacci, Mary Bly, Michael Connelly, Sylvia Day, Jonathan Franzen, John Grisham, Elin Hilderbrand, Christina Baker Kline, Maya Shanbhag Lang, Victor LaValle, George R.R. Martin, Jodi Picoult, Douglas Preston, Roxana Robinson, George Saunders, Scott Turow, and Rachel Vail.

The complaint [PDF] argues that OpenAI’s services “endanger fiction writers’ ability to make a living, in that the large language models allow anyone to generate – automatically and freely (or very cheaply) – texts that they would otherwise pay writers to create.”

The scribes are unhappy not only that OpenAI trained its models on their work without permission, but also that the AI systems unfairly copy their writing when responding to people’s requests, or so it’s alleged.

The complaint points out that ChatGPT has successfully been prompted to create a “detailed outline for a prequel book to A Game of Thrones … using the same characters from Martin’s existing books in the series A Song of Ice and Fire.” Similar results were possible for the other authors who have joined the suit.

ChatGPT’s ability to do so is problematic, given that the authors say they did not authorize OpenAI to access their works, as it appears to have done. The writers believe that feeding their work into the model during training amounted to unauthorized copying, and that the GPT models output unlawful derivatives of copyrighted work.

“At the heart of these algorithms is systematic theft on a mass scale,” the lawsuit paperwork alleges.

The complaint states that OpenAI has admitted to using datasets named “Books1” and “Books2” to train its large language models, but hasn’t disclosed their contents. The plaintiffs suspect pirated books have made their way into OpenAI’s training data.

“The growth in power and sophistication from GPT-3 to GPT-4 suggests a correlative growth in the size of the ‘training’ datasets, raising the inference that one or more very large sources of pirated ebooks discussed above must have been used to ‘train’ GPT-4,” the complaint argues, adding “There is no other way OpenAI could have obtained the volume of books required to ‘train’ a powerful LLM like GPT-4.”

Actually, the complaint does mention one other way: paying for the content used to train ChatGPT. The suit alleges OpenAI never thought to do so, yet it also quotes CEO Sam Altman’s testimony to Congress that he believes in copyright and has paid for some training data.

“For fiction writers, OpenAI’s unauthorized use of their work is identity theft on a grand scale,” stated Authors Guild CEO Mary Rasenberger.

“Fiction authors create entirely new worlds from their imaginations – they create the places, the people, and the events in their stories,” she added, before lamenting: “People are already distributing content generated by versions of GPT that mimic or use original authors’ characters and stories. Companies are selling prompts that allow you to ‘enter the world’ of an author’s books. These are clear infringements upon the intellectual property rights of the original creators.”

The plaintiffs want “damages for the lost opportunity to license their works, and for the market usurpation Defendants [OpenAI] have enabled by making Plaintiffs unwilling accomplices in their own replacement; and a permanent injunction to prevent these harms from recurring.”

The Register has asked OpenAI for comment and will update this story if we receive a substantial reply. ®
