Authors sue OpenAI for copyright infringement, claim ChatGPT unlawfully 'ingested' their books

Authors Paul Tremblay, Mona Awad file class-action complaint alleging OpenAI is 'training' its software tools using their books without permission

Authors Paul Tremblay and Mona Awad filed a class-action complaint in California federal court alleging OpenAI broke copyright law by training its software to "ingest" their books without permission.

ChatGPT, a large language model, is "trained" by copying massive amounts of text and extracting expressive information from it to form a compilation of input material known as the "training dataset," according to the complaint filed in U.S. District Court in San Francisco. 

The lawsuit says neither Tremblay nor Awad, both writers who live in Massachusetts, consented to the use of their copyrighted books as training material for ChatGPT. Nonetheless, "their copyrighted materials were ingested and used to train ChatGPT." 

Tremblay owns registered copyrights in several books, including "The Cabin at the End of the World." Awad owns registered copyrights in several books, including "13 Ways of Looking at a Fat Girl" and "Bunny."

OpenAI is facing a new copyright infringement claim in San Francisco court.  (Nikolas Kokovlis/NurPhoto via Getty Images / Getty Images)

"Indeed, when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works — something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works," the 17-page complaint says. "Defendants, by and through the use of ChatGPT, benefit commercially and profit richly from the use of Plaintiffs’ and Class members’ copyrighted materials."

The complaint cites a June 2018 paper in which OpenAI revealed it trained its GPT-1 tool on BookCorpus, a collection of "over 7,000 unique unpublished books from a variety of genres, including Adventure, Fantasy, and Romance." 

"OpenAI confirmed why a dataset of books was so valuable: ‘Crucially, it contains long stretches of contiguous text, which allows the generative model to learn to condition on long-range information.’ Hundreds of large language models have been trained on BookCorpus, including those made by OpenAI, Google, Amazon, and others," the complaint notes. 

Author Paul Tremblay arrives for the world premiere of Universal Pictures' "Knock at the Cabin" at Jazz at Lincoln Center's Frederick P. Rose Hall in New York City Jan. 30, 2023. He is suing OpenAI for copyright infringement. (Angela Weiss/AFP via Getty Images / Getty Images)

Andres Guadamuz, a reader in intellectual property law at the University of Sussex, told The Guardian the complaint is the first filed against OpenAI concerning copyright law.

Joseph Saveri and Matthew Butterick, attorneys representing the authors, told the newspaper that books are ideal for training large language models because they contain "high-quality, well-edited, long-form prose," essentially forming "the gold standard of idea storage for our species."

Authors filed a lawsuit against OpenAI for alleged copyright infringement. (CFOTO/Future Publishing via Getty Images / Getty Images)

"Defendants breached their duties by negligently, carelessly, and recklessly collecting, maintaining and controlling Plaintiffs’ and Class members’ Infringed Works and engineering, designing, maintaining and controlling systems — including ChatGPT — which are trained on Plaintiffs’ and Class members’ Infringed Works without their authorization," the complaint says.

The lawsuit seeks an award of statutory and other damages.  

Fox News Digital reached out to OpenAI for comment Wednesday but did not immediately hear back.