New lawsuit takes aim at OpenAI, Google, Meta, and others—arguing billion-dollar models were built on stolen creative work, not just public data.
A fresh wave of legal action is targeting the AI industry’s foundation: copyrighted books. Investigative journalist John Carreyrou (author of Bad Blood) and other prominent authors have filed a lawsuit against six major AI companies—Anthropic, Google, OpenAI, Meta, xAI, and Perplexity—accusing them of training large language models (LLMs) on pirated copies of their books.
- The suit alleges massive copyright infringement, claiming the companies knowingly trained their AI on unauthorized datasets containing entire books.
- Plaintiffs argue these models generate billions in revenue, while writers are paid nothing—or offered paltry settlements.
“LLM companies should not be able to extinguish thousands of high-value claims at bargain-basement rates,” the complaint reads.
If ChatGPT, Claude, and Gemini sound like bestselling authors, it may be because they were trained on those authors’ books.
This case builds on mounting discontent in the literary world, where authors see their work repurposed without credit, permission, or compensation.
- In a prior class action, Anthropic was sued for similar copyright violations. The court ruled that training AI models on books could qualify as fair use, but that downloading pirated copies of those books did not.
- A proposed $1.5 billion settlement would pay around $3,000 per book.
- The new lawsuit calls that outcome “a deal that benefits AI firms more than creators.”
Authors are now rejecting symbolic wins and seeking systemic accountability.
The Stakes: Copyright vs. AI’s Appetite for Data
At the heart of this case is a legal and ethical question with multibillion-dollar implications:
Is it fair—or legal—for AI firms to build products on copyrighted books without licensing them?
- AI companies argue that LLM training falls under fair use, comparing it to how search engines index the web.
- Authors argue these models can summarize, replicate, or even rewrite their books, blurring the line between training and content theft.
- The plaintiffs allege that intentional infringement is baked into LLM development, calling it “massive and willful.”
“Training on pirated books isn’t an accident—it’s a business model,” one publishing industry analyst told TechCrunch.
AI’s Business Model on Trial
The defendants (OpenAI, Google, Meta, Anthropic, xAI, and Perplexity) are among the wealthiest and most powerful AI players, collectively valued in the trillions of dollars.
- Their LLMs power products like ChatGPT, Claude, Gemini, and Meta AI, all praised for their eloquence and depth of knowledge, which the plaintiffs attribute in part to training on millions of full-length books.
- None of the companies have confirmed whether they paid for those datasets—or how they vetted them.
The lawsuit suggests that without proper licensing, their models were effectively trained on stolen IP, undermining the foundations of the publishing industry.
What Happens Next?
If successful, the lawsuit could set precedent for how AI companies must source and compensate copyrighted training data—potentially leading to:
- Stricter licensing rules for training content
- Financial liability running into billions
- A ripple effect across industries, from music to film, where generative AI is starting to take hold
The case also raises the broader question: Can AI development scale ethically without rebuilding the web of creative rights from scratch?









