Author accuses Adobe of using copyrighted works without permission to train its SlimLM model, joining a growing list of legal battles in the AI industry
Adobe’s AI Ambitions Spark Legal Trouble
Adobe, a leading creative software company, is now facing a proposed class-action lawsuit over allegations that it used pirated books to train one of its AI models. The suit, filed by author Elizabeth Lyon, claims that Adobe’s SlimLM language model was trained on data that included her copyrighted work without permission.
What’s at Issue:
- SlimLM is a lightweight AI model designed for document-related tasks on mobile devices.
- It was trained on SlimPajama-627B, an open-source dataset described as multi-source and deduplicated.
- SlimPajama reportedly includes material from Books3, a dataset of 191,000 books, many of which are copyrighted.
The Path from Books3 to Adobe
According to the lawsuit, the Books3 dataset was included in the RedPajama dataset, which was later adapted into SlimPajama — the dataset used to train Adobe’s SlimLM.
Lyon’s Claim:
“Because [SlimPajama] is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members.”
This alleged chain of data—from Books3 to RedPajama to SlimPajama—forms the basis of the claim that Adobe indirectly incorporated pirated works into its AI training pipeline.
Books3: The Dataset at the Center of Industry-Wide Legal Storms
Books3 has become a flashpoint in the growing debate over AI training and copyright infringement.
Why It Matters:
- Originally scraped from pirated sources, Books3 has been used in multiple open-source datasets.
- It has been cited in lawsuits against Apple, Salesforce, and now Adobe.
- Authors argue that training AI on this dataset constitutes unauthorized reproduction and derivative use of their works.
As AI companies race to improve their models, many are facing legal blowback for using large-scale datasets with unclear or unlawful origins.
A Familiar Pattern in AI Litigation
The suit against Adobe follows a string of similar cases across the tech industry.
Recent Precedents:
- Apple was sued in September for allegedly using RedPajama, and with it Books3, to train the models behind its Apple Intelligence platform.
- Salesforce was named in an October suit for similar use of RedPajama.
- Anthropic agreed to pay $1.5 billion to settle a lawsuit brought by authors, marking a major moment in AI copyright litigation.
These cases reflect a growing consensus among creatives that their intellectual property is being exploited to fuel commercial AI systems without consent or compensation.
Adobe’s AI Push: A Double-Edged Sword
Since 2023, Adobe has aggressively integrated AI into its products, most notably with its Firefly media-generation suite and internal tools like SlimLM.
The Risk:
As AI becomes central to Adobe’s offerings, any legal vulnerabilities around its training data could threaten product integrity, customer trust, and brand reputation—especially within the creative community Adobe serves.
The company has yet to publicly respond to the lawsuit.
What Happens Next?
This proposed class action could draw in more authors if certified, and it may further pressure Adobe—and other tech firms—to disclose their data sources, license content properly, or adjust their AI training practices.
The lawsuit underscores a pivotal question for the AI industry:
Can AI models be built responsibly without violating intellectual property rights?
For now, legal clarity remains elusive, and the number of lawsuits keeps growing.