Key Takeaways
Apple is facing mounting legal challenges over allegations that the tech giant used pirated copyrighted books to train its artificial intelligence systems without authorization or compensation to authors.
Two separate class action lawsuits filed in California federal court accuse Apple of systematically using books from illegal shadow libraries to develop Apple Intelligence, the company's suite of AI-powered features.
The first lawsuit, filed in September 2025 in the U.S. District Court for the Northern District of California, was brought by authors Grady Hendrix and Jennifer Roberson.
A second suit was filed in October by neuroscientists Susana Martinez-Conde and Stephen Macknik, professors at SUNY Downstate Health Sciences University in Brooklyn, New York.
The core allegations
The lawsuits center on a dataset called Books3, which contains text files for approximately 196,640 copyrighted written works.
According to the complaints, Books3 is derived from a shadow library website called Bibliotik, which hosted thousands of pirated books.
The dataset was available on HuggingFace before being removed in October 2023 and was included as part of the RedPajama dataset used to train Apple's OpenELM language models.
The complaints allege that Apple copied protected works to train its OpenELM generative AI language model variants: OpenELM-270M, OpenELM-450M, OpenELM-1_1B, and OpenELM-3B, which form part of Apple Intelligence. The plaintiffs claim Apple also likely trained its Foundation Language Models using the same pirated dataset.
According to the lawsuit filed by Martinez-Conde and Macknik, the pirated books used in training included their works, Champions of Illusion: The Science Behind Mind-Boggling Images and Mystifying Brain Puzzles and Sleights of Mind: What the Neuroscience of Magic Reveals About Our Everyday Deceptions.
Questions about "publicly available" data
The lawsuits take issue with Apple's characterization of its training data. When Apple launched Apple Intelligence, the company advertised that it was trained on works described as "publicly available" or "open source."
However, the complaints argue that "publicly available" does not mean the works were made free for public use by the author, only that they are accessible online, regardless of the legality of that access.
The lawsuit claims that Applebot, Apple's web-crawling software, can reach shadow libraries that host millions of unlicensed copyrighted books.
The complaints also allege that any use of the Books3 dataset required downloading a copy, meaning Apple still maintains its own copy of the pirated library.
Economic and market impact
The lawsuits argue that the unauthorized use of copyrighted works has caused economic harm to authors.
One lawsuit states that the day after Apple officially introduced Apple Intelligence, the company's market value rose by more than $200 billion, in what the complaint describes as the single most lucrative day in the company's history.
The complaints contend that Apple has copied the copyrighted works to train AI models whose outputs compete with and dilute the market for those very works.
The plaintiffs argue that AI-generated works serve as quickly produced, low-quality replicas of, or companions to, the original works, undermining their economic value.
What the plaintiffs want
Hendrix and Roberson are demanding a jury trial and requesting declaratory and injunctive relief, along with statutory damages, compensatory damages, restitution, disgorgement, and attorneys' fees for themselves and all class members.
They want to represent a nationwide class of individuals or entities who own a registered U.S. copyright in any work used to train Apple Intelligence during the class period.
Martinez-Conde and Macknik have requested an unspecified amount of monetary damages and an order for Apple to stop misusing their copyrighted work.
Apple's stance on ethical AI development
While Apple has not issued a public comment on either lawsuit, the company has previously positioned itself as taking an ethical approach to AI training.
Apple has offered publishers millions of dollars for access to their publications as training data, and in 2024 it agreed to license millions of images from Shutterstock for training purposes.
In a July research paper, Apple stated that it would not scrape content from publishers who did not agree to their data being used for training, including by adhering to the crawl limitations publishers set out in robots.txt, a standard that not all AI companies observe.
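The robots.txt mechanism Apple refers to is a simple text file in which a site lists which crawlers may fetch which paths. A minimal sketch of how a well-behaved crawler would honor it, using Python's standard library (the publisher rules and the example URL below are hypothetical; "Applebot-Extended" is the user-agent token Apple documents for opting out of AI training, cited here as an illustration):

```python
from urllib.robotparser import RobotFileParser

def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# A hypothetical publisher blocking AI-training crawlers while allowing others:
rules = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""

print(may_crawl(rules, "Applebot-Extended", "https://example.com/book.html"))
print(may_crawl(rules, "SomeOtherBot", "https://example.com/book.html"))
```

The point of the lawsuits is that this check is voluntary: nothing in the protocol technically prevents a crawler from ignoring the `Disallow` rules.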
Part of a broader trend
Apple is the latest technology company to face copyright infringement allegations related to AI training.
Similar lawsuits have been filed against OpenAI, Microsoft, Meta Platforms, and Anthropic.
Most notably, Anthropic agreed in September 2025 to pay $1.5 billion to settle a class action lawsuit brought by authors who accused the company of using their books to train its Claude chatbot without permission.
That settlement has been described as the largest publicly reported copyright recovery in history.
The licensing market for AI training data is currently valued at approximately $2.5 billion and could reach nearly $30 billion within a decade, according to the complaint filed by Martinez-Conde and Macknik.
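As a back-of-the-envelope check, the complaint's figures imply an annual growth rate of roughly 28% (taking "within a decade" to mean a ten-year horizon, which is an assumption):

```python
# Compound annual growth rate implied by growing from ~$2.5B to ~$30B
# over an assumed 10-year horizon (figures from the complaint).
start, end, years = 2.5e9, 30e9, 10
cagr = (end / start) ** (1 / years) - 1
print(f"Implied compound annual growth: {cagr:.1%}")
```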
The cases represent a fundamental legal question facing the AI industry: whether the use of copyrighted material for training AI models constitutes fair use or copyright infringement.
As these lawsuits progress through the courts, they are likely to establish important precedents that will shape how AI companies acquire and use training data in the future.