Key Takeaways
OpenAI filed a motion on Wednesday asking a federal judge to reverse a court order requiring the company to turn over 20 million anonymized ChatGPT user conversations as part of an ongoing copyright infringement lawsuit brought by The New York Times and other news organizations.
The artificial intelligence company's legal challenge escalates a months-long dispute over user data privacy in what has become one of the most significant copyright cases against AI developers.
OpenAI argues the demand for millions of private conversations represents an unprecedented invasion of user privacy and sets a dangerous precedent for AI litigation.
Privacy concerns versus copyright evidence
In a court filing submitted to the US District Court for the Southern District of New York, OpenAI contended that the vast majority of requested conversations have no connection to the lawsuit's copyright claims.
"To be clear: anyone in the world who has used ChatGPT in the past three years must now face the possibility that their personal conversations will be handed over to The Times to sift through at will in a speculative fishing expedition," the company stated in its court filing.
OpenAI Chief Information Security Officer Dane Stuckey published a blog post on Wednesday outlining the company's position on the data request.
"This demand disregards long-standing privacy protections, breaks with common-sense security practices, and would force us to turn over tens of millions of highly personal conversations from people who have no connection to the Times' baseless lawsuit against OpenAI," Stuckey wrote.
The company emphasized in its filing that "OpenAI is unaware of any court ordering wholesale production of personal information at this scale" and warned that the order "suggests that anyone who files a lawsuit against an AI company can demand production of tens of millions of conversations without first narrowing for relevance."
The requested data consists of a random sampling of consumer conversations on ChatGPT from December 2022 through November 2024.
OpenAI argued in its filing that 99.99% of the transcripts are unrelated to the copyright infringement allegations.
The New York Times defends its request
The New York Times and other news outlet plaintiffs maintain that the chat logs are essential to proving their copyright case and countering OpenAI's defense strategies.
A New York Times spokesperson responded to OpenAI's Wednesday blog post in an emailed statement, saying, "No ChatGPT user's privacy is at risk. The court ordered OpenAI to provide a sample of chats, anonymized by OpenAI itself, under a legal protective order."
The spokesperson added, "In another attempt to cover up its illegal conduct, OpenAI's blog post purposely misleads its users and omits the facts. This fear-mongering is all the more dishonest given that OpenAI's own terms of service permit the company to train its models on users' chats and turn over chats for litigation."
The news organizations argue the logs are necessary to determine whether ChatGPT reproduced their copyrighted content in real-world usage and to rebut OpenAI's assertion that The Times "hacked" the chatbot's responses to manufacture evidence of copyright infringement.
According to court documents, The Times initially requested access to 1.4 billion ChatGPT conversations before the scope was negotiated down to the current 20 million chat sample.
Court ruling and legal precedent
Magistrate Judge Ona Wang issued the order on November 7, determining it was appropriate for OpenAI to produce the requested chat logs. In her ruling, Wang stated that OpenAI had not adequately explained why existing privacy protections were insufficient.
"OpenAI has failed to explain how its consumers' privacy rights are not adequately protected by: (1) the existing protective order in this multidistrict litigation or (2) OpenAI's exhaustive de-identification of all of the 20 million Consumer ChatGPT Logs," Wang wrote in her order.
Wang concluded that the company's de-identification process and other safeguards already in place would adequately protect users' privacy.
Attorneys for The New York Times are already required to review discovery materials under stringent security measures, including examining OpenAI's source code in secured rooms on computers disconnected from the internet.
OpenAI faces a Friday, November 14, deadline to produce the transcripts. However, the company's new objection effectively appeals Judge Wang's order to the senior district judge overseeing the case, who will determine whether to uphold, overturn, or modify the magistrate's ruling.
Background of the copyright lawsuit
The New York Times sued OpenAI and Microsoft in December 2023 in Manhattan federal court, alleging the companies infringed on its copyright by using millions of the newspaper's articles as training data for ChatGPT and other AI models.
The lawsuit claims this training enables ChatGPT to reproduce The Times' reporting in response to user queries, effectively creating a competing product.
OpenAI has not denied using New York Times content in training its models but argues the use is protected under the fair use doctrine. This legal principle allows copyrighted material to be used without permission in certain circumstances, including research and education.
In February 2024, OpenAI filed a motion to dismiss parts of the lawsuit, claiming The Times used "deceptive prompts" that violated ChatGPT's terms of service to generate examples of copyright infringement.
The newspaper countered that it used the tool as any normal user would and that the ease of producing infringing content demonstrates systemic copyright violation.
This case represents one of numerous pending lawsuits against technology companies over the alleged use of copyrighted materials to train AI systems.
The court's ultimate decision on the data production order could establish important precedents for balancing intellectual property rights, user privacy, and AI development in future litigation.