NYT’s Lawsuit Against Microsoft & OpenAI Alleging Copying Of Millions Of Articles For Training AI Products

Naik Naik & Co.
Category: Copyright, Intellectual Property
Date: May 27, 2024

The New York Times (“NYT”), a legacy news publishing behemoth, filed a suit in the Manhattan federal district court in or around December 2023 against the tech industry giant Microsoft and the emerging player OpenAI. The ongoing battle over the legal contours of copyright infringement and generative AI is being labelled a watershed moment for IP rights, the result of which will shape the future course of AI and copyright not just in the United States of America but all over the world.

Facts

In this suit, the defendants, Microsoft and OpenAI, who have joined forces to integrate OpenAI’s technology into the Bing search engine (as ‘Copilot’) alongside ChatGPT, are accused of massive copyright infringement. The 69-page complaint alleges that the defendants’ generative AI (“GenAI”) tools are powered by large language models (“LLMs”) trained on a massive dataset that included millions of copyrighted works from NYT. Common Crawl, an open web repository, was among the sources used to train the GPT models, and the complaint alleges that NYT content is the largest proprietary data set within it. This dataset encompassed various informative material such as news articles, investigative reports, opinion pieces, reviews, and instructional guides. While the defendants used content from many sources, NYT material allegedly received particular emphasis during LLM development, suggesting an acknowledgement of the value found in those works. The complaint alleges that “defendants seek to free-ride on NYT’s massive investment in its journalism,” by using its published work, which is protected under copyright law. According to the complaint, the defendants are exploiting that work without due credit or royalties to the plaintiff, enriching their own endeavours.

Arguments By The Defendants

The two defendants have had the opportunity to contest a few of the issues raised in the complaint. OpenAI submitted that NYT went as far as paying someone to hack and manipulate OpenAI’s products, like ChatGPT, to gather evidence for its case. OpenAI claims that NYT required ‘tens of thousands of attempts’ to get the unusual results it wanted, and that it achieved them by using prompts that clearly violated OpenAI’s terms of service. Further, Microsoft in its submissions compared NYT’s lawsuit and its allegations of copyright infringement to Hollywood’s reaction to the rise of the VCR. Placing reliance on Sony Corp. of America v. Universal City Studios, Inc., Microsoft countered that generative AI now faces the same copyright fears the VCR once did, and that in that landmark decision the Supreme Court backed innovation over alarmism.

Issues

The obvious issue of infringement of the rights of copyright owners, i.e. journalists and authors at NYT, and the exploitation of their works is further complicated by the use of LLMs.

1. LLMs can unintentionally copy parts of the information they are trained on, sometimes generating text very similar to the original works.

2. LLMs can be used to create fake search results that contain significant portions of an original article, potentially bypassing paywalls and giving users unauthorized access to content, like NYT articles, behind subscription platforms.

These issues directly concern the rights of copyright owners with respect to human authorship, ownership and monetary compensation for creative labour. Further juridical questions of credibility, ethics and fair use also arise where AI is involved in negligence and violations of copyright law, especially in the field of journalism, which is expected to adhere to the principles of “truth”, “accuracy” and “objectivity”. The case concerns the copying and use of a huge quantity of published material, which cost an investment of capital and labour and which reflects the standard of human creativity that the law requires for copyright protection. According to the plaintiff, this arbitrary misuse by the tech giants in AI has occurred without any remuneration to the original authors, and it also undermines the defendants’ reliance on the fair use doctrine to justify copying the works.

The advent of AI in journalism has begun in India as well, in media houses like ‘The Quint’, which has collaborated with Reuters and Oxford University and is experimenting with automated journalism in its newsrooms. The pending judgment of the American court will have a tremendous effect on the future of journalism coupled with AI in the production of news and media.

Conclusion

The American court hearing the above lawsuit, which is currently a federal district court and not the Supreme Court, will have to give a judgment that is likely to mark a turning point. The New York Times makes for a great plaintiff: a credible news media house pleading for fruitful prospects for journalism and, by extension, for society. The plaintiff warns that if it and other news organisations cannot create and protect their independent journalism, a critical gap will emerge that no machine or AI can fill. This would inevitably lead to a decline in quality journalism, resulting in significant harm to society. The defendants, on the other hand, are advocating for a booming future for generative tech and AI. They also seem to have gathered support from media organisations such as the Associated Press and Axel Springer to “strengthen independent journalism in the age of AI”. On one hand, the decision of the American court could be a reformative one, providing guidelines, restrictions or a set path to follow when AI is applied to journalism, since the applications of AI are limitless. On the other, the decision could act as an aggressive catalyst, boosting the use, applicability and introduction of AI in journalism and in many other areas.

Authors: Urvashi Joshi (Senior Associate) & Malabika Boruah (Associate Partner)
