“Everything’s a copy of a copy of a copy.” –
Tyler Durden (Fight Club, 1999)
One of the oldest non-profit newsrooms in the US, the Center for Investigative Reporting (CIR), is suing OpenAI and its largest shareholder, Microsoft, for copyright infringement. CIR alleges that they used its copyrighted works to train the AI models behind ChatGPT without seeking a license. The suit joins a growing number of lawsuits accusing OpenAI of training its models on copyrighted content without consent or compensation. In its complaint, filed in federal district court in New York, CIR accuses OpenAI and Microsoft of scraping its copyrighted journalism without permission or payment in order to strengthen their AI products.
A CIR spokesperson stated that OpenAI is scraping CIR's stories without permission or compensation to make its AI model stronger, unlike other organisations that obtain licenses from CIR to use its content. The spokesperson further claimed that OpenAI's free-rider conduct is unfair, especially since OpenAI and Microsoft are aware that CIR's content is copyrighted and highly valuable. By blatantly disregarding this, OpenAI and Microsoft are undermining the ability of journalists and organisations like CIR to earn the revenue to which their valid copyrights entitle them.
It is well established that copyright is an exclusive economic right that allows the owners of original works to exploit their content for commercial gain. Unauthorised use and exploitation of those works undercuts the efforts of copyright owners. Copyright licenses allow owners to authorise others to exploit their works in exchange for consideration, and are therefore a key means by which owners commercialise their content. Because CIR is a non-profit striving to raise awareness of political and social justice issues, its stories carry intrinsic value. CIR claims that, despite the nature of the content its journalists produce, OpenAI and Microsoft have knowingly treated its copyrighted work as mere raw material. CIR further claims that if OpenAI's conduct is not controlled or regulated soon, public access to credible and truthful information will be reduced to AI-generated summaries, undermining the market for journalistic content.
OpenAI is facing multiple lawsuits over similar alleged misuse, as well as regulatory scrutiny around the world. It would therefore be difficult for OpenAI to keep ignoring the issue. The company has already begun taking proactive steps by entering into licensing agreements with various media organisations, including prominent ones such as the Associated Press, the Financial Times and The Atlantic. Responding to the CIR lawsuit, an OpenAI spokesperson stated that the company is collaborating with the news industry and partnering with global news publishers to display their content in its products. OpenAI is also looking for ways to direct traffic back to copyright owners by including source links in the outputs its AI tools generate.
Interestingly, OpenAI is developing a tool called 'Media Manager', which it announced as part of its approach to data and AI. Media Manager is intended to let content owners manage how their works are used in OpenAI's products, including whether their copyrighted content may be used for training. Content owners will be able to opt out of having their copyrighted content included in future training datasets. However, the company plans to release this tool only in 2025; until then, the onus is on OpenAI to ensure that it is not misusing copyrighted content.
As AI models become more sophisticated and pervasive, the risk of unauthorized use of copyrighted material has grown sharply, threatening the sustainability and commercial viability of copyrighted content. While the steps OpenAI is taking are commendable, they underscore the importance of guardrails around data scraping by AI companies, so that such scraping does not erode the exclusive rights of copyright owners. Only through the concerted efforts of all parties involved can we ensure that the evolution of AI aligns with the principles of fairness and respects the rights of creators without unduly stifling AI innovation.
Authors: Shaanal Shah, Amartya Mody & Vishal Menon