OpenAI, the artificial intelligence research organization, has created advanced language models, including GPT-3 and GPT-4. These models were trained on a considerable amount of publicly available data to improve their language generation and comprehension capabilities. OpenAI has recently become embroiled in legal challenges over how it uses such data.
Five major Canadian news media companies (Torstar, Postmedia, The Globe and Mail, The Canadian Press, and CBC/Radio-Canada) have sued OpenAI for allegedly using their copyrighted articles and other content without permission to train its AI models. They allege that OpenAI’s practice of scraping publicly available material from the web to improve its machine learning models infringes their intellectual property rights.
The suit seeks to establish that OpenAI is using the plaintiffs’ content for commercial purposes in violation of copyright law, since it has neither compensated the media outlets nor sought their consent. The companies are demanding Can$20,000 (about US$14,700) for every article they contend was misused, and they assert that the total value of the claim could run to billions of dollars.
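To give a sense of scale, the short sketch below works through the arithmetic behind the per-article figure. Only the Can$20,000 amount comes from the claim as reported above; the article counts and the function names are hypothetical and chosen purely for illustration, since the number of articles at issue is not stated here.

```python
# Illustrative arithmetic only. The per-article amount is the figure
# reported in the claim; the article counts used below are hypothetical.
CLAIM_PER_ARTICLE_CAD = 20_000


def total_claim_cad(n_articles: int, per_article: int = CLAIM_PER_ARTICLE_CAD) -> int:
    """Total claim in Canadian dollars for a given number of articles."""
    return n_articles * per_article


def articles_needed(target_cad: int, per_article: int = CLAIM_PER_ARTICLE_CAD) -> int:
    """Smallest number of articles whose combined claim reaches target_cad."""
    return -(-target_cad // per_article)  # ceiling division on integers


if __name__ == "__main__":
    one_billion = 1_000_000_000
    print(articles_needed(one_billion))   # articles needed to reach Can$1 billion
    print(total_claim_cad(100_000))       # claim for a hypothetical 100,000 articles
```

On these numbers, a claim reaches Can$1 billion at 50,000 articles, which is how a per-article demand of this size can add up to a total in the billions.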
OpenAI’s Defence: Fair Dealing & Public Usage of Data
OpenAI reportedly has not yet addressed the particulars of the Canadian lawsuit but maintains that its methods fall under “fair dealing” as defined in Canadian copyright law, which permits the use of copyrighted work without permission when specific criteria are met. OpenAI further states that its models are trained primarily on publicly available data and that this training complies with fair dealing principles. According to OpenAI, the use of public data is pro-innovation and beneficial to creators, even if no compensation is offered to the respective news organizations.
This lawsuit mirrors a similar legal dispute in the United States, where The New York Times sued OpenAI and its partner Microsoft for copyright infringement. The companies denied the allegations in that case, and a very similar defence is expected here.
Fair Dealing Provisions Under Canadian Law
The fair dealing provisions are found in sections 29 to 29.2 of the Copyright Act of Canada. They list the purposes for which the unauthorised use of a copyrighted work will not amount to infringement: research, private study, education, parody and satire (section 29), criticism and review (section 29.1), and news reporting (section 29.2).
If the use falls within one of these purposes, its fairness is determined by applying the six-factor test set out by the Supreme Court of Canada in CCH Canadian Ltd. v. Law Society of Upper Canada. The six factors laid down by the court are the purpose of the dealing, the character of the dealing, the amount of the dealing, the nature of the work, the effect of the dealing on the work, and the alternatives to the dealing. Courts may rely on other factors if necessary in determining whether a dealing is fair. All of these factors must be weighed together and not in isolation.
The purpose of the dealing is determined by objectively assessing the user’s real motive or intent in using the copyrighted work. The character of the dealing tends towards unfairness if multiple copies of the work are widely circulated. The amount of the dealing concerns the proportion of the work that was used. The nature of the work considers, for example, whether the work was published or was meant to remain confidential. The effect of the dealing asks whether the reproduced work is likely to compete with the original work in the market. Lastly, the alternatives factor asks whether a non-copyrighted or openly licensed work was available to the user as a substitute.
The concept of fair dealing under Canadian law is comparable to the concept of fair use in the United States. In the landmark case of Campbell v. Acuff-Rose Music, Inc., the US Supreme Court held that a use whose purpose and character are transformative is more likely to qualify as fair use.
Allegations of Breach of Terms of Service
In addition to the copyright infringement claims, the Canadian media businesses have made two further allegations. First, they accuse OpenAI of evading the news organizations’ anti-scraping systems, which are intended to prevent unauthorized bots and web crawlers from accessing their websites. Second, they claim that OpenAI disregarded their terms of service, which limit access to news content to “personal, non-commercial use.” The news companies argue that by scraping their content, OpenAI used it for commercial purposes without their permission.
Legal Scrutiny: Scraping, Copyright and Fair Dealing
The crux of the legal arguments is whether scraping news content amounts to “copying” for copyright purposes and, if so, whether it constitutes fair dealing. Under both Canadian and U.S. copyright law, a limited unauthorized use of protected works is permitted under the fair dealing or fair use exceptions, subject to the factors discussed above.
OpenAI argues that scraping news sites to train its models does not copy the material directly but rather abstracts from it. On this view, the process of abstraction, which derives patterns and relationships rather than reproducing particular articles, does not constitute infringement. OpenAI asserts that its models do not replicate the content on which they are trained but instead learn statistical patterns, which are not subject to copyright protection.
The non-profit organization Creative Commons has weighed in on the position taken by OpenAI, likening it to Google’s digitization of books to make them searchable. Both, it is argued, transform the original material into new forms that do not compete with the original content or diminish its value. However, the media companies counter that their original works are being used for OpenAI’s commercial gain without any compensation, and they question whether such a practice can be fair.
Possibilities in Licensing & Settlement
Shortly after The New York Times filed its lawsuit, OpenAI made two precautionary moves to minimize potential losses. Firstly, it stated that it would honour the decision of any news organization that opted out of having its content used as training data. Secondly, it began entering into agreements with news organizations to license their content for training purposes. These measures signal that OpenAI is trying to find middle ground as the lawsuits proceed.
Such lawsuits are nonetheless very important for the future of both AI development and copyright law. If OpenAI succeeds in arguing that scraping data falls within the fair dealing provisions, the market for licensing deals would likely shrink, since the resulting precedent would allow AI companies to use publicly available data without compensating content creators. Conversely, a ruling in favour of the media companies may lead to further restrictions on AI development and compel OpenAI to enter into even more licence agreements.
Conclusion: Influence on AI & Copyright
As the case unfolds, its implications will be far-reaching for AI companies, for media organizations and for copyright itself in the digital age. If the courts find for the media companies, the decision will shape the future of AI model training and the regulation of data use. Conversely, a ruling in OpenAI’s favour could encourage other tech companies to follow suit and rely on fair use or fair dealing provisions to avoid licensing fees.
The legal tussle is on, and the future of AI technologies hangs in the balance along with that of copyright law.
Authors: Seema Meena, Manasvi Shah & Misha Bhanushali