LLMs

Artificial Intelligence, Blog

Latest Developments on Training GenAI with Copyrighted Works and Some 'What Ifs?'

‘Boring’ is not a word that can be used to describe the past few days for those interested in litigation involving copyright issues in the development and use of Generative AI systems. Two major cases saw significant updates, with courts issuing orders that addressed one of the main questions raised in these lawsuits: is the use of copyrighted materials to train Generative AI systems fair use? This blog post aims to briefly describe each case’s key points related to fair use and to highlight what was left unresolved, including all the ‘what if’ scenarios that were hinted at but not decided upon.

Bartz, Graeber & Johnson v. Anthropic

Judge William Alsup’s order on fair use addressed not only the different copies of copyrighted material made for training generative AI systems but also uses related to Anthropic’s practice of keeping copies as a “permanent, general-purpose resource”. It also distinguished between legally purchased copies and millions of pirated copies retained by Anthropic, applying a different fair use analysis to each category.

Regarding the overall analysis of fair use for copyrighted works used to train Anthropic’s Generative AI system, Judge Alsup found that the use “was exceedingly transformative and was a fair use.” Among the four factors, only the second weighed against fair use.

The digitization of legally purchased books was also considered fair use, not because of the purpose of training AI systems, but for a much simpler reason: “because all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies”. For this specific use, only factor two weighed against fair use, while factor four remained neutral.
On the other hand, Judge Alsup clearly stated that using pirated copies to create the “general-purpose library” was not fair use, even if some copies might be used to train LLMs. All factors weighed against it. Specifically, Judge Alsup noted: “it denies summary judgment for Anthropic that the pirated library copies must be treated as training copies. We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages, actual or statutory (including for willfulness).”

Kadrey v. Meta

At the very beginning of the order, Judge Vince Chhabria clarified that the case questions whether using copyrighted material to train generative AI models without permission or remuneration is illegal and affirmed that: “although the devil is in the details, in most cases the answer will likely be yes. What copyright law cares about, above all else, is preserving the incentive for human beings to create artistic and scientific works. Therefore, it is generally illegal to copy protected works without permission. And the doctrine of ‘fair use,’ which provides a defense to certain claims of copyright infringement, typically doesn’t apply to copying that will significantly diminish the ability of copyright holders to make money from their works (thus significantly diminishing the incentive to create in the future).”

Judge Chhabria explained further that “by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way.” According to him, this would primarily affect not classic works or renowned authors but rather the market for the “typical human-created romance or spy novel,” which could be substantially diminished by similar AI-created works.
However, all these points were framed as “this Court’s general understanding of generative AI models and their capabilities”, with Judge Chhabria emphasizing that “Courts can’t decide cases based on general understandings. They must decide cases based on the evidence presented by the parties.”

Despite this general understanding that “copying the protected works, however transformative, involves the creation of a product with the ability to severely harm the market for the works being copied, and thus severely undermine the incentive for human beings to create”, Judge Chhabria found two of the plaintiffs’ three market harm theories “clear losers,” and the third, a “potentially winning” argument, underdeveloped: “First, the plaintiff might claim that the model will regurgitate their works (or outputs that are substantially similar), thereby allowing users to access those works or substitutes for them for free via the model. Second, the plaintiff might point to the market for licensing their works for AI training and contend that unauthorized copying for training harms that market (or precludes the development of that market). Third, the plaintiff might argue that, even if the model can’t regurgitate their own works or generate substantially similar ones, it can generate works that are similar enough (in subject matter or genre) that they will compete with the originals and thereby indirectly substitute for them. In this case, the first two arguments fail. The third argument is far more promising, but the plaintiffs’ presentation is so weak that it does not move the needle, or even raise a dispute of fact sufficient to defeat summary judgment.”

In the overall analysis of the four factors, only the second factor weighed against Meta. Summary judgment was granted to Meta regarding the claim of copyright infringement from using plaintiffs’ books for AI training.
Nevertheless, Judge Chhabria clarified that “this ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.”

The use of pirated copies was also addressed in Kadrey v. Meta. In this case, “there is no dispute that Meta torrented LibGen and Anna’s Archive […].” According to Judge Chhabria, while downloading from shadow libraries wouldn’t automatically win the plaintiffs’ case, it was relevant for the fair use analysis, especially regarding “bad faith” and whether the downloads benefited or perpetuated unlawful activities.

Lessons

Artificial Intelligence, Blog

The Great Flip: Can Opt-Outs be a Permitted Exception? Part II

By Lokesh Vyas and Yogesh Badwal. This post was originally published on Spicy IP.

In the previous part, we examined whether the opt-out mechanism, as claimed in Gen-AI litigations, constitutes a prohibited formality for the “enjoyment and exercise” of authors’ rights under Article 5(2) of the Berne Convention. We argued that it does not. In this post, we address the second question: Can opting out be permitted as an exception under the three-step test outlined in Article 9(2)?

If you haven’t seen the previous post, some context is helpful. (Or, you can skip this part.) As we mentioned in the last post, “Many generative AI models are trained on vast datasets (which can also be copyrighted works) scraped from the internet, often without the explicit consent of content creators, raising legal, ethical, and normative questions. To address this, some AI developers have created and claimed ‘opt-out mechanisms,’ allowing copyright holders or creators to ask that their works not be used in training (e.g., OpenAI’s Policy FAQs).”

Opt out under the Copyright Exception

A question arises here: In what other ways can opt-out mechanisms be justified if states want to adopt such a mechanism? One may say that opt-outs can be valid under the Berne Convention if an exception (e.g., an AI training exception with an inbuilt opt-out possibility) passes the three-step test. In this way, opt-outs can be regarded as a legitimate limit on holders’ exclusive rights.

For reference, the three-step test was introduced at the 1967 revision conference and later adopted in Article 13 of TRIPS and Article 10 of the WCT. The test creates room for nations to make certain exceptions and limitations. Article 9(2) authorises the member countries “to permit the reproduction” of copyright works in (1) “certain special cases,” provided that such reproduction (2) “does not conflict with a normal exploitation of the work” and (3) “does not unreasonably prejudice the legitimate interests of the author”.
Although we don’t delve into the test, an example shows how opting out can be part of an exception. As Ginsburg explains, if a country states that authors lose their translation rights unless they explicitly reserve or opt out of them, it would violate Article 5(2) because such rights under Berne must apply automatically, without formalities. This actually happened with Turkey in 1931, whose application for membership was rejected due to the condition of deposit for translation rights in its domestic law. (See Ricketson and Ginsburg’s commentary, paragraph 17.18.) But if an exception (like allowing radio retransmissions in bars) already complies with Berne’s provisions and applies equally to all authors, then letting authors opt out of that exception would give them more rights than Berne requires. And this should be permissible.

Notably, any new exception, such as one for AI training, must first pass the three-step test. An opt-out can be built into it. However, remember that every exception presupposes a prima facie infringement. Within that frame, the opt-out offers the author a chance not to lose. Thus, it creates an inadvertent expansion of her rights beyond the convention.

Additionally, an opt-out can fare well with the three-step test due to the factor of “equitable remuneration to authors.” As Gompel notes in his piece, “…‘opt out’ eases compliance with the three-step test because it mitigates some of the adverse effects of the proposed copyright exception. That is, it enables authors to retain exclusivity by opting out of the compensation scheme.”

Another question also exists: Does Berne contain particular provisions that directly allow an opt-out arrangement? The answer is yes.

Does opting out equal the right to reserve under Article 10bis? Not really.
Setting aside the debate over formality and the three-step test, the Berne Convention contains an opt-out-style provision, albeit limited, where authors must explicitly reserve their rights to avoid specific uses of their work. Relevant here is Article 10bis of the Convention, which allows member countries to create exceptions for the reproduction of works published in newspapers on, among other topics, current economic, political, or religious issues. However, it also allows the authors to ‘expressly reserve’ their work from reproduction. The Indian Copyright Act, 1957 also contains a similar provision in Section 52(1)(m).

Interestingly, the right to reserve exploitation has been part of the Berne Convention since its earliest draft. It first appeared in Article 7 alongside the provision on formalities, which was numbered Article 2 in the draft. Article 7 became Article 9(2) in 1908, when formalities were prohibited and the no-formality rule entered the Berne Convention.

This historical pairing raises a strong presumption: opting out of a specific mode of exploitation cannot automatically be deemed a prohibited formality. Ginsburg confirms this, citing the 1908 Berlin Conference, which clarified that the reservation/opt-out clause (then Article 9(2)) was not considered a formality.

But can this special setting (created in Article 10bis(1)) be used to open the door for general opt-out AI exception measures by countries? We doubt it. As the negotiation history of the 1967 revision conference suggests, Article 10bis(1) is a lex specialis, i.e., a narrow and specific exception (See page 1134 of Negotiations, Vol. II). This means that it may derogate from the general no-formalities rule, but it cannot serve as a model for broader declaratory measures.

Conclusion

The upshot is that opt-outs may be de facto formalities. However, not all formalities are prohibited under the Berne Convention.
The Convention allows countries to impose some formalities on “the extent of protection.” Three key points emerge from this discussion. One, opting out may not be a formality that prevents the enjoyment and exercise of rights, as Gompel and Senftleben confirm, though Ginsburg argues otherwise. Two, it can be a part of an AI training exception if such an exception can pass the three-step test. When applying this test, opting out would support the factor of equitable remuneration. Three, Article 10bis on the right to reserve cannot be read expansively. While it can be used to justify the three-step test, as Senftleben does, it might not be extended generally.

Okay. That’s it from our end. À bientôt.

Primary Sources:
