Text and Data Mining

Africa: Copyright & Public Interest, Artificial Intelligence, TDM Cases

Case Studies of AI for Good and AI for Development

Today the Geneva Centre on Knowledge Governance presents a series of Case Studies on AI for Good in Africa and the Global South. These grew out of our work on Text and Data Mining and our policy work in support of the Right to Research. Researchers in the Global South are responding to local and global challenges, from health and education to language preservation and the mitigation of climate change. In all these cases, computational methods and Artificial Intelligence (AI) play a leading role in finding and implementing solutions. A common thread that runs through all the cases is how intellectual property laws can support innovation and problem solving in the public interest, whilst protecting the interests of creators, communities and custodians of traditional knowledge. In addition, several practitioners are looking at how to redress data imbalances, where large companies in the Global North have much greater access to works for historical, legal and economic reasons. The cases include: Each of our case studies is written up in the form of a report, combined with a video exploration of the case study in the words of its leading practitioners.

Blog

Unfair Licensing Practices in the Library Sector

Teresa Nobre outlines a chilling range of practices by publishers to try to restrict the ability of researchers to conduct computational research, from ‘choice of law’ clauses which seek to circumvent EU law to increased liability and penalties on libraries which fail to police their users. Nobre suggests a series of urgent measures to tip the balance back in favour of libraries and their users, and ultimately in favour of the right to research. This presentation was delivered at the User Rights meeting in Geneva on 17 June 2025. The full text is available below.

The transition to licensing

We have transitioned from a sales-based model in printed publications to a licence-based model in digital publications. What happens is that even if you have a fit-for-purpose framework that allows libraries to make certain uses of copyrighted works, they still need to rely on licences to get access to the material in the first place, and that gives publishers a lot of power in determining what libraries can and cannot do with the licensed materials, even if you have exceptions that allow them to make certain uses.

Communia’s research

We know that these licences tend to be subject to confidentiality agreements, which means that we don’t know what the terms of these licences are. Communia is a non-profit based in Brussels. We have been involved in copyright reform for many years, and we have been coming to the SCCR for many years. In February this year, we invited licensing managers, that is, people from the public library and academic library sector in Europe, to come to Brussels, and we held a meeting under the Chatham House Rule. We also invited the European Commission to attend and observe this meeting.
And this environment, where people could not attribute statements to each other, was the right environment for licensing managers to come and talk about the issues that they are facing with the licences: the unfair licensing practices and the unfair terms that they are being subjected to. So I will be mentioning some of those practices, and I will start with a very hot topic right now, which is AI, but also text and data mining for scientific research. Maybe I should also tell you that in addition to inviting librarians to come and talk to us in private, in front of the Commission, we also invited them to share with us, in confidence, clauses that they considered unfair, clauses that are part of those licensing agreements or licensing offers.

Efforts to Circumvent the European TDM Directive

Maybe here, for those that are not European, I should give you a bit of the legal context in Europe. In Europe, six years ago we passed a new directive that guarantees that researchers in Europe can make text and data mining uses of copyrighted materials for scientific research. So we have a mandatory exception for these research uses. And this mandatory exception is protected against contractual overrides. And what does that mean? It means that if a licence says that you cannot make those uses, you don’t need to follow the licence, because the law, the European law, protects you.

And what we realised, and we were very surprised, was that publishers were actually concerned with prohibiting these uses in Europe, when we have a law that allows these uses and prohibits contractual overrides. But that was indeed the case. So we noticed, and they told us, that since 2023, so around the same time that generative AI was on the rise, suddenly all the contracts are saying that library users cannot conduct text and data mining on e-books and e-journals that are available in the libraries. They cannot conduct any related AI uses with those materials.
‘Choice of Law’ clauses

And what was interesting to see was that they were actually intent on putting those prohibitions in those contracts, although the law would not allow for those prohibitions, because they could circumvent the EU policy, the EU law, and our prohibition of contractual overrides by selecting a law that’s outside of Europe. So we know that ‘choice of law’ is typically a clause that the parties need to negotiate, and that takes time to negotiate. Everyone wants to choose their own law. But in this case, by choosing a law that’s not the national law of the country where the library is located, meaning one that’s not the EU law which would protect these uses against contractual overrides, they are basically able to circumvent EU law and the prohibition of contractual overrides. And that’s enough. So imagine: all of the work that we have done throughout the years to have exceptions in place, exceptions that are protected against contractual overrides, is simply circumvented by a choice of law clause.

I’m going to give you an example of what prohibition of AI uses in these licences means. And, you know, there are different ones. And you can see in our report that we gave some examples of it.

Prohibition of AI-enabled browsers

But publishers go as far as prohibiting the use of browsers with connected AI functionality. People, nowadays there are no browsers that do not use AI. And publishers are prohibiting library users from using browsers with AI functionality. This is how far it goes. We saw different variations of this. For instance, you see one that’s very simple and straightforward: you cannot conduct text and data mining, which is exactly what the EU law allows you to do.

And when it comes to the choice of law, I think typically what we are seeing is that they are choosing US law, maybe because under US law it’s not very clear right now whether these sorts of uses are allowed or not. If it’s a UK publisher, they will select UK law, which also doesn’t permit as many text and data mining uses as EU law. So this is the first, let me say, the first category of obstacles and really

Artificial Intelligence, Blog

The Great Flip: Can Opt-Outs be a Permitted Exception? Part II

By Lokesh Vyas and Yogesh Badwal. This post was originally published on Spicy IP.

In the previous part, we examined whether the opt-out mechanism, as claimed in Gen-AI litigation, constitutes a prohibited formality for the “enjoyment and exercise” of authors’ rights under Article 5(2) of the Berne Convention. And we argued no. In this post, we address the second question: Can opting out be permitted as an exception under the three-step test outlined in Article 9(2)?

If you haven’t seen the previous post, some context is helpful. (Or, you can skip this part.) As we mentioned in the last post, “Many generative AI models are trained on vast datasets (which can also be copyrighted works) scraped from the internet, often without the explicit consent of content creators, raising legal, ethical, and normative questions. To address this, some AI developers have created and claimed ‘opt-out mechanisms,’ allowing copyright holders or creators to ask that their works not be used in training (e.g., OpenAI’s Policy FAQs).”

Opt out under the Copyright Exception

A question arises here: What are the other ways opt-out mechanisms can be justified if states want to create a mechanism like that? One may say that opt-outs can be valid under the Berne Convention if an exception (e.g., an AI training exception with an inbuilt opt-out possibility) passes the three-step test. And this way, opt-outs can be regarded as a legitimate limit on holders’ exclusive rights.

For reference, the three-step test was created at the 1967 revision conference and later followed in Article 13 of TRIPS and Article 10 of the WCT. The test creates room for nations to make certain exceptions and limitations. Article 9(2) authorises the member countries “to permit the reproduction” of copyright works in 1.) “certain special cases, provided that such reproduction 2.) does not conflict with a normal exploitation of the work and 3.) does not unreasonably prejudice the legitimate interests of the author”.
Although we don’t delve into the test, how opting out can be a part of an exception can be understood from an example. For instance, as Ginsburg illustrates, if a country states that authors lose their translation rights unless they explicitly reserve or opt out of them, it would violate Article 5(2) because such rights under Berne must apply automatically, without formalities. This actually happened with Turkey in 1931, whose application for membership was rejected due to the condition of deposit for translation rights in its domestic law. (See Ricketson and Ginsburg’s commentary, paragraph 17.18.)

But if an exception (like allowing radio retransmissions in bars) already complies with Berne’s provisions and applies equally to all authors, then letting authors opt out of that exception would give them more rights than Berne requires. And this should be permissible.

Notably, introducing an exception, such as for AI training, must first pass the three-step test. Opt out can be built therein. However, remember that every exception presupposes a prima facie infringement. Within that frame, the opt-out offers the author a chance not to lose. Thus, it creates an inadvertent expansion of her rights beyond the convention.

Additionally, opt-out can fare well with the three-step test due to the factor of “equitable remuneration to authors.” As van Gompel notes in his piece, “…‘opt out’ eases compliance with the three-step test because it mitigates some of the adverse effects of the proposed copyright exception. That is, it enables authors to retain exclusivity by opting out of the compensation scheme.”

Another question also exists: Does Berne contain particular provisions that directly allow an opt-out arrangement? Well, the answer is yes.

Does opting out equal the right to reserve under Article 10bis? Not really.
Setting aside the debate over formality and the three-step test, the Berne Convention contains an opt-out-style provision, albeit limited, where authors must explicitly reserve their rights to avoid specific uses of their work. Relevant here is Article 10bis of the Convention, which allows member countries to create exceptions for the reproduction of works published in newspapers on, among other topics, current economic, political, or religious issues. However, it also allows the authors to ‘expressly reserve’ their work from reproduction. The Indian Copyright Act, 1957 also contains a similar provision in Section 52(1)(m).

Interestingly, the right to reserve exploitation has been part of the Berne Convention since its earliest draft. It first appeared in Article 7 alongside the provision on formalities, which was numbered Article 2 in the draft. Article 7 became Article 9(2) in 1908, when formalities were prohibited and the no-formality rule entered the Berne Convention.

This historical pairing raises a strong presumption: opting out of a specific mode of exploitation cannot automatically be deemed a prohibited formality. Ginsburg confirms this, citing the 1908 Berlin Conference, which clarified that the reservation/opt-out clause (then Article 9(2)) was not considered a formality.

But can this special setting (created in Article 10bis(1)) be used to open the door for general opt-out AI exception measures by countries? We doubt it. As the negotiation history of the 1967 revision conference suggests, Article 10bis(1) is a lex specialis, i.e., a narrow and specific exception (see page 1134 of Negotiations, Vol. II). This means that it may derogate from the general no-formalities rule, but it cannot serve as a model for broader declaratory measures.

Conclusion

The upshot is that opt-outs may be de facto formalities. However, not all formalities are prohibited under the Berne Convention.
The convention enables countries to impose some formalities regarding “the extent of protection.” Three key points emerge from this discussion: One, opting out may not be a formality that prevents the enjoyment and exercise of rights, as van Gompel and Senftleben confirm, though Ginsburg argues otherwise. Two, it can be a part of an AI training exception if such an exception can pass the three-step test. When applying this test, opting out would support the factor of equitable remuneration. Three, Article 10bis on the right to reserve cannot be read expansively. While it can be used to justify the three-step test as Senftleben does, it might not be extended generally.

Okay. That’s it from our end. À bientôt

Artificial Intelligence, Blog

The Great Flip: Is Opt Out a Prohibited Formality under the Berne Convention? Part I

By Lokesh Vyas and Yogesh Badwal. This post was originally published on Spicy IP.

Bonjour,

Lately, we’ve been cogitating on this curious concept called the “opt-out”, which has been cropping up with increasing frequency in generative AI litigation, including in India. The EU and the UK are taking the idea seriously and considering giving it statutory teeth. On the surface, it is sold as a middle path, a small price to pay for “balance” in the system. However, at least prima facie, it seems like a legal absurdity that fractures copyright’s modern foundational logic, where authors receive default copyright without any conditions. The opt-out model, the argument goes, reintroduces formality through the back door, a de facto formality of sorts. This shifts the burden onto authors and rights holders to actively monitor or manage their works to avoid unintended inclusion in AI training. There have been questions about whether such an opt-out scheme is compatible with the Berne Convention, which prohibits formalities under Article 5(2), e.g., here, here, and here.

Given the complex nature of this issue and the fact that many such discussions happen behind paywalls, making them inaccessible to the public, we thought it would be beneficial to share our ideas on this topic and invite further reflection. This two-part post mainly focuses on the legality of opting out without addressing its implementability and applicability, which raise several questions (e.g., as discussed recently in Martin Senftleben’s post). In short, we probe whether opt-outs violate the Berne Convention, the first international copyright law treaty, which is binding on all members of TRIPS and the WCT.

We answer it through two questions and discuss each one separately. First, is opt-out a prohibited formality for the “enjoyment and exercise” of authors’ rights under Article 5(2) of the Berne Convention? Second, can it be permitted as an exception under the three-step test under Article 9(2)?
We answer the first question in the negative and the second in the positive. Additionally, we also examine whether Berne already has a provision that can allow this without looking at the details. This post addresses the first question.

What Makes Opt-Outs So Amusing – The Flip?

Many generative AI models are trained on vast datasets, which can also include copyrighted works scraped from the internet without the explicit consent of content creators, raising legal, ethical, and normative concerns. To address this, some AI developers have created and claimed “opt-out mechanisms,” allowing copyright holders or creators to ask that their works not be used in training (e.g., OpenAI’s Policy FAQs).

Herein lies the catch: it requires authors and copyright holders to explicitly say “No” to training by adding a robots.txt file to their website with specific directives that disallow web crawlers from accessing their content. (E.g., see this OpenFuture’s guide here.) Thus, instead of creators being protected by default, they are supposed to opt out to prevent exploitation. One could say that this flips the logic of copyright on its head, from a presumption of protection to a presumption of permission. But it’s not so simple.

Notably, opting out is not a novel argument. In fact, it can be dated back at least to the 1960s in the Nordic countries’ model of “Extended Collective Licensing” (ECL), which mandates collective licensing while preserving the author’s right to opt out. Other notable academic literature on opt-out can be found here, here, here, and here, dating back over two decades. Swaraj also covered this issue a decade ago. In particular, we must acknowledge the scholarship of Jane Ginsburg, Martin Senftleben, and Stef van Gompel, who have significantly influenced our thinking on the topic.

Two Key Questions: Opt out as a Formality and opt out under a permitted Exception

Formality Argument first.
Here, the argument goes that the opt-out is a prohibited formality under Article 5(2) and should not be allowed. However, we doubt it. Let’s parse the provision first. It states:

“(2) The enjoyment and the exercise of these rights shall not be subject to any formality; such enjoyment and such exercise shall be independent of the existence of protection in the country of origin of the work. Consequently, apart from the provisions of this Convention, the extent of protection, as well as the means of redress afforded to the author to protect his rights, shall be governed exclusively by the laws of the country where protection is claimed.” (Authors’ emphasis)

For context, the provision pertains to “Rights Guaranteed outside the Country of Origin” for both national and foreign authors. And the question of no-formality pertains particularly to foreign authors. In other words, by removing formality requirements in the country where protection is claimed, the provision enabled authors to automatically receive protection without needing to satisfy foreign formalities. This matters because while countries can impose conditions on their own nationals, it’s generally assumed that they will not treat their own authors worse than foreign ones. This post follows that presumption: if a country cannot burden foreign authors, it’s unlikely to impose stricter terms on its own people.

Although the removal of formalities had been discussed in the international copyright law context as early as the 1858 Brussels Conference, an important event in the development of international copyright law, it was not implemented until 1908. This change addressed practical difficulties, including identifying the “country of origin” when a work was published in multiple countries, and the challenges courts faced in enforcing rights without formalities.
(See the International Bureau’s Monthly Magazine, January 1910.) Tellingly, while a country can impose formalities on its own people, it cannot do so for foreign authors. It’s generally assumed that a country would not burden its own authors more than it does foreign authors.

Textual Tensions of Article 5(2)

While the phrase “any formality” in the first line of the provision might suggest that all kinds of formalities, including de facto ones like opt-out mechanisms, are prohibited, that is arguably not the case. We say this because the provision is divided into two parts, and the prohibition on formalities applies only to the first part, which is germane to enjoying and exercising rights. The second part of the provision, beginning with “Consequently”, gives leeway to the states wherein they can make formalities regarding the ‘extent of protection’
