Promoting AI for Good in the Global South – Highlights

by Ben Cashdan

Across Africa and Latin America, researchers are using Artificial Intelligence to solve pressing problems: from addressing health challenges and increasing access to information for underserved communities, to preserving languages and culture. This wave of “AI for Good” in the Global South faces a major difficulty: how to access good quality training data, which is scarce in the region and often subject to copyright restrictions.

The most prominent AI companies are in the Global North and increasingly in China. These companies generally operate in jurisdictions with more permissive copyright exceptions, which enable Text and Data Mining (TDM), often the first step in training AI language models. The scale of data extraction and exploitation by a handful of AI mega-corporations has raised two pressing concerns: what about researchers and developers in the Global South, and what about the creators and communities whose data is being used to train the AI models?

Ethical AI: An Opportunity for the Global South?

At a side event in April at WIPO, we showcased some models of ‘ethical AI’ aimed at:

  • Levelling the playing field: enabling researchers and developers in the Global South to create AI tools in the public interest, without opening up all their training data to ‘extraction’ by for-profit corporations elsewhere in the world.
  • Creating a flow of benefits: finding ways to ensure that creators and communities who provide language materials or creative works to train AI are also able to derive some benefit from the use of their work.

The event took place in Geneva in April 2025. This week we released a 15-minute highlights video.

Training data and copyright issues

At the start of the event, we cited two Text and Data Mining projects in Africa which have had difficulty in accessing training data due to copyright. The first was the Masakhane Project in Kenya, which used translations of the Bible to develop Natural Language Processing tools in African languages. The second was the Data Sciences for Social Impact group at the University of Pretoria in South Africa, which wants to develop a health chatbot using broadcast TV shows as the training data.

Data Farming, The NOODL license, Copyright Reform

The following speakers then presented cutting-edge work on how to solve copyright and other legal and ethical challenges facing public interest AI in Africa:

  • Chebet Koros of the Centre for Intellectual Property and Information Technology Law (CIPIT) at Strathmore University in Nairobi, Kenya, talked about the NOODL license, which aims to redress the imbalance between African developers and the rest of the world.
  • Professor Gloria Emezue of the Federal University Ndufu Alike Ikwo, Ebonyi State, Nigeria, talked about an innovative approach to language data generation in Africa, known as ‘Data Farming’, in which African researchers go out and make recordings in communities. This data can be used for Text and Data Mining applications or for training AI for Good tools in Africa, and a flow of benefits is also established back to the communities. Data Farming contrasts with data mining, a term that suggests extraction and exploitation.
  • Dr Alexandra Garcia, a researcher at the National Centre for Artificial Intelligence (CENIA) in Chile, outlined the establishment of LATAM-GPT, a Large Language Model based on ethical sourcing of data in Latin America, aimed at producing culturally-sensitive AI tools.
  • Professor Beatriz Busaniche, chair of Via Libre Foundation in Argentina, spoke about creating alternatives to ‘Predatory AI’ in which local researchers in the Global South use ethically-sourced data to create culturally-sensitive AI tools and platforms. Professor Busaniche called on policy-makers to ensure that Copyright Limitations and Exceptions are expanded in Latin America, Africa and across the Global South, to allow developers and researchers in these regions to have the same access to training data as their counterparts in the US and elsewhere.

The AI Act in Brazil: Remunerating Creators

Carolina Miranda of the Ministry of Culture in Brazil indicated that her government is focused on passing a new law to ensure that those creators in Brazil whose work is used to train AI models are properly remunerated. Ms Miranda described how Big Tech in the Global North fails to properly pay creators in Brazil and elsewhere for the exploitation of their work. She confirmed that discussions of the AI Act are still ongoing and that non-profit scientific research will be exempt from the remuneration provision.

Jamie Love of Knowledge Ecology International suggested that the Common European Data Spaces being established by the European Commission offer a useful model for avoiding the tendency of data providers to build a moat around their datasets.

Four factors to Evaluate AI for Good

At the end of the event we put forward the following four factors, which might be used to evaluate the extent to which copyright exceptions and limitations should allow developers and researchers to use training data in their applications:

  • The nature of the works being used to train the AI: At one end of the spectrum is scientific research, especially research funded by the government, in which case such works should presumably be available for other researchers to analyse as part of future research activities. At the other end of the spectrum are cultural works such as songs or movies.
  • The type of entity conducting the research or developing or using the AI: This might range from non-profit research institutions such as universities to for-profit AI companies or commercial creators.
  • The purpose of the activity: On the one hand this might be public interest research outputs such as health or climate change research. On the other hand it might be the generation of cultural products (music, images, etc.) to compete in the market with the original training data.
  • The type of right: Researchers and developers might be granted unconditional rights to use the training data, or such use might require permission or remuneration of the original creators.

The panel was convened by the Via Libre Foundation in Argentina and ReCreate South Africa, with support from the Program on Information Justice and Intellectual Property (PIJIP) at American University and from the Arcadia Fund. We are currently researching case studies on Text and Data Mining (TDM) and AI for Good in Africa and the Global South.


Ben Cashdan is an economist and TV producer in Johannesburg and the Executive Director of Black Stripe Foundation. He also co-founded ReCreate South Africa.
