data scraping – Centre on Knowledge Governance

Ethical Data Scraping for Research – Expert Workshop held in Amsterdam

InfoJustice Eds. / 25 July, 2025

A unique, expert-led workshop on ethical data scraping was organized by Professor Niva Elkin-Koren and Dr. Maayan Perel and hosted by the Shamgar Center for Digital Law and Innovation, Tel Aviv University. The workshop was made possible by the generous support of the Right to Research in International Copyright Law coalition at American University, especially Professor Sean Flynn, Director of the Program on Information Justice and Intellectual Property (PIJIP). An interdisciplinary group of information law experts gathered at Amsterdam's beautiful Volkshotel on July 2, 2025, to discuss data scraping for research and innovation and its ethical boundaries. The event aligned with the agenda of WIPO's Standing Committee on Copyright and Related Rights (SCCR), which promotes public interest strategies, coordinated action, and research, and seeks to inform public policy on legal exceptions and limitations for researchers.

Data scraping is an essential research tool for academics and scientists across a wide range of disciplines. It is also critical for training artificial intelligence (AI) models and for developing innovative research methodologies. The legal boundaries of data scraping attract considerable attention, not only from academics but also from policymakers, governments, courts, technology companies, and data providers worldwide. Yet the boundaries of ethical data scraping, which often depend on the type of data being scraped, the technologies used, the purpose of scraping, and the applicable legal framework, remain unclear. Researchers are therefore left to navigate potential legal risks as well as shifting technological barriers set by major technology companies such as Cloudflare, which recently adopted a permission-based approach to data scraping. As a result, researchers may be deterred from lawful data scraping, forgoing research that could serve the public interest.

Moderated by Dr. Maayan Perel and Professor Eldar Haber, the workshop aimed to bring greater clarity to what ethical data scraping is and should be. It drew on practical and technical insights from real-world data scraping, analyzed the legal implications of different transatlantic approaches, and proposed guidelines for promoting ethical data scraping for research and development.

To better understand how data scraping models work in practice, participants explored a test case model from Bright Data, an international data scraping company whose model was also at issue in recent litigation with X and Meta. In a stimulating presentation, Bright Data representatives described their technology for scraping publicly available data, elaborated on their ethical policies, and presented their “data for good” initiative, which offers scraping opportunities to researchers as well as other stakeholders.

To encourage a productive dialogue between academic and business participants, the discussion followed a “red teaming” approach. Red teaming, a concept we adapted from cybersecurity, helps organizations proactively identify weaknesses and strengthen their security posture before actual attacks occur. Applying this critical approach, participants identified potential legal challenges in Bright Data's test case model from several perspectives, including intellectual property, competition, privacy, and data protection law, while also identifying points of tension between the US and EU legal frameworks. The issues highlighted included the application of copyright law to the copying and storage of information; competition law questions arising from dominant market actors' ability to adjust their behavior and match prices; and the scope of privacy protection for personal information that data providers voluntarily make publicly accessible.

Next, insights from Bright Data's test case were used to draw broader observations about what constitutes ethical data scraping in practice, especially for AI training. Key issues included:

The workshop concluded with a broader discussion of potential legal, technical, and institutional strategies to promote ethical data scraping for academic research and technological development. Participants identified the need to distinguish questions of access to data from questions of the use of data, as each raises different legal issues. Key suggestions included:

Participants: Tanya Aplin, Mor Avisar, Balazs Bodo, Sharon Bar Ziv, Sean Flynn, Eldar Haber, Uri Hacohen, Bernt Hugenholtz, Aline Iramina, Matthias Leistner, Dana Mazia, Maayan Perel, Mando Rachovitsa, Pamela Samuelson, Martin Senftleben, Ben Sobel, Streffan Verhultz, Amit Zac