Mauritz Kop Presents Machine Learning & EU Data Sharing Practices at IPSC 2020, Stanford Law School

By AIRecht Editor

Stanford, August 5, 2020. Works-in-progress scholarship lives or dies by peer critique. The Intellectual Property Scholars Conference (IPSC) — co-sponsored by Berkeley Law, Cardozo Law, DePaul Law and the Stanford Program in Law, Science & Technology — is where IP scholars test new work before journals see it. At IPSC 2020, hosted by Stanford Law School, Mauritz Kop presented his paper Machine Learning & EU Data Sharing Practices, the data-governance pillar of his AI-and-IP research line. The paper's argument and its publication history are set out in Machine Learning & EU Data Sharing Practices.

The 2020 edition was unlike any IPSC before it. COVID-19 moved the conference online, and Stanford convened it as a series of virtual panels running from July 15 through August 5, 2020 — stretching what is normally a two-day marathon into a three-week season of sessions, documented on the IPSC 2020 conference site.

Machine learning and EU data sharing practices — the theme of the IPSC 2020 presentation.

The paper: who may train on what, and on which terms

Machine learning is hungry: model quality tracks training-data quality, and training data in Europe sits under several overlapping legal regimes at once. Kop's paper mapped that intersection — copyright and database rights in training corpora, the text-and-data-mining (TDM) exceptions of the EU's Digital Single Market directive, trade secrets, and the GDPR's grip on personal data used for model development — and asked the practical question behind the doctrinal one: which data-sharing arrangements actually let European AI development proceed lawfully?

The argument ran toward modernization: where exclusive rights and data-protection rules overlap without coordination, they tax exactly the data flows the EU's own AI strategy depends on. The paper is preserved in the permanent Stanford Center for Responsible Quantum Technology collection at the Stanford Law Library — the stable archival copy resolves at purl.stanford.edu/zs304tj5371. Its companion piece on the data-protection side, published in the Harvard Journal of Law & Technology's digest, is discussed in The Right to Process Data for Machine Learning Purposes in the EU.

The friction points, concretely

Two frictions carried the paper's weight. First, the DSM directive's general text-and-data-mining exception, the one commercial developers must rely on, lets rightsholders opt out, which means the lawfulness of a European training corpus can depend on a patchwork of machine-readable reservations that no developer can fully audit. Second, the GDPR attaches to personal data wherever it travels in the pipeline, so a model trained on inadequately anonymized records inherits a compliance defect that no later processing step cures. Each friction is manageable alone; stacked, they push AI development toward jurisdictions with cleaner data rules. That is the policy problem the paper asked Europe to confront on its own terms, within its own data strategy.

The TDM architecture, and where it went

The directive itself, Directive (EU) 2019/790 on copyright in the Digital Single Market, draws the line the paper probed. Its Article 3 gives research organisations and cultural-heritage institutions a mandatory text-and-data-mining exception for scientific research that rightsholders cannot contract away; its Article 4 extends TDM to everyone else, but only insofar as rightsholders have not expressly reserved their rights, in machine-readable form for online content. In 2020 that opt-out mechanism was an untested novelty; it has since become one of the most consequential provisions in European AI law, because the EU's AI Act (Regulation (EU) 2024/1689) now requires providers of general-purpose AI models to put in place a policy for identifying and complying with exactly those reservations. The question the IPSC paper asked, whether a training corpus assembled under a patchwork of reservations can ever be audited with confidence, moved from conference-room hypothetical to compliance-desk reality in under five years.
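The two-tier rule described above can be sketched as a small decision function. This is a deliberately simplified illustration, not a statement of the law: it models only the conditions named in the paragraph (who is mining, for what purpose, and whether rights were reserved), and every name in it (MinerType, Work, rights_reserved, tdm_basis, audit_corpus) is a hypothetical introduced for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Optional


class MinerType(Enum):
    RESEARCH_ORGANISATION = "research_organisation"
    CULTURAL_HERITAGE_INSTITUTION = "cultural_heritage_institution"
    OTHER = "other"


@dataclass(frozen=True)
class Work:
    identifier: str
    # Hypothetical flag: the rightsholder has expressly reserved TDM rights
    # (in machine-readable form, for online content).
    rights_reserved: bool


def tdm_basis(miner: MinerType, scientific_research: bool, work: Work) -> Optional[str]:
    """Return the exception a miner could invoke for one work, or None."""
    privileged = miner in (
        MinerType.RESEARCH_ORGANISATION,
        MinerType.CULTURAL_HERITAGE_INSTITUTION,
    )
    # Article 3: mandatory exception for scientific research; a reservation
    # by the rightsholder does not switch it off.
    if privileged and scientific_research:
        return "Article 3"
    # Article 4: open to everyone, but only where rights were not reserved.
    if not work.rights_reserved:
        return "Article 4"
    return None


def audit_corpus(
    miner: MinerType, scientific_research: bool, works: Iterable[Work]
) -> Dict[str, List[str]]:
    """Group work identifiers by the exception that would cover them."""
    report: Dict[str, List[str]] = {"Article 3": [], "Article 4": [], "unsupported": []}
    for work in works:
        basis = tdm_basis(miner, scientific_research, work)
        report[basis or "unsupported"].append(work.identifier)
    return report


if __name__ == "__main__":
    corpus = [Work("open-article", False), Work("reserved-article", True)]
    print(audit_corpus(MinerType.OTHER, False, corpus))
```

The hard part in practice is the input, not the rule: the sketch assumes that rights_reserved is already known for every work, which is precisely the audit problem the paper raised.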

A virtual panel among the field's leading scholars

The IPSC format — short presentations, dense Q&A, no published proceedings — exists purely to make drafts better before publication. For interdisciplinary work that spans artificial intelligence, data governance and IP doctrine, that critique is indispensable: doctrinalists, economists and technologists each probe a different weak point. Presenting EU data-sharing scholarship to a predominantly American IP audience also sharpens the comparative edge — the questions assume a different copyright baseline, a different fair-use culture and a different privacy regime, and the paper must hold up under all of them.

The Stanford connection was no coincidence. Kop had joined the Stanford-Vienna Transatlantic Technology Law Forum as a TTLF Fellow earlier that year — see Mauritz Kop becomes TTLF Fellow at Stanford University — and the machine-learning data research formed part of his TTLF agenda on transatlantic technology regulation.

The virtual format changed the texture of the critique as well. A three-week panel season gives discussants time to read drafts between sessions rather than skim them on a conference morning, and it lowers the cost of attending a session outside one's own subfield — both of which favor exactly the kind of interdisciplinary work a data-governance paper is. Works-in-progress scholarship, it turned out, survives a pandemic rather well. Several of the questions raised in those sessions left visible traces in the published versions of the work — which is the only metric a workshop format ever needs.

Where the research line went next

The data-governance work presented at IPSC 2020 fed directly into the broader arc of Kop's scholarship: the articulated-public-domain thesis for AI output, analyzed in AI & Intellectual Property: Towards an Articulated Public Domain, and, from 2020 onward, the quantum technology line (quantum computing's IP architecture) that would bring him back to IPSC in the two following years. The wider body of work now sits in the permanent Stanford RQT collection at the Stanford Law Library.

Why conference workshopping matters

A paper that survives an IPSC room is a different paper afterward: assumptions named, counterexamples absorbed, the comparative blind spots lit. For a European scholar writing about EU data law for a global artificial intelligence debate, the 2020 virtual edition offered something the pre-pandemic format never quite had — three weeks of panels, and an audience no longer limited to whoever could fly to California. The training-data questions the paper posed in 2020 have since moved to the center of AI regulation on both sides of the Atlantic; the workshop draft critiqued that summer became part of the foundation those later debates built on.

For readers tracing the arc, IPSC 2020 is also the first station of a three-year sequence: the machine-learning data paper at Stanford's virtual edition, a presentation on quantum IP architecture at Cardozo in 2021, and a market-power analysis back at Stanford in 2022. The data-governance questions came first for a reason: before any debate about who owns AI's outputs or who controls quantum's patents, there is the prior question of who may lawfully feed the machines. That question has not aged; it has only acquired enforcement deadlines.

Last updated: June 9, 2026

Intellectual Property, Academics, Conference | Henry Quentir | August 5, 2020 | IPSC, Stanford Law School, machine learning, intellectual property, data sharing, TDM
