The Digital Culture and Communication section of ECREA
Dan L Burk, Chancellor’s Professor of Law, University of California, Irvine.
As in many areas of modern life, legal governance and regulation are becoming increasingly reliant on ubiquitous data collection and algorithmic data processing. Applications of these technologies have emerged in numerous areas, including criminal law, immigration, taxation, and contract. In the area of copyright, protection of digitized works is increasingly mediated by algorithmic enforcement systems that are intended to effectuate the rights of copyright owners while simultaneously limiting the liability of content intermediaries. On YouTube, Google, Facebook, and many other on-line platforms, both ISPs and copyright owners have deployed detection and removal algorithms intended to purge illicit content from their sites.
But unauthorized content is not necessarily illicit content. Copyright allows authors to restrict reproduction, performance, and related uses of their original works as a pecuniary incentive. But copyright, like any property right, is never absolute. Copyright systems typically include some number of user privileges or exemptions, under which the statute will condone particular uses of a copyrighted work even if the copyright owner has not done so. These vary between jurisdictions, but typically cluster around socially beneficial uses of the work such as education, news reporting, scholarship, personal enrichment, or public commentary. Often known in British Commonwealth countries as “fair dealing” provisions, these exceptions to the authorization of the copyright holder entail a specific laundry-list of discrete, statutorily defined circumstances under which a protected work can be used without permission.
In the United States, the Copyright Act also includes a number of such discrete statutory carve-outs. Additionally, the United States, together with a small handful of other nations, includes in its copyright limitations a flexible exception known as “fair use.” Fair use is not categorically or specifically defined, but is rather decided based upon judicial assessment of four factors. Roughly speaking, a court determining whether an otherwise infringing use might be fair is to consider how much of the work was taken, what was done with it, what kind of work was subjected to the taking, and what effect the taking likely had on the market for the work. Determination as to whether unauthorized use of a work falls under this provision varies from situation to situation, depending on the contextual assessment of the four factors.
Many unauthorized digital postings might claim legal legitimacy under fair use or other exceptions to the rights of the copyright holder, even though the current algorithmic enforcement systems do not take such exceptions into account. Exceptions such as fair use exist to ameliorate the negative effects of exclusive control over expression on public discourse, personal enrichment, and artistic creativity. Consequently, it might be desirable to incorporate context specific fair use metrics into copyright policing algorithms, both to protect against automated over-deterrence, and to inform users of their compliance with copyright law.
But incorporating contextual judgments such as the fair use standard into an algorithm is problematic, not only due to the challenge of defining the parameters and characteristics of legal texts, but also due to the inherent limitations of computer languages, their operating environments, and the capabilities of the hardware available to execute coded instructions. Current machine learning techniques attempt to sidestep such difficulties by creating routines that recognize data patterns, and allowing the routine to operate according to the values in the patterns it finds, rather than attempting to specify values in advance. This raises the possibility that algorithmic fair use parameters might not have to be explicitly defined and coded.
Empirical investigation of the corpus of fair use decisions from American courts suggests that fair use outcomes are neither random nor unpredictable, but may follow particular patterns of judicial decision-making. One can imagine that a machine learning system could detect these or other patterns in the data surrounding past cases, matching them to similar patterns in the data surrounding future fair use situations, without any formal programming definition of the fair use factors. Such a system might provide fair use assessments prior to the actual use, or in conjunction with on-line copyright enforcement decisions.
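The pattern-matching approach described above can be illustrated with a minimal sketch. Everything in it is invented for illustration: the feature encoding, the numeric values, and the outcomes are hypothetical, and the nearest-neighbor rule is only one of many techniques such a system might use. The sketch also foreshadows the difficulty discussed below: someone must decide how to flatten a contextual legal judgment into numbers.

```python
from math import dist

# Hypothetical, hand-invented encodings of past fair use decisions.
# Each vector loosely mirrors the four statutory factors:
# (purpose/transformativeness, nature of the work, amount taken, market harm),
# each scaled to [0, 1]. Real cases resist any such flattening; the very
# choice of these features and values embeds contestable judgments.
PAST_CASES = [
    ((0.9, 0.3, 0.2, 0.1), "fair"),      # e.g. a parody using a short excerpt
    ((0.8, 0.5, 0.4, 0.2), "fair"),      # e.g. critical commentary
    ((0.1, 0.8, 0.9, 0.9), "not fair"),  # e.g. wholesale copying of a novel
    ((0.2, 0.7, 0.6, 0.8), "not fair"),  # e.g. verbatim reposting
]

def predict(features, cases=PAST_CASES, k=3):
    """Classify a new use by majority vote among its k nearest past cases."""
    nearest = sorted(cases, key=lambda c: dist(features, c[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# A new unauthorized use: highly transformative, small excerpt, little market harm.
print(predict((0.85, 0.4, 0.3, 0.15)))  # prints "fair" under this toy encoding
```

No fair use factor is defined anywhere in the code; the system simply matches new situations to old outcomes, which is precisely why the selection and encoding of the training cases, taken up below, does all of the real work.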
But machine learning approaches pose their own problems. It has been cogently said that raw data is an oxymoron, and choices must be made regarding how to select training and decisional data for any algorithm. Where the four factors of the fair use standard are concerned, many of the points where such choices must be made quickly become obvious. For example, determining the impact of the unauthorized use of a work on the market for the underlying work requires a model of the market and decisions about the data that properly populate that model. Fair use assessment of the type of work used, and of the use to which the protected content is put, requires some categorization of works and uses. These and a multitude of other choices would determine the allowance or disallowance of uses for protected content. Algorithms do not make judgments; they are at best the tools of human judgment.
We would therefore need to ask how any such fair use calculator came into existence. Designing, maintaining, repairing, and updating such a system, and gathering and curating its attendant databases, are not costless activities; they are to the contrary likely to be expensive. What entities have the motivation and the resources to construct such a system? The law might coerce copyright industries, such as the movie or music industries, into incorporating fair use assessment into their policing efforts. Alternatively, one could imagine service providers such as Google or Facebook deploying algorithmic fair use to justify their decisions to remove or allow content on their platforms. Far less likely is any scenario where the consumers of copyrighted content deploy a fair use algorithm, or even where fair users would have any hand in crafting the systems that assess the applicability of the exemption to their activities.
But while we should be deeply concerned with the inevitable biases that attend algorithmic design and implementation, a larger concern is the recursive algorithmic entanglements that change public practice and so change social meaning. This type of effect is already seen in the algorithmic copyright policing of on-line content, where the algorithmic removal action has become a de facto finding of infringement, where the public has begun to internalize such outcomes, and where formal copyright law may be incorporating those expectations into its weft. Whatever form algorithmic fair use might take would likely become a similar social and legal default.
In short, implementation of algorithmic fair use would inevitably change the nature of fair use. Whatever choices or biases, inclusions or exclusions, expectations or oversights were engineered into the algorithm would become a self-fulfilling prophecy as to the nature of fair use. Algorithmic fair use carries with it the very real possibility of habituating new media participants to its own biases, and so progressively altering the fair use standard it attempts to embody. At a minimum, reliance on algorithmic governance devolves regulatory design from the hands of publicly accountable officials to those of largely unaccountable engineers. Careful consideration of these and related effects is necessarily part of any realistic assessment of algorithmic fair use, or indeed of any movement toward automated governance.