Artificial Intelligence and Copyright: The Canadian Cultural Community Speaks Out

Image: http://www.shutterstock.com

The extended period set by the Canadian Government (through Innovation, Science and Economic Development Canada, ISED) for response to its consultation paper on Artificial Intelligence (AI) and Copyright closed on January 15. We will start to see a flurry of submissions released by participants while ISED digests and assesses the input it has received. One of the first is the submission from the Coalition for the Diversity of Cultural Expression (CDCE), which represents over 360,000 creators and nearly 3,000 cultural businesses in both French- and English-speaking parts of Canada. CDCE’s membership includes organizations representing authors, film producers, actors, musicians, publishers, songwriters, screenwriters, artists, directors, poets and music publishers: just about every profession you can think of that depends on creativity, and protection for creative output. The CDCE submission highlights three key recommendations, summarized as follows:

  • No weakening of copyright protection for works currently protected (i.e. no exception for text and data mining to use copyrighted works without authorization to train AI systems)
  • Copyright must continue to protect only works created by humans (AI generated works should not qualify)
  • AI developers should be required to be transparent and disclose what works have been ingested as part of the training process (transparency and disclosure).

While none of these recommendations are surprising, and from my perspective are eminently reasonable, I am sure we will also see a number of submissions arguing that, “in the interests of innovation”, access to copyrighted works is not only essential but should be freely available without permission or payment. OpenAI, the motive force behind ChatGPT and the defendant in the most recent high-profile copyright infringement case involving AI (When Giants Wrestle, the Earth Moves (NYT v OpenAI/Microsoft)), has already staked out part of this position. In its brief to the UK House of Lords Select Committee looking into Large Language Models (LLMs), a key technology that drives AI development, the company says:

“Because copyright today covers virtually every sort of human expression–including blog posts, photographs, forum posts, scraps of software code, and government documents–it would be impossible to train today’s leading AI models without using copyrighted materials (emphasis added). Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”

OpenAI claims that it respects content creators and owners and looks forward to continuing to work with them, citing, among other things, the licensing agreement for content it has signed with the Associated Press. But failure to reach a licensing deal with the New York Times is really the crux of the lawsuit that the media giant has brought against OpenAI and its key investor Microsoft. If reports are true that OpenAI’s licensing deals top out at $5 million annually, it is not surprising that licensing negotiations between the Times and OpenAI broke down over such lowball offerings.

As for the CDCE submission to ISED, it recommends that the government refrain from creating any new exceptions for text and data mining (TDM) since this would interfere with the ability of users and rightsholders to set the boundaries of the emerging market in licensing. No copyright exemption for AI is what the British government has just confirmed, after playing footsie with the concept for over a year. Apart from the examples of the licensing deals that OpenAI has with the Associated Press and German multimedia giant Axel Springer, the CDCE paper notes a range of other recent examples of content owners offering access to their product through licensing arrangements, including Getty Images, Universal Music Group and educational and scientific publishers like Elsevier. The paper also urges the government to avoid interfering in the market when it comes to setting appropriate compensation, leaving it to market players or, where the players can’t reach agreement, to the quasi-judicial Copyright Board.

In my view, licensing is going to be the solution that will eventually level the playing field, but to get there it will require that major content players lock out the AI web-crawlers while pursuing legal redress, as the NYT is doing. This will help to open the licensing path to smaller players and individual creators who don’t have the resources available to employ either technical or legal remedies. (The issue of what has already been ingested without authorization still needs to be settled.) As for the tech industry’s suggestion that creators can opt out of content ingestion if they wish, CDCE rightly points out that this is standing the world on its head, and would be contrary to longstanding copyright practice. Not only is it impractical in a world where what goes into an AI model is a black box (thus the imperative for transparency) but it is like saying a homeowner has to request not to be burgled, or else can expect to become a target.

On the question of whether AI-generated works should be granted copyright protection, CDCE points out the double standard of proposing an exception to copyright for TDM for inputs while claiming copyright protection for AI-generated outputs. The need for human creativity is a line that has been firmly held by the US Copyright Office, pushing back on various attempts to register AI-generated (as opposed to AI-assisted) works. Canada has not been quite so clear-cut in its position, owing to the way in which copyright is registered (almost by default, without examination) in Canada, as I pointed out in this blog post (A Tale of Two Copyrights). While AI-generated works have received copyright protection in Canada (Canadian Copyright Registration for my 100 Percent AI-Generated Work), this is more by oversight than design, given the way the Canadian copyright registration system works.

Thirdly, we turn to transparency, a sine qua non if licensing solutions are to be implemented.  If authors don’t know whether their works are being used to train AI algorithms, or can’t easily prove it, licensing will fall flat. CDCE calls for publication of all content ingested into training models, disclosure of any content outputs that contain AI, and design of AI models to prevent generation of illegal or infringing content. This is similar to requirements already under consideration in the EU.

CDCE also makes the important point that it is not just copyright legislation that defends individual and collective rights against the incursions of AI and big AI platforms. While the Copyright Act offers some protection to creators, privacy legislation is important for all citizens. As the UK Information Commissioner has pointed out in a recent report, the legal basis for web-scraping is dependent on (a) not breaching any laws, such as intellectual property or contract laws and (b) conformity with UK privacy laws (the GDPR, or General Data Protection Regulation), where the privacy rights of the individual may override the interests of AI developers, even if data scraping meets other legitimate interest tests.

Finally, there is the question of the moral rights of creators that can be threatened by misapplication of AI, whether it is infringement of a performer’s personality or publicity right, distortion of their performance or creative output, misuse of their works for commercial or political reasons or any of the other reasons why copyright gives the creator the right to authorize use of their work.

Quite apart from the question of AI, there are of course other outstanding copyright questions that need to be resolved urgently, including the longstanding issue of the ill-conceived education “fair dealing” exception that has undermined if not permanently damaged the educational publishing industry in Canada. This exception needs to be narrowed to allow users continued unlicensed access to copyrighted materials under fair dealing guidelines for study, research and educational purposes but to limit institutional use to situations only where a work is not commercially available under a license from a rightsholder or collective society. While this issue requires looking back and fixing something that is already broken, policy making with respect to AI and copyright needs to anticipate the future and “do no harm”, while requiring AI developers to open up their black boxes and respect existing rights. This should be achieved by maintaining and protecting the rights of creators in ways that will facilitate market-based licensing solutions for use of copyrighted content by AI developers, while ensuring that creative output remains the domain of human beings, and not machines.

© Hugh Stephens, 2024.

AI’s Copyright Challenges: Searching for an International Consensus

Image: Shutterstock

This has been a busy couple of weeks for national and international declarations on Artificial Intelligence (AI). First the G7 issued its International Code of Conduct for Advanced AI Systems on October 30.  The same day US President Biden signed the Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, followed by the Bletchley Declaration at the conclusion of the “AI Summit” hosted by UK Prime Minister Rishi Sunak in Bletchley Park, Buckinghamshire, a couple of days later. Meanwhile, the EU’s AI Act is being touted by its sponsor as a potential model for AI legislation in other parts of the world (although its enactment is currently bogged down in the trilogue process between the Commission, EU Council and European Parliament). Notable was the fact that the US Executive Order, a wide-ranging framework document covering many aspects of the AI issue, effectively “scooped” the Brits by a day or so, allowing Vice President Kamala Harris to highlight steps the US had just announced when speaking to the press at Bletchley.

The declarations all addressed many of the concerns surrounding AI, ranging from safety and security, fraud and cybersecurity to privacy, equity and civil rights to protecting consumers, supporting workers and promoting innovation. A key issue only lightly touched on in these declarations, however, was that of AI’s intersection with copyright. This was a missed opportunity to come to grips with a major concern regarding how AI will be able to co-exist with copyright law. (The EU’s draft AI Act includes a transparency requirement to “document and make publicly available a sufficiently detailed summary of the use of training data protected under copyright law”, Article 28(b) 4(c).)

AI faces two significant challenges when it comes to copyright protection. First, with respect to the inputs that AI developers use to populate their models to produce generative AI, there is the unresolved question as to whether the free use of copyrighted content violates copyright law by making unauthorized reproductions. There are currently a number of lawsuits underway in the US examining this fundamental question. Many creator groups, such as the News Media Alliance in the United States, argue that “the pervasive copying of expressive works to train and fuel generative artificial intelligence systems is copyright infringement and not a fair use”.

Second, with respect to outputs, the work generated by AI has two challenges in terms of obtaining the benefits of copyright protection. If its inputs are infringing, that clearly casts doubt on the legality of the derivative outputs. In addition, there is the problem posed by the current position of the US Copyright Office (and most other copyright authorities) that to be copyright-protected a work must be an original human creation. After the infamous Monkey Selfie case, the USCO issued an interpretive bulletin reiterating the need for human authorship and, to date, it has hewed to this line when examining applications for copyright registration from authors claiming works produced by AI.

The G7 Declaration was broad, covering a wide range of issues related to AI. It included a reference to the copyright issue under Point 11, “Implement appropriate data input measures and protections for personal data and intellectual property”, specifically stating that, “Organizations are encouraged to implement appropriate safeguards, to respect rights related to privacy and intellectual property, including copyright-protected content.” This is hardly prescriptive language, but it is a beginning. I understand that the creative community had to fight hard to get this wording included, but it is at least recognition of the issue.

With respect to the US Administration’s Executive Order, the issue of copyright was also acknowledged, but in a somewhat backhanded way. Section 5.2 (Promoting Innovation) addresses copyright as part of clarifying issues “related to AI and inventorship of patentable subject matter”. Paragraph (c)(iii) declares that the Under Secretary of Commerce for Intellectual Property and Director of the US Patent and Trademark Office shall:

“within 270 days of the date of this order or 180 days after the United States Copyright Office of the Library of Congress publishes its forthcoming AI study that will address copyright issues raised by AI, whichever comes later, consult with the Director of the United States Copyright Office and issue recommendations to the President on potential executive actions relating to copyright and AI. The recommendations shall address any copyright and related issues discussed in the United States Copyright Office’s study, including the scope of protection for works produced using AI and the treatment of copyrighted works in AI training.”

This is not exactly a ringing endorsement of the need for respecting the copyright of those who, willingly or not, provide the raw material for the voracious AI machines that are busy scooping up creators’ content, but it is nonetheless an acknowledgment that there’s an issue that needs addressing.

The US Copyright Office (USCO) launched its study on Artificial Intelligence and Copyright on August 30 of this year “to help assess whether legislative or regulatory steps in this area are warranted”. By the end of October, the USCO had already received more than 10,000 submissions. The comments range from statements by AI developers as to why they shouldn’t be required to pay for copyrighted content used as inputs in developing their models (while of course claiming they should enjoy the benefits of copyright protection for their AI generated outputs), to submissions by creator organizations that argue, among other things, that the ingestion of copyrighted material by AI systems is not categorically fair use and that AI companies should license works they ingest. Licensing their content to AI companies as an additional revenue stream is precisely what major media companies are currently engaged in.

If the US, currently and for the foreseeable future the leading country in development of AI, is thrashing around trying to address this question, one can imagine the process taking place elsewhere. Will the need to set standards inevitably lead to some form of international consensus for the regulation of AI, including the role of copyrighted content? I think it will be essential. Countries that are too lax in protecting their creative sectors will see their copyright-protected cultural industries suffer negative economic consequences; countries that are overly protective of content worry that investment in AI innovation will flow to countries with lower copyright standards, creating a race to the bottom for creators.

The UK government has already felt the pinch of this dilemma. In a misguided attempt to gain a head start in the AI development race, about a year and a half ago the British government unveiled a proposal sponsored by the UK Intellectual Property Office (of all entities!) to create an unlimited text and data mining (TDM) exception to copyright, at the same time stripping rights-holders of their ability to license their content for TDM purposes, or to contract or opt out. In the words of the discussion paper accompanying the draft legislation, in order to reduce the time needed to obtain permission from rightsholders and to eliminate the need to pay license fees:

“The Government has decided to introduce a new copyright and database right exception which allows TDM for any purpose … Rights holders will no longer be able to charge for UK licences for TDM and will not be able to contract or opt-out of the exception.”

This outrageous attempted expropriation of intellectual property rights aroused a storm of protest from the UK’s vibrant cultural sector, a backlash that found resonance in Parliament. As a result, the British government backed off, and withdrew the proposed legislation. However, one wonders if the stake has truly been driven through the heart of this hi-tech gambit or whether, like Dracula, this misguided policy will rise again (UK Parliamentary Committee Shoots Down Copyright Exemption for AI Developers–But is it Really Dead?). Certainly, British publishers are not convinced the content grab is over. According to the Guardian, they have just issued a statement urging the UK government, “to help end the unfettered, opaque development of artificial intelligence tools that use copyright-protected works with impunity.”

Canada has just launched a public consultation on AI and Copyright (“Copyright in the Age of Generative Artificial Intelligence”), and others will be doing the same. In Australia, Google, responding to a review of copyright enforcement, urged the government to relax copyright laws to allow artificial intelligence to mine websites for information across the internet (even though this wasn’t the topic of the enquiry). Meanwhile, the Attorney-General’s Department has been conducting several roundtables to explore the issue, the most recent being at the end of August. In that roundtable, representatives of the Australian creative community called for greater transparency around how copyright material is being used by AI developers during the input training and output process.

And so, the search for the right formula goes on. It will not be easy to find the elusive international consensus, especially since at the moment (with the exception of China) this is an issue on the agenda only of the so-called Global North.

How the heavy-hitters will deal with the issue of AI, including its intellectual property dimensions, remains to be seen. Something as relatively powerless as OECD Guidelines could emerge, or regulation could go a lot further, including the establishment of some kind of international agency with the “authority” to regulate in the area of AI, as suggested by Elon Musk and others. However, as we have seen with every international organization created to date, whether it be the UN, World Trade Organization, International Atomic Energy Agency, or any of the myriad other supra-national structures created in recent years, the authority granted them is only as good as the commitment of their signatory states. It makes sense to harmonize and set broad international standards for the way in which AI is created and used, but it will be a long road to get there.

The challenge of how copyright can intersect with AI, to the mutual benefit of both, has still to be worked out. The courts are playing a role, as are technology, evolving business models, and legislation. Society needs to find the sweet spot where both human creation and technological advancement in the form of AI can co-exist for the benefit of society at large. Despite recent pronouncements, the search continues.

© Hugh Stephens, 2023. All Rights Reserved.

This post has been updated to include reference to the ongoing roundtable process underway in Australia under the aegis of the Attorney-General’s Department to explore, inter alia, questions of AI and copyright.