Copyright and AI Training: The UK Rethinks its Blatant Content Giveaway Scheme—But What Comes Next?

A robot holding a British flag with the text 'GIVE ME YOUR CONTENT FOR FREE' and 'OPT OUT IF YOU CAN' displayed prominently.

Image: Shutterstock (modified)

On March 18 the UK’s Labour government finally confirmed what everyone had known for months: that its preferred policy option of allowing AI developers to freely use copyrighted content to train their AI algorithms unless rightsholders specifically opt out was a shambles. The government had been backpedalling for a while, but the issuance of the Parliamentary-mandated Report on Copyright and Artificial Intelligence, all 125 pages of it, finally drew a line under all the prevarications. The paper was issued by the two Departments that represent opposite ends of the spectrum on the copyright/AI question, the Department for Science, Innovation and Technology and the Department for Culture, Media and Sport. The inconclusive results reflect this duality.

The government’s initial Consultation Paper had proposed four options, numbered 0 to 3, to deal with the issue of unauthorized use of copyrighted content to train AI platforms:

  • Do nothing (status quo). Copyright and related laws would remain as they are. It would be left to the courts to settle disputes between rightsholders and unauthorized users;
  • Strengthen copyright by requiring licensing in all cases of use of copyrighted content;
  • Legislate a broad text and data mining (TDM) exception to copyright law. (Britain already has a TDM exception but it is limited to non-commercial research purposes);
  • Create a data mining exception with opt-out and transparency measures. Rightsholders would be required to notify when their works were not to be accessed, and AI developers would be required to disclose what works they had used.

The Paper indicated that the last option, Option 3, was the government’s preferred choice. Unfortunately–for the government–fully 97% of the 11,520 respondents to the consultation survey disagreed. To put it another way, only 3% of respondents endorsed the government’s preferred option. (Note to UK government: Wipe egg off face). Rightsholders did not want the onus to be on them to opt out, both because copyright law is based on the user (subject to fair dealing and other statutory exceptions) needing to obtain permission in advance from a rightsholder, and because opting out is not necessarily easy to do. AI developers objected to the transparency provisions. They preferred Option 2, a broad TDM exception allowing them to help themselves to whatever content they deemed useful for AI training without let or hindrance. The last thing the tech industry wants is to be required to document what content it has appropriated without authorization. To oppose Option 3, rightsholders mounted a widespread and very effective campaign to remind legislators of the value of Britain’s creative industries, both economically and culturally. Everyone from those two noble knights, Sir Elton John and Sir Paul McCartney, to Ed Sheeran, Dua Lipa, Kazuo Ishiguro, Andrew Lloyd Webber, Cat Stevens, Sting, Tom Stoppard, and on and on—maybe everyone who is anyone in the cultural world in Blighty, except Banksy, spoke out. Now the government has thrown up its hands and admitted it got it all wrong. The blatant content giveaway-cum-confiscation is not going to happen, at least not in the form initially proposed.

The Starmer government’s particular slice of humble pie was worded as follows:

We must take the time needed to get this right. We will not introduce reforms to copyright law until we are confident that they will meet our objectives for the economy and UK citizens. This means protecting the UK’s position as a creative powerhouse, while unlocking the extraordinary potential of AI to grow the economy and improve lives. Any reform must ensure that right holders can be fairly rewarded for the economic value their work creates, and that they are protected against unlawful and unfair use of their work. It must also ensure that AI developers can access high quality content. It is clear through the consultation and our subsequent engagement that there is no consensus on how these objectives should be achieved.

Truer words were never spoken. And there will never be consensus as long as one of the parties is an entitled tech industry that feels it can freely confiscate valuable content produced and owned by others, and which has no qualms about accessing pirate websites to do so. Legal consequences are required to bring it to the negotiating table.

Although we now know the UK government will not forge ahead to adopt the Option 3 opt-out proposal, we still really don’t know what it will do. Amidst all the relief from the creative community that Option 3 has been dropped as the chosen way forward, there is still a sense of unease about what might happen instead. Composer Ed Newton-Rex, one of the leading activists amongst the creators, has voiced his concerns on X. While the government has withdrawn its preference for Option 3, it is still on the table as is the possibility of some other form of TDM exception. He is worried that while there is talk of compensation for rightsholders, permission does not seem to be part of the equation. With the dropping (for now at least) of Option 3, transparency requirements for the tech industry are also once again in doubt.

What happens next on this issue in the UK? More study, more research, more monitoring of developments. In other words, back to Square One. The one new development coming out of the government’s Report is an acknowledgement that something needs to be done with respect to digital replicas. As described in the Report, this involves the use of AI to replicate or mimic the appearance or voice of individuals. Current copyright law is not well adapted to protect against such misuse. The result could be the introduction of a right of personality in the UK.

When it comes to the intersection of copyright and AI development, the UK government is trying to resolve an issue with which many countries are grappling. The question is how to get a slice of the AI investment pie by incentivizing the tech industry without throwing the creative community under the bus. In the US, finding the balance has largely been driven by the courts. The White House has just issued its National Policy Statement on Artificial Intelligence. Among its many policy positions, the Statement indicates this question should continue to be left to the courts to decide, even though the White House believes that unauthorized and uncompensated use of copyrighted material for AI development is fair use under US law. In Australia the government looked at the same issue and in the face of vocal and organized opposition from the creative sector, explicitly ruled out introducing a TDM exception. India tried to find the balance by proposing the establishment of an unworkable compulsory licence regime, a scheme that I described in a recent blog post as the “worst of all worlds”. It was as widely condemned by both creators and tech developers as the UK’s Option 3. Canada, which like Australia does not have a TDM exception in its copyright law but which is being pushed by the tech industry to loosen restrictions in the face of current and anticipated lawsuits, is facing similar questions.

While this may seem to some (like the UK government, for example) an almost intractable problem that will require much more work, study and consultation to resolve, in fact it is not all that difficult. Cultural industries generally are not opposed to the development of AI. In fact, many creators use it to assist their work. But they want to have some say over whether or how their work is being used, and want to receive compensation when it is used. It is true that some individuals may wish to have nothing to do with AI and would like to see it go away. That, however, is not going to happen, although their right to have their work not used to develop applications that may unfairly compete with the content they produce must be respected. The best outcome is a win/win scenario where creators voluntarily opt in to the AI development process and share in its benefits. This is already happening on an increasingly large scale with respect to commercial licensing deals, with news media outlets, studios, music publishers and book publishers all signing licensing agreements with AI developers. Left out at this stage are most individual authors and artists and small publishers, but voluntary collective licensing can fill the gap. As a recent example, at the London Book Fair in March of this year, the first stage of an opt-in collective licensing initiative was launched by Publishers Licensing Service to supplement direct agreements between publishers and AI companies.

The “magic formula”–the win/win solution–which so many countries are finding so hard to unlock is based on three principles, well articulated by Canada’s Coalition for the Diversity of Cultural Expression (CDCE):

1. Authorization (Permission)

2. Licensing (Remuneration)

3. Transparency (Disclosure)

Acceptance of these three cardinal principles by AI developers would cut through what the UK government seems to see as a Gordian Knot. The solution is not all that difficult, but it requires courage to stand up to the tech industry’s threat of taking its ball and going elsewhere to play. Let’s hope that as the UK and other governments go back to the drawing board, they use these principles as guideposts to arrive at a solution that, at the end of the day, best serves everyone.

© Hugh Stephens, 2026. All Rights Reserved

AI Training and Nurturing Cultural Industries in Asia: Finding the Right Balance

Text and Data Mining (TDM) Exceptions and Compulsory Licensing Solutions Carry Heavy Risks

A scale balancing two labeled blocks, one marked 'TDM' and the other marked '©', representing the debate between Text and Data Mining and copyright.

Text and data mining (TDM) is a hot topic in many countries. In jurisdictions where exceptions to copyright protection are embedded in legislation rather than determined by the courts on a case-by-case basis (as in the US), TDM has become a favoured vehicle of AI developers, although compulsory licensing has also been floated by some as a potential solution. AI developers see TDM as a loophole allowing access to copyright protected works for algorithm training without payment or permission. Compulsory licensing would establish a statutory regime requiring rightsholders to provide access to their content upon payment by users. While seemingly offering a middle ground, it is fraught with problems. Meanwhile, content industry stakeholders have been vocal on the need to protect their intellectual property, while in some cases resorting to legal action.

TDM has been on the front burner in the UK, Australia and Canada, and Asia is facing many of the same issues. From India to Malaysia to Japan, and from Korea to Hong Kong to Singapore, access to copyrighted content for AI training is front and centre, although being played out in different ways. Some countries have already instituted limited TDM exceptions while others are reviewing options. In India, which has long used compulsory licences in the patent field, and which has provision for compulsory licences under certain narrowly specified circumstances in its Copyright Act, both TDM and wider compulsory licensing are being pushed by the AI industry. A common thread in all countries is the concern by rightsholders that their valuable proprietary content is being, or may be, taken and reproduced to provide training inputs to a commercial process without authorization or compensation. These concerns are not misplaced.

Compulsory licensing is a “solution” (actually opposed by many in the AI industry who believe that all content should be “free”) that strips away the rights of content owners to determine how their valuable intellectual property will be used. In effect, it is a form of expropriation. While compulsory licences may set a price for use (which may or may not be seen as fair), they don’t address other issues that are normally included in licensing deals such as how the work is to be used, or any specific limitations related to the content. There is also the difficult issue of equitably distributing collected funds.

Voluntary licensing, where rightsholders can opt in, is a fairer and more feasible solution, offering mutual benefit to both the content and AI industries. A growing voluntary licensing market exists for print, AV and music content—but AI developers have been slow to respond, a key reason being the mixed signals they are receiving from various governments. Rather than negotiate, the AI industry would rather push for a broad exemption legalizing the practice of helping themselves to protected content owned by others. The pretexts advanced are either a) they are not really copying (just turning content into data tokens is the argument) or b) if they are, they should be allowed to continue doing so in the name of “innovation”. There is also the implicit threat that if laws and regulations are too protective of the creative sector, AI development funds will go elsewhere, to more compliant jurisdictions.

This argument does not hold water, as many factors go into making investment decisions regarding facilities such as data centres, notably the availability and cost of talent, land, power, etc. It is worth noting that while Malaysia does not have a TDM exception in its copyright law (whereas Singapore does), investment is pouring into Johore Bahru–just across the causeway from Singapore–because of Malaysia’s relative competitive advantage in input costs. The AI industry’s “fear factor” threatens to start a race to the bottom, as governments around the world don’t want to be left behind while the AI race heats up. And while it is clear that AI will transform some industries and has the potential to increase productivity in many areas, it may lead to more job losses than gains. The cultural sector, by contrast, is both a key economic driver in all the Asian economies in question and an important pillar of national identity.

India

India is a good case in point. It is a well-known cultural and technological powerhouse with a creative economy that was estimated by WIPO to be valued at over $30 billion (USD) in 2023, with 20% growth in creative exports generating over $11 billion. Prime Minister Modi has called on the creative sector to further increase its share of GDP. Yet the TDM issue has raised its head in India, especially after OpenAI was sued by several Indian media entities for copyright infringement. In May, Reuters reported that the Ministry of Commerce had set up an expert panel to examine the AI training issue. Both domestic and international content industries in India are concerned that creation of a TDM copyright exception, or a widening of compulsory licensing in India’s copyright law, will undermine the incentive to create new content and stall the development of a voluntary licensing market for AI training. Careless implementation of TDM or bringing in a misplaced compulsory licensing regime risks throwing out the baby with the bathwater.

Malaysia

As in India, AI industry lobbyists in Malaysia have called for implementation of a TDM exception. The case of neighbouring Singapore is often cited, but Singapore is a particularly poor example to follow. Singapore’s overly broad TDM exceptions, referred to locally as exceptions to facilitate “computational data analysis”, combined with severe limitations on the use of contract law to control access to copyright protected works, have weakened Singapore’s creative sector and held back the development of licensing options. There is no need for the introduction of a TDM exception in Malaysia. Kuala Lumpur can distinguish itself by offering an appropriate balance between AI development and fostering important cultural industries, encouraging the development of a mutually beneficial licensing market. Malaysia’s other attributes have already helped it attract significant high-tech investment without undermining its investment in content creation.

Japan

Japan, which has a TDM exception in its copyright law, is often held out by AI developers as a model for the kind of copyright law they would like to see replicated elsewhere, but the impression that anything goes in Japan with respect to use of copyrighted content is mistaken and based on misunderstandings. As I outlined in a blog post last year (Japan’s Text and Data Mining (TDM) Copyright Exception for AI Training: A Needed and Welcome Clarification from the Responsible Agency), Japan’s TDM exception does not apply if the user of the copyrighted data “enjoys” the content. As an example, this means that if a user derives benefit through using the copied material to create outputs based on the reproduced content, the TDM exception does not apply. As this website succinctly puts it, “Expressive intent invalidates the safe harbour.” As is the case elsewhere, the limits of the law are being tested in court. Yomiuri Shimbun, Japan’s largest paper, as well as Nikkei and Asahi Shimbun, are suing Perplexity AI for copyright infringement in Tokyo District Court. Meanwhile, a market for licensing content is beginning to develop.

Korea

Korean content companies are also turning to the courts for redress against unrestricted copying by digital platforms. Korea’s three terrestrial broadcasters, KBS, MBC, and SBS, filed suit in January against Korean tech giant Naver, claiming the platform used their news content to train its AI application. The broadcasters had earlier put Naver on notice not to use their content without permission. Naver is, broadly speaking, the Korean version of Google. It has recently been reported that more lawsuits are pending against Naver, this time from the Korean Newspaper Association.

Korea does not have a TDM exception in its copyright law, but it has, at least in theory, adopted the US fair use doctrine as a result of the US-Korea Free Trade Agreement. However, although fair use was incorporated into Korean law in 2011, it has seldom been used. The Korean courts have been very reluctant to apply it, and where they have, the application has been very narrow, essentially limited to non-commercial use. To date there have been no fair use cases brought to the Supreme Court, and lower courts tend to rely on the specified exceptions in Korean law. Because of this, there have been attempts to introduce a TDM exception, and more are expected in the current National Assembly. Various versions have been proposed that are of concern to rightsholders, including broad interpretations that would not distinguish between commercial and non-commercial use. Korea, one of the cultural giants of Asia, needs to tread carefully if it wants to maintain its leading position in cultural exports while encouraging the development of content licensing.

Hong Kong

Hong Kong does not have a TDM exception in its current copyright law but, under pressure from the AI sector, is considering the idea. The Intellectual Property Department launched a public consultation late last year, receiving input from stakeholders representing both sides of the argument, and has come forth with recommendations to the legislature (Legco). It has proposed a TDM exception for both commercial and non-commercial use but with a number of limitations: 1) access to content must be lawful (i.e. no use of pirated content); 2) a public record must be kept of copyrighted works used in AI training (transparency requirement); 3) the TDM exception will not apply where licensing schemes (i.e. licences that have been issued by the Copyright Tribunal) exist; and 4) rightsholders can reserve their rights by opting out.

There are problems with this proposal, despite the limitations. Requiring rightsholders to opt out stands the existing basis of copyright on its head (users normally need to obtain permission from rightsholders in advance), as it has in the EU, while the licensing provision provides only limited relief. While not as potentially destructive as some proposed TDM exceptions elsewhere, it is questionable whether Hong Kong needs a TDM exception given that voluntary licensing alternatives are increasingly available. At present, the recommendations are with Legco; given public skepticism about the proposal, legislation is not expected until 2026 at the earliest.

Conclusion

Lawmakers and regulators in Asia are grappling with a common problem: how to incentivize the development of responsible AI while continuing to encourage and promote all-important content industries. Culture is particularly important in Asia as an expression of values, and throwing the cultural sector under the bus in the hopes of attracting some ephemeral hi-tech AI jobs is a false bargain. It’s like eating the seed grain from which the bounty of cultural creativity springs. Undermining the nurturing environment provided by sound copyright protection, whether through compulsory licensing or creation of TDM exceptions, is bad public policy.

Strong cultural industries enable the development of strong content licensing markets for AI development, enabling a virtuous circle of further creativity. A strong cultural sector and strong, sustainable digital industries, especially those powered by AI, go hand-in-hand. Asian regulators need to exercise prudence and weigh the consequences of rash action. The winners will be those that find the right balance between encouraging innovation and fostering creativity.

© Hugh Stephens, 2025. All Rights Reserved

Australia Stands Up for its Creative Sector: A Useful Lesson for Canada and Others

Two coffee mugs side by side, one featuring the Australian flag and the other featuring the Canadian flag.

Image: Shutterstock

Australia just took an important stand in the tug-of-war being waged in many countries over whether, how and to what extent tech companies can use copyrighted content (text, music, images and so on) to train AI platforms by reproducing the content and extracting its essence without permission or compensation to rightsholders. Attorney-General Michelle Rowland has announced that while Australia will be undertaking consultations on revisions to its copyright laws to help address the needs of the AI industry, a Text and Data Mining (TDM) exception has been ruled out. Some countries, like the UK, have TDM exceptions for limited purposes (such as research and non-commercial use) in their laws while several other countries have TDM under review. Existing TDM exceptions allow reproduction of copyrighted content without the authorization of the rightsholder for research, data analysis, and in some cases for AI training purposes.

There is currently no TDM exception in Canadian law, but as I noted in a recent blog post (“Canada’s Creative Sector Uneasily Awaits the Carney Government’s Next Steps on AI Training”), pressure is building from the AI sector to incorporate TDM into Canada’s Copyright Act. The government currently has yet another consultation paper on AI out for public comment, and the Canadian cultural sector is organizing to protect creators’ rights, specifically calling on the Canadian government to “ensure that the Copyright Act is not modified through an exception permitting Text and Data Mining (TDM) or any other exception allowing technology developers or users to use protected works…to train generative AI systems without authorization or compensation…”. In doing so, it is taking a leaf from the book of Australian creators, who mounted strong opposition to an August report from the Productivity Commission (PC), an independent research and advisory body created by an Act of Parliament some 25 years ago, proposing that Australia adopt a TDM exception. To say that this proposal put the cat amongst the pigeons would be an understatement.

The Commission has a reputation for denigrating the value of intellectual property, seeing it as an obstacle to industrial development rather than as an essential partner. In 2015 it proposed shortening the term of copyright protection from the current life of the author plus seventy years (“life plus 70”), a generally accepted international standard, to just “life plus 15” (far lower than the Berne Convention minimum and a standard not adopted anywhere), while introducing a US-style fair use regime into Australia. There was strong pushback then (it didn’t happen), and there was strong pushback this year (see here and here, for example) when the PC proposed introducing a TDM exception. The Commission was particularly criticized for its lack of consultation with the creative industries in developing this proposal.

Now the Australian government has put its foot down, ruling out TDM but indicating that it will look at alternative solutions. These include: examining whether to establish a new “paid collective licensing framework” under the Copyright Act for AI, or whether to maintain the status quo through voluntary licensing; clarifying how copyright law applies to material generated through the use of AI (i.e. whether there should be copyright protection for outputs produced by or with AI); and looking at the establishment of a new small claims forum to address lower-value copyright infringement matters.

It is generally accepted that AI is here to stay and will continue to need vast amounts of content for training. In most cases, copyrighted content is the kind of curated, high value work that AI developers need but, until now, have preferred to appropriate without permission rather than pay for through licensing. In effect, they have decided to ask for forgiveness after rather than permission beforehand. This has led to a plethora of lawsuits globally, including the recent $1.5 billion that Anthropic has agreed to pay to settle a class action suit brought by authors in the US. “Forgiveness” can be expensive. Inside the US, AI developers are arguing their copying is fair use, although at the same time they are beginning to hedge their bets by licensing content from a number of sources, ranging from media to music to image companies. Outside the US, AI companies have been beating the TDM drum, hoping that creation of wide TDM exceptions will obviate the need to negotiate with content owners. Nonetheless, voluntary licensing is growing globally. However, the surest way to kill a nascent licensing market is to give the tech industry a “get out of jail free” card by introducing a broad TDM exception. Australia has just rejected that option. Canada and others considering introducing new, or broadening existing, TDM loopholes should do the same.

It is not clear where Australia’s AI and Copyright review will end up, other than to note that it will not include TDM. As I have noted above, among other things it will be considering “collective licensing”. Collective licensing could help address the problem of remunerating individual rightsholders, in contrast to licence agreements signed between AI developers and corporate entities like media companies. However, Australia needs to steer clear of compulsory licensing which strips away the rights of copyright owners. Compulsory licences authorize use upon payment of a statutory or negotiated fee but remove the right of a copyright holder to withhold consent for use, or to impose specific limitations. A voluntary licence framework is fair to everyone. Compulsory licensing is not.

Canada and Australia have many things in common (as well as a number of differences of course, beyond poutine vs vegemite). Among their commonalities is the desire to protect and foster a unique cultural identity in the face of global cultural homogenization. This is even more important in Canada given the realities of the struggle faced by 6 or 7 million Francophones to preserve their culture in a sea of 375 million Anglophones. Canada followed Australia’s lead (although less successfully) in requiring major online platforms to contribute financially to (i.e. pay for the use of) news media content. It should do the same here by putting the idea of a TDM exception firmly to one side and instead focusing on encouraging the development of a voluntary licensing market for copyrighted content used in AI training.

© Hugh Stephens, 2025. All Rights Reserved.

Hold the Champagne: The Two AI Training/Copyright Decisions Released in the US Last Week Were a Mixed Bag for AI Developers

Illustration of a champagne bottle being popped, enclosed in a red circle with a slash indicating 'no champagne'.

Image: Shutterstock.com

Last week I wrote about the questionable ethics of META’s use of pirated content to train its AI model, Llama, pointing out the ethical issues involved with META’s admitted use of pirated online libraries, such as LibGen (Library Genesis), to feed content to Llama for training purposes. This is quite apart from whatever legal issues may arise from the widespread practice of ingesting copyrighted content for AI training by making an unauthorized copy from any source (a legitimate library, purchase of a single copy of a work, or publicly available internet sources, for example), not to mention the additional element of taking that content from pirate sources. The day after that blog was posted, the first of what will be a series of US legal decisions in cases brought by authors and copyright holders against AI companies was issued, followed by another a day later. Both cases were heard in the Northern District of California, in the same San Francisco courthouse, but handled by different judges.

I updated last week’s blog to make reference to the Bartz v Anthropic case (hereafter “Anthropic”), but given the importance of that decision, combined with a decision released in another California court room a day later (Kadrey et al v META), these cases merit further exploration–especially since they were widely trumpeted by AI advocates as opening the door to unauthorized use of copyrighted content for AI training on the basis of “fair use”.

Fair use is the complex legal doctrine used in the US to determine exceptions to copyright protection. US readers are well aware of the intricacies and idiosyncrasies of fair use but for those not overly familiar with how it works, here is a short summation I drew from a blog post on fair use vs fair dealing that I wrote a few years ago.

In the US context, fair use is an affirmative defence against copyright infringement and is determined by the courts on a case by case basis, judged against several fairness factors (purpose and character of the use, the nature of the work copied, the amount and substantiality of the portion of the work used, and the effect of the use on the value of the original work)… Fair use is not defined by law. Some examples are given in US law of areas where the use is likely to be fair (criticism, comment, news reporting, teaching, scholarship, research) but these are illustrative and not exhaustive. In short, it is the courts that decide. This in turn can lead to extensive litigation as to what is and is not fair use, and it is worth noting that different judicial circuits in the US have at times come up with conflicting interpretations.

Or, for that matter, two different judges in the same circuit delivering decisions just days apart on similar issues but with some significantly different outcomes, as we saw last week (although in these cases both found fair use by AI developers with regard to the copyrighted works at issue).

In the Anthropic case, US District Judge William Alsup ruled, on summary judgement, that the use of copyrighted works for AI training, even though done without authorization, is highly transformative and does not substitute for the original work (“The technology at issue was among the most transformative many of us will see in our lifetimes”). It thus qualifies, according to Alsup, as fair use because the transformative nature of the use overrides or swallows the three other fair use factors, including the important fourth factor (effect of the use on the value of the work). He notes there was no allegation that the output of Anthropic’s model, known as “Claude”, produced content infringing the works of the plaintiffs. However, Judge Alsup then went on to consider the legality of Anthropic’s actions in downloading more than 7 million works from pirate libraries (such as Books3, Library Genesis and the Pirate Library Mirror) to constitute its reference library, which it initially planned to use for AI training. He concluded this was a prima facie case of copyright infringement, whether Anthropic intended to use some or all of the pirated works to train Claude or not. (“Anthropic seems to believe that because some of the works it copied were sometimes used in training LLMs (Large Language Models), Anthropic was entitled to take for free all the works in the world and keep them forever with no further accounting”.) Damages, to be decided at trial, could be substantial. Alsup did not, however, rule explicitly on whether or not the use of pirated works for AI training purposes could be a fair use.

Because of the controversial nature of Alsup’s findings on transformation and fair use, there is no question that this case will be appealed. While there have been many criticisms of the fair use elements of Alsup’s ruling, a particularly clear and trenchant analysis was put forth by Kevin Madigan of the Copyright Alliance (Fair Use Decision Fumbles Training Analysis but Sends Clear Piracy Message).

The second case last week to reach the decision stage was Kadrey et al v META. In this case District Judge Vince Chhabria found that META’s use of the works of the plaintiffs, thirteen noted fiction writers, to train its AI model (“Llama”) was also fair use. Chhabria, like Alsup, found that META’s use was transformative on the first fairness factor dealing with the purpose and character of the use (“There is no serious question that Meta’s use of the plaintiffs’ books had a “further purpose” and “different character” than the books—that it was highly transformative.”), but unlike Alsup, Chhabria put much greater emphasis on market harm (the fourth fairness factor dealing with the effect of the use on the value of the work), suggesting that it could be determinative. Unfortunately for the plaintiffs, however, Chhabria considered their arguments with respect to market harm to be unconvincing. There was no evidence that Llama’s output reproduced their works in any substantial way or substituted for the specific works at issue, nor was there evidence, according to the judge, that the unauthorized copying deprived the authors of licensing opportunities.

Chhabria suggested that a far more cogent argument would have been that use (unauthorized reproduction) of copyrighted books to train a Large Language Model might harm the market for those works by enabling the rapid generation of countless similar works that compete with the originals, even if those generated works are not themselves infringing. In other words, indirect rather than direct substitution for the works. This is the theory of “market dilution”, which was also put forward speculatively by the US Copyright Office in its recent Pre-Publication Report on AI and copyright. Since this argument was not presented, Chhabria could not rule on it, but in effect he invited future litigants to pursue this line of reasoning, noting that his decision on fair use relates only to the works of the thirteen authors who brought the case.

The clearest way to illustrate his line of reasoning is to quote him directly:

In cases involving uses like Meta’s, it seems like the plaintiffs will often win, at least where those cases have better-developed records on the market effects of the defendant’s use. No matter how transformative LLM training may be, it’s hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books.

This editorializing, known in legal circles as obiter dicta, is neither binding nor precedential, yet it will undoubtedly have some influence given Chhabria’s stature. It is likely that one of these days Judge Chhabria will have the opportunity to put these theories into practice when ruling on a similar case, but one where the plaintiffs have made a better case for market harm. He has provided them a roadmap.

While these two cases have fired the first shots in what is going to be a lengthy war, they do not seem to be dispositive. There are enough caveats and nuances to conclude that the AI developers are far from being out of the woods. Both “victories” have a sting in the tail, especially Judge Alsup’s finding on piracy. Neither copyright advocates nor AI developers should be breaking out the champagne just yet. But whichever way it turns out, there will be some sure winners: the lawyers for each side.

© Hugh Stephens, 2025.