Copyright Litigation in China: Some Interesting AI-Related Decisions from Chinese Courts


Image: Shutterstock

These days just about any information in North America related to China, especially regarding intellectual property (IP), is highly negative. The narrative runs along the lines of "China is an adversary with deliberately lax IP laws that has stolen and continues to steal our IP". This characterization of China is reinforced by our political leaders. (When asked during the leaders' debate what the greatest security threat to Canada was, Prime Minister Carney replied with one word: "China".) Donald Trump continues to have an obsession with China, the latest manifestation of which is the recent announcement that the US will revoke the visa status of an undetermined number of Chinese students currently studying in the US. (Over a quarter of a million students from China are currently enrolled at American colleges and universities, many simply seeking an alternative to the hyper-competitive study environment at home.) The "China as IP thief" narrative is supported by government publications such as the annual Special 301 Report produced by the Office of the US Trade Representative (USTR), which this year devoted ten full pages to China. One excerpt will suffice to give you the flavour of the report: "In 2024, the pace of reforms in China aimed at addressing intellectual property (IP) protection and enforcement remained slow…Concerns remain about longstanding issues, including technology transfer, trade secrets, counterfeiting, online piracy, copyright law, patent and related policies, bad faith trademarks, and geographical indications." Well, that covers the waterfront. One wonders how Chinese brands, innovators and creators manage to survive in such an environment.

This is not to dismiss the darker side of China's long IP history. Have there been cases of industrial espionage involving China? Yes, certainly. There have also reportedly been more than 1,200 intellectual property theft lawsuits brought by US companies against Chinese entities in either the US or China over the past 25 years. There is no question that IP protection in China is not all it could and should be, or that some Chinese companies and other entities have been aggressive in seeking to acquire IP by less than transparent means. But that is not the whole story. While the number of IP infringement lawsuits against Chinese entities over the years sounds like a lot, this business website estimates that the number of IP litigation cases globally totals around 12,000 annually. There are several thousand patent litigation cases alone in the US each year. A lot of US companies sue other US companies in the patent, trademark and copyright fields. And Chinese companies sue Chinese companies.

In the past, Chinese IP laws had loopholes, were often weakly enforced, and were dealt with by courts that had scant knowledge of and training in IP matters. That is rapidly changing as China not only climbs the innovation ladder but has come to dominate it in some areas, such as EVs and EV batteries, cashless payment systems and renewable energy. It is rapidly catching up in generative AI. While this has been happening, Chinese courts have been producing some interesting and increasingly sophisticated decisions when it comes to AI and copyright. China, like other countries, is grappling with several aspects of this issue. There is the question of finding the right balance between protecting creators and innovators while using domestic creative works to spur AI training, development and research. Another element is the extent to which AI-assisted or AI-created works qualify for copyright protection. There is currently no Text and Data Mining (TDM) exception in Chinese law to allow AI training on copyrighted content, nor is there a definitive interpretation as to whether content produced by AI can be protected by copyright. However, several court decisions, which we examine below, have shed some light on this complex question.

Dreamwriter Case

In one of the earlier cases, which I wrote about back in 2020 (the Dreamwriter case), a Chinese court in Shenzhen ruled that an automated article written by an AI program (Dreamwriter) created by Tencent, which had been copied and published without permission by another Chinese company, Yinxun, was nevertheless subject to copyright protection because it met the originality test through the involvement of a creative team of editors. These editors had performed a number of functions to direct the program, such as arranging the data input and format, selecting templates for the structure of the article, and training the algorithm model. The article was ruled to be a protectable work, and Yinxun was found to have infringed.

Li v Liu Case

The relatively loose interpretation regarding the degree of human engagement required to protect the output of an AI program in the Dreamwriter case has been supported by other Chinese courts. In the prominent Li v Liu case, the Beijing Internet Court ruled that Mr. Li, who had created the image of a young woman using the AI program Stable Diffusion, had provided “significant intellectual input and personalized expression” in creating the image through a series of prompts. As explained in detail by this article from Technollama, the prompts (along with a number of negative prompts) were sufficient for the court to decide that Li had met the standard of creative expression.

These were Li's prompts:

“ultra-photorealistic: 1.3), extremely high quality highdetail RAW color photo, in locations, Japan idol, highly detailed symmetrical attractive face, angular symmetrical face, perfect skin, skin pores, dreamy black eyes, reddish-brown plaits hairs, uniform, long legs, thighhighs, soft focus, (film grain, vivid colors, Film emulation, kodak gold portra 100, 35mm, canon50 f1,2), Lens Flare, Golden Hour, HD, Cinematic, Beautiful Dynamic Lighting”

Liu, who had been sued by Li for using the AI-generated image without authorization, was found liable for infringement and fined CNY 500 (about USD 75).

At that time (late 2023), this decision was considered ground-breaking for image-based works, given the position of the US Copyright Office (USCO), which had denied copyright registration to several images created with generative AI owing to insufficient human creativity (see If AI Tramples Copyright During its Training and Development, Should AI's Output Benefit from Copyright Protection? Part One: Stephen Thaler and Part Two: Jason Allen). Since then (in January of this year) the USCO has taken a more nuanced position, permitting registration of an AI-assisted work (an image called A Single Piece of American Cheese, created by graphic artist Kent Keirsey). Although Keirsey used InvokeAI to create the work, in the view of the US Copyright Office sufficient human creativity was involved through the "selection, coordination, and arrangement of material generated by artificial intelligence".

Plastic Chair Case

If China has been in the forefront of acknowledging that human control over AI tools used to generate content qualifies the resulting works for copyright protection, a more recent case has reset the pendulum somewhat. As recounted in this blog by UK-based market research firm IAM, a court in Jiangsu Province recently dismissed a copyright infringement claim brought by a designer against a company that manufactured, without a licence, children's plastic chairs based on her AI-generated designs. The designer, Feng Runjuan, had created three designs using the AI program Midjourney and posted them to social media, including the prompt she had used: "Children's chair with jelly texture, shape of cute pink butterfly, glass texture, light background". The company manufacturing the chairs approached Feng to license the designs but was unable to reach an agreement with her. It then went ahead anyway (without a licence) to produce chairs that bore some similarity to the original designs, using Feng's original prompt with some tweaks. Feng sued. There was little doubt that the chair manufacturer had used her prompts to produce the chair design, but the key question was whether the AI-generated designs qualified as original works meriting copyright protection.

Feng was unable to reproduce the original images using her prompts owing to the randomness of the AI program. This suggested to the court that it was the AI program making the design decisions, not the person providing the prompts. As outlined in the IAM article referenced above, the court held that a user must provide a verifiable creative process that shows:

  • adjustment, selection and embellishment of the original images by adding prompts and changing parameters; and
  • deliberate, individualised choices and substantial intellectual input over the visual expression elements, such as layout, proportion, perspective, arrangement, colour and lines.

It concluded that the original images did not qualify as original works and thus they could not be protected. Feng’s lawsuit failed.

So now we have a situation where one Chinese court has ruled that the prompts written by Li in what I will call the "young woman image" case constituted sufficient intellectual input and personalized expression to qualify for copyright protection, even though the actual image was generated by an AI program, whereas another court has denied copyright protection for a work also produced with prompts, albeit simpler and far fewer ones. The difference seems to be the degree of human involvement in crafting the prompts, although the fact that Ms. Feng in the plastic chair case was unable to reproduce the original images seems also to have weighed against her. As anyone who has ever used an AI program will know, identical prompts will produce different images owing to the way the program works. Does that disqualify the artist? I would hope not, but the degree of control is clearly a key factor, as both the rulings of the Chinese courts and the recent USCO decision to register A Single Piece of American Cheese would seem to show. Both Chinese court decisions are defensible, demonstrating careful and reasoned consideration, and are helpful in establishing parameters for determining whether works are AI assisted or AI created.

Ultraman Case

Another area where Chinese courts have left their mark is AI liability for copyright infringement. In what is known as the "Ultraman" case, a Chinese court (the Guangzhou Internet Court, upheld on appeal by the Intermediate People's Court in Hangzhou) delivered a ruling of contributory infringement against a company that provided AI-generated text-to-image services through its website. The complainant was the Chinese licensee of the Japanese company that owns the rights to the cartoon character Ultraman. When the defendant's website (effectively a chatbot capable of generating AI images at its users' request) was asked to generate an Ultraman-related image, it produced a character that appeared to be substantially similar to the claimant's licensed Ultraman. The court had to decide whether the defendant had infringed the plaintiff's reproduction and derivative production rights and, if so, what remedies were applicable.

In its ruling the court decided that even though the defendant did not directly infringe the licensee's rights, its failure to exercise a reasonable duty of care to prevent infringements (for example, by cautioning users or providing adequate filtering or blocking mechanisms) rendered it liable for contributory infringement. It was ordered to compensate the claimant in the amount of CNY 10,000, about USD 1,500 (considerably less than the CNY 300,000 in damages sought). Here we have another sophisticated and well-reasoned decision, which appears to be the first instance globally of an AI platform being found liable for contributory copyright infringement. It does not create any binding legal precedent but is a useful contribution to the emerging debate.

These cases well illustrate the growing sophistication and complexity of IP rulings in China and are reflective, in my view, of an economy that is rapidly moving up the innovation and creativity ladder. When it comes to IP protection in China, is the glass half empty or half full? I would argue the latter, even though this may not be the most popular interpretation these days. One thing that I am willing to predict with certainty is that we can expect more interesting and thoughtful IP legal decisions from the Chinese legal system in the months and years ahead.

© Hugh Stephens, 2025. All Rights Reserved.

How will Mark Carney’s Cabinet Appointments Impact Canada’s Cultural and Copyright Industries?


Image credit: Reuters

As Canada's cultural community assesses the make-up of Prime Minister Mark Carney's new Cabinet, two common phrases come to mind: "Hope Springs Eternal" and "Grasping at Straws". The cultural and copyright industries have a number of legitimate concerns, which were well articulated in a pre-election brief published by the Coalition for the Diversity of Cultural Expressions (CDCE), as I outlined in an earlier blog post. Among the CDCE's requests in the area of copyright was a plea for fair remuneration for writers and publishers for the use of their works in the education sector. There was yet another reminder of the need for the long-promised establishment of an Artists Resale Right in Canada, along with a reiteration of the request to extend the private copying regime to electronic devices (as is done in Europe), a measure that would help in restoring royalties in the music sector. The final copyright ask was to amend the definition of a sound recording to ensure that performers and record labels receive compensation for the audiovisual use of their works. CDCE's brief also argued for "proper implementation" of the Online Streaming Act so that streaming services and social media platforms contribute financially to the production of Canadian content, as well as support for retention of–and increased funding to–the CBC. There were also policy positions regarding the use of copyrighted content for AI training purposes.

What are the chances that all or any of these requests will be addressed by the new Carney government? With all the hoopla during the election about Canadian identity in the face of repeated attacks on Canada's integrity by Donald Trump, one might expect some significant measures to strengthen Canadian culture, although the Liberal election platform does not have much to say on Canadian cultural enterprises–other than the CBC. The CBC gets some love as a "cornerstone of our national identity" and will not only not be defunded, as threatened by the opposition Conservatives, but will get an additional $150 million cash infusion as well as having its funding put on a statutory basis. As a quid pro quo, the national broadcaster will have to strengthen accountability and its commitment to local news, promote Canadian culture and combat disinformation. Increased funding (we don't know how much) will be provided to the Canada Council for the Arts, Telefilm Canada, the Canada Media Fund and the National Film Board. That is all good stuff, but there is not a whisper of copyright reform or of changes to copyright legislation in the platform.

The lack of any overt reference to copyright issues is probably understandable given the focus of the election, which was on standing up to Donald Trump and building the Canadian economy. It is not a bread-and-butter issue for most voters, important as it is to the cultural community. Therefore, we have to look beyond the Party Platform to the roster of appointed ministers and examine their backgrounds to see if there is any prospect of progress. On this score, the story is a bit more positive.

The two key ministers both represent Quebec ridings: Mélanie Joly, Minister of Industry, Science and Economic Development (the ministry that holds the statutory mandate for administering the Copyright Act), and Steven Guilbeault, the new Minister of Canadian Identity and Culture, formerly labelled Canadian Heritage (a ministry that has an important though subsidiary role to play on copyright). This is significant given the importance of support for culture and cultural enterprises in the province and the influence of the creative community there. Guilbeault is also Carney's "Quebec lieutenant", which gives him extra influence. Joly, most recently Foreign Minister, is a former Minister of Canadian Heritage herself, as was Guilbeault in a previous incarnation. Thus, they know the cultural files. The CDCE was quick to congratulate both on their appointments, noting that Joly's was a "promising signal" with respect to copyright.

Maybe. But it is unlikely that either Joly or Guilbeault will pick up the copyright ball unless they are pushed to do so. If they do, given the Liberals' minority government status, they will need the support of another party. The Bloc Québécois, with 22 seats (or 23, depending on what happens in the riding of Terrebonne, where the Liberals won by exactly one vote out of almost 50,000 cast), would be logical supporters of the kind of changes to the Copyright Act sought by the Canadian creators represented by the CDCE.

Familiarity with cultural industry issues from past service as Canadian Heritage minister is a plus, but the reality is that both Joly and Guilbeault, strong ministers though they are, struggled during their previous tenures at Heritage. Guilbeault's first love is the environment, a portfolio he held for four years under Justin Trudeau as Minister of Environment and Climate Change, making him Public Enemy No. 1 for Alberta Premier Danielle Smith. Her new public enemy is the current Environment Minister, Julie Dabrusin, who served briefly as Guilbeault's Parliamentary Secretary at Environment. Dabrusin would have been an inspired choice for Minister of Canadian Identity and Culture given her role as Chair of the Standing Committee on Canadian Heritage that produced the report "Shifting Paradigms". Among other things, her Committee recommended changes to the Copyright Act to narrow the problematic education fair dealing exemption that has done so much damage to writers and publishers in Canada. But it was not to be. However, she will do just fine as Minister of the Environment. As for being attacked by Danielle Smith, that is probably a badge of honour for any environment minister. The only scenario under which Smith would not attack a federal environment minister is if Carney pulled a page from Donald Trump's playbook and appointed someone with such an anti-environment track record (like current EPA Administrator Lee Zeldin) as to effectively disqualify them from the job. But I digress.

Joly and Guilbeault are not the only ministers who will play on copyright issues. Among the new ministers announced on May 13 was Evan Solomon, the Minister for Artificial Intelligence and Digital Innovation. This is a new ministry (having a Minister for AI may be a world first) and it is not altogether clear what Solomon's mandate will be, as pointed out by Michael Geist. This is especially true as he does not have a functioning department to inherit. There are lots of issues for him to resolve, including the salient one of how and on what conditions AI developers will have access to copyrighted content for training AI algorithms. Canada has no text and data mining exception in its copyright law, let alone a broad exemption for AI training of the kind AI developers are seeking in the US and elsewhere. Some Canadian AI developers have been quick to claim that AI development will flee Canada if they are not given free and unfettered access to the creative content of others, a self-serving scare tactic if there ever was one, as I wrote about here. The fact is all countries are wrestling with how to protect valuable cultural industries while enabling responsible AI to develop. Licensing is the most obvious solution.

The CDCE had three requests with respect to AI training.

  • no Copyright Act amendment to allow technology development companies to continue using protected works, productions, and performances to train generative AI systems without authorization or compensation;
  • implementation of legally binding measures requiring the disclosure of training data used in AI systems; and
  • ensuring that all AI-generated content is clearly identified, so that the public is fully informed about the nature of the content it consumes.

Solomon no doubt will become involved in these questions. His background is as a journalist, a prominent one at that. As such, one might surmise that he has some understanding of the role of content creators and the need to foster and protect creative expression. But he will be subjected to lots of attention from the tech community, so we will just have to see how it plays out.

When I read all these tea leaves, I have the uneasy feeling that the times are not particularly propitious for the kind of political leadership sought by the cultural industries in Canada, particularly those sectors needing some attention to the copyright file. However, as I noted at the outset, “hope springs eternal”. And if hope fades, one can always “grasp at straws”. There are quite a few of them lying around. But are there enough to build any kind of useful structure? That is the question.

© Hugh Stephens, 2025. All Rights Reserved.

Should We Throw Copyright Under the Bus to Compete with China on AI?


Image: Shutterstock (author modified)

If this sounds about as responsible as "we should legalize theft of patents at home because patent infringement is rife in China", then you may well ask where such a nonsensical and counterproductive idea came from. From OpenAI, the company behind ChatGPT, for one: the same company being sued by the New York Times for copyright infringement for copying and using NYT content without permission to train its AI algorithms.

Sam Altman, CEO of OpenAI, is one of the "tech bros" now cozying up to Donald Trump. He is a vocal advocate of allowing the AI industry unfettered access to copyrighted content as part of the AI training process. Last year, in a submission to the UK Parliament, OpenAI claimed that it would be "impossible" to train AI without resort to content protected by copyright. Now, it maintains that allowing AI companies to scoop up copyrighted content without authorization or payment is not only "fair use", a legally unproven proposition that is currently very much a live issue before the courts in the US and elsewhere, but is essential for "national security". To cite a few choice tidbits from OpenAI's submission to the Office of Science and Technology Policy (OSTP), filed in response to the Office's request for submissions on the Trump Administration's AI Action Plan:

"Applying the fair use doctrine to AI is not only a matter of American competitiveness—it's a matter of national security… If the PRC's developers have unfettered access to data and American companies are left without fair use access, the race for AI is effectively over… access to more data from the widest possible range of sources will ensure more access to more powerful innovations that deliver even more knowledge."

And, one could add, more profit for AI companies.

In other words, if the US government doesn’t give AI companies free and unfettered access to whatever content it desires, regardless of whether it is protected by copyright (think curated news content, musical compositions and artistic works, not to mention the published works of countless authors), then China will win the AI race, threatening the national security of the US. Or so Altman’s argument goes.

The AI industry is already a practitioner of the art of helping itself to OPC (other people's content) without permission, then claiming fair use when caught doing it. That is what has led to the multiplicity of lawsuits now before the courts, brought by various authors and content owners. Raising the bogeyman of China and wrapping itself in the flag by invoking "national security" is a new wrinkle in the tech industry's attempts to undermine established copyright law and to wriggle out from under its legal obligations.

“National security” is a convenient catchphrase and pretext in common use today to try to justify and legalize the unjustifiable and the illegal. Donald Trump invoked national security when he used the International Economic Emergency Powers Act (IEEPA) to override USMCA/CUSMA obligations made to Canada and Mexico, treaty obligations that he himself signed in his first term in office. The immediate excuse was the flow of fentanyl across the northern and southern borders of the US. Never mind that the amount of fentanyl seized by US border agents at the Canadian border came to a grand total of less than 43 lbs. for all of 2024, or just 0.2% of the total. (The equivalent for Mexico was 21,148 lbs). National security, and in particular playing the China card, is a political winner these days in Washington.

OpenAI’s position is all the more outrageous because it went into fits when the Chinese startup, DeepSeek, launched its new and much cheaper product, allegedly having used OpenAI’s capabilities to improve its own model. OpenAI cried foul and IP infringement, a case of blatant hypocrisy if there ever was one.

OpenAI and other generative AI companies that have built their training models on permissionless copying are clearly nervous about the possible outcomes of the numerous court challenges to their practices currently underway. Most of these cases are in the US, although similar lawsuits have been launched in the UK, Canada, India and Germany. While it is impossible to predict the outcome of specific cases, in a recent decision (Westlaw v Ross) a US court rejected fair use as a defence in the context of AI training data. It did not accept that copying the content was a transformative use, but rather one that created a product competing in the market with the original source material. Given the legal uncertainties, it looks like the tech industry is trying to hedge its bets by lobbying to have all AI training uses declared "fair use" based on national security considerations.

It gets worse than that. Another of the tech bros, Mark Zuckerberg, gave the green light to training Meta's AI model on pirated material. This was not accidental: employees reported removing © marks from books downloaded as training materials.

In Canada, in a similar search for a rationale to explain away copyright infringement, a company that was helping itself to copyright-protected curated legal case data to build an AI-based legal reference service claimed that forcing it to license the content would stifle innovation and drive AI businesses out of the country. See CanLII v CasewayAI: Defendant Trots Out AI Industry's Misinformation and Scare Tactics (But Don't Panic, Canada). The AI developers' strategy seems to be that if you don't want to license and pay for IP-protected content (or perhaps the owner of the content prefers not to license it, as is their right), just take it and claim some overriding purpose, like protecting domestic innovation or national security.

But what about the argument that if China doesn't respect intellectual property (IP), we need to adopt the same approach in order to compete? While Chinese courts in recent years have taken a much more robust position with respect to protecting the rights of IP owners, including patents, trademarks and copyright, I am not going to argue that China has suddenly become a "rule of law" country. Rather, it is a "rule by law" state, the law being whatever the leadership of the Chinese Communist Party (CCP) decides it will be at any given moment. This is a fact. However, to suggest that the West, in particular the US, should adopt China's legal modus operandi so as not to lose the so-called "AI race" not only undermines all the values and principles on which our society is based, including the principles of private property, fairness and transparency, but also dismisses three centuries of legal developments in the protection of IP, especially copyright. The evolution of copyright law has resulted in the creation of industries that contribute far more to the economic and cultural wellbeing of our society than any of the questionable outputs of the AI industry.

Yes, AI is here to stay. It can be put to beneficial or nefarious uses and has an undoubted strategic component. It can also be used to undermine and weaken human creativity. Is that the goal we are seeking?

It is worth noting that the tech bros have an easy and legal way out. In most instances, they can acquire access to the content they need legitimately. A market for licensing training data for AI development already exists and is developing rapidly, as I wrote about earlier in Using Copyrighted Content to Train AI: Can Licensing Bridge the Gap? But just taking it and claiming "fair use" is easier and cheaper. And morally, and probably legally, wrong.

We have seen a lot of rogue policy making in Washington of late, from the illegal deportation of US residents, to the gutting of US government agencies, to the declaration of a tariff war against the world. It is time to take a more considered approach. Rash decisions in response to tech lobbying could lead to untold consequences and collateral damage to content industries that would be impossible to roll back and remedy. Thus, I was relieved to note that Michael Kratsios, Director of the US Office of Science and Technology Policy, the same OSTP to which OpenAI submitted its comments regarding AI training and national security, stated in a recent speech on American innovation that:

 “…promoting America’s technological leadership goes hand in hand with a threefold strategy for protecting that position from foreign rivals. First, we must safeguard U.S. intellectual property and take seriously American research security…”

That is a welcome recognition of the importance of IP as part of the process of innovation.

In this respect, the existing framework of copyright law has survived and adapted for over 300 years. It has evolved with each new technological development, but the fundamental principle of giving an "author" of an original work the right to control how that work is used, as well as the ability to earn a return from its use for a statutory period, with only limited exceptions, has remained unchanged. To undermine this principle in a flawed attempt to grasp the Holy Grail of AI leadership is self-defeating. Instead of sipping from AI's Holy Grail we will be drinking from the poisoned chalice of IP theft.

Throwing copyright and the rule of law under the bus on the pretext that this is what’s needed to compete with China is not only self-serving, it is a sure path to ultimately losing the secret sauce of creativity and innovation. A country that steals IP rather than creating and respecting it will always lose the race.

© Hugh Stephens, 2025. All Rights Reserved.

Copyright, Cultural Issues and Canada’s General Election, 2025

Image: Shutterstock (AI generated)

As we complete the first few days in what is the shortest election campaign in Canadian history, the minimum 37 days required by law, where do the copyright and cultural industries stand with respect to electoral platforms and public consciousness? Given the overwhelming focus on dealing with economic and even potential political disruption coming from south of the border, along with traditional bread-and-butter issues like the cost of living, especially food and housing, one could be tempted to say that cultural and copyright issues are largely invisible. Party platforms have not yet been released (and are probably still being worked on), and by the time they are made public, the election will be well underway. So while there may still be a couple of small references to copyright issues in party platforms (as occurred in the 2021 election, none of which led to any substantive legislation), they will simply be part of a laundry list of possible actions in many disparate areas. However, that has not stopped the cultural sector from outlining its policy proposals, which have been laid out articulately by the Coalition for the Diversity of Cultural Expressions (CDCE), an umbrella group that represents more than 350,000 creators and artists, and more than 3,000 cultural enterprises. Despite the fact that copyright issues are not at or even near the top of the agenda, there is a strong undercurrent of Canadian nationalism in this election that will inevitably have an influence on policies in the cultural sector.

In 2021 the governing Trudeau Liberals included a promise to “protect Canadian artists, creators and copyright holders by making changes to the Copyright Act including amending the Act to allow resale rights for artists”. They were re-elected but did nothing. The Conservatives for their part undertook to “recognize and correct the adverse economic impact for creators and publishers from the uncompensated use of their works…”. They weren’t elected so the commitment was meaningless. This time proposed changes to copyright legislation are unlikely to move the needle for any party, although the issue of the unauthorized use of copyrighted content to train AI still needs to be resolved, since AI will become a front-burner issue for any party elected. The CDCE’s paper addresses this issue, among others, in its 9 recommendations. Broken down into 4 buckets, the CDCE’s proposals address (1) International Trade and Cultural Sovereignty (2) Broadcasting and CBC/Radio Canada (3) Copyright and (4) Artificial Intelligence and Culture.

The CDCE proposal under “International Trade” is to insist that the cultural exemption clause be retained if the CUSMA/USMCA is renegotiated, and that cultural activities, goods and services be excluded from all future agreements. The cultural exemption clause (Article 32.6 of the CUSMA) is based on a similar exemption in NAFTA and the original US-Canada bilateral trade agreement of 1989, but is more of a political fig-leaf than a real protection since, if the provision is invoked, the US can retaliate with equivalent effect in any trade sector. However, it provided comfort to the cultural sector at a time when free trade with the US was seen to make Canada vulnerable culturally. Thirty-plus years of bilateral, and now trilateral, trade proved that fear to be unfounded—until now—and the cultural exemption has never been used. During the period from 1989 to the present, even through the ups and downs of Trump 1.0, the fundamentals of the initial bilateral Free Trade Agreement, then NAFTA, and now the CUSMA/USMCA were basically respected by all parties. Under Trump 2.0 this has all been called into question. If the Trump Administration is going to disavow the basic elements of the CUSMA, having a cultural exemption clause becomes less than meaningless.

On April 2, the US will unveil its “reciprocal tariff” regime. It has arrogated to itself the right to include, in addition to tariffs imposed by other countries, self-identified non-tariff measures in its calculations. Among these may be various cultural support measures imposed by Canada on foreign entities operating in Canada requiring them to make financial contributions to Canadian content. If that happens, the US will be violating yet again the provisions of the CUSMA/USMCA, as it has already done with regard to the imposition of tariffs on some products on the specious grounds of fentanyl trafficking from Canada to the US (less than 20 kg in all of 2024). However, given the surge in Canadian nationalism as a result of the tariff threats, but more particularly the verbal diarrhea coming daily from President Trump about Canada becoming the 51st state, it is unlikely that any Canadian government would throw Canada’s cultural identity under the bus for the sake of preserving tariff-free access to the US market for some commodities. Thus, seeing Canada sacrifice cultural support measures that may annoy some US businesses operating in Canada (like online streaming content providers) in return for a degree of tariff relief is an unlikely outcome in the present circumstances.

This surge of nationalism relates to the second of the CDCE’s “demands”, protecting the CBC and the Canadian broadcasting environment. Ever since Pierre Poilievre became leader of the opposition Conservative Party, one of the Party’s mantras has been “defund the CBC”. There is no question that the CBC business model is in need of reform, particularly its English language entertainment television service which captures a very small market share, but CBC radio, CBC news broadcasts and CBC’s French language service, Radio-Canada, remain highly relevant, as this CBC explainer attempts to show. Given the need to protect national identity in the face of the Trumpian onslaught, and the recent rediscovery that perhaps Canada is not so “broken” after all, if ever there was a need for this national institution, it is now.

The third basket of issues raised in the CDCE position paper relates to copyright concerns, which get very little traction among the general electorate but are important to the creative and cultural community. Once again, the CDCE reminds parties of the lack of an Artists Resale Right in Canada (noting previous promises to establish this measure), as well as some other longstanding issues like fair remuneration for writers and publishers for the use of their works in the education sector and extending the private copying regime to electronic devices. This would impose a small levy (about $3) paid by manufacturers and embedded in the cost of a smartphone to compensate for unregulated widespread copying of music on these devices, with the funds flowing back to music creators.

The final bucket deals with Artificial Intelligence (AI) and copyrighted content. At the present time there are some 40 lawsuits in the US pitting rightsholders against AI developers, and even a couple of cases in Canada. Canada has been slow off the mark in addressing this issue; at the moment there is no Text and Data Mining exception in Canadian copyright law, and neither rightsholders nor AI developers are clear on the ground rules. The CDCE is asking that a legislative framework be adopted that includes the key principles of (1) Authorization (by the rightsholder) (2) Remuneration (payment for use of copyrighted content) and (3) Transparency (the establishment of disclosure rules as to what training data is used in AI systems and ensuring that all AI-generated content is clearly identified). These are reasonable asks, but there is no guarantee they will be respected.

In the US, AI developers are pushing the Trump Administration to give them a pass on respecting authors’ copyright, notwithstanding the cases before the courts, using the argument that the US will lose the AI race to China if US developers cannot help themselves freely to the content of others. OpenAI (which is being sued by the New York Times) and Google argued in submissions to the US government that giving them unfettered access to data, including content owned by others, is essential for national security. Described by blogger David Newhoff as “tech bro bombast”, OpenAI’s attempt to wrap itself in the national security blanket is a cynical ploy to get around the inconvenient fact that it and other AI developers are hijacking the creative work of authors, artists, and musicians without permission or compensation while creating outputs that in a number of cases can compete with or even displace the original works that contributed to their training. A similar situation is developing in the UK, where the creative community is pushing back against the copyright carte blanche that the UK government seemed inclined to give to the tech community in the name of AI competitiveness. Canadian governments are not immune to the siren calls of the AI community, and it is timely to establish some guiding principles, of which Authorization, Remuneration and Transparency are a good place to start.

However, while AI and copyright are not going to become election issues, national identity, which is closely intertwined with cultural sovereignty, surely is. Indirectly, copyright will be important as it is one of the foundation stones of cultural sovereignty, an issue that would have played second fiddle to economic issues like food inflation, carbon pricing, cost of housing, fuel and utility costs, etc., until Donald Trump started spouting his annexationist nonsense.

Frankly, had Trump really wanted to absorb Canada (eventually) he should have brought Canada inside the US economic tent and made the country even more reliant on the US market, by providing it with an exception to his attempts to take on the world trading system. Instead, he has woken Canadians from a restful, dependent slumber brought on by three decades of relatively uncontroversial free trade and economic integration and made them realize that they have no one to depend on but themselves. In doing so, he has revitalized a sense of nationalism that will play out in this election. Who can best defend Canadian interests has become the litmus test for Canadian voters, leading to a remarkable resurgence for the Liberal Party under new leader Mark Carney after the political corpse of Justin Trudeau was removed from the electoral scene. This may or may not change during the course of this short campaign. One thing is certain: while copyright issues per se will not get much profile, cultural identity issues will certainly be in the spotlight. This is a shift in emphasis that in the long run is likely to benefit the creative sector.

© Hugh Stephens, 2025. All Rights Reserved.

Using Copyrighted Content to Train AI: Can Licensing Bridge the Gap?

Image: Shutterstock

The struggle between authors (writers, artists, musicians) and AI developers over the unauthorized and uncompensated use of copyrighted works to train AI applications continues, both in the courts (here is a summary of the current state of play in the US, where most of the litigation is taking place) and in the political arena, such as the UK government’s latest initiative to put its thumb on the scale in favour of the AI industry, now slowed down by opposition within Parliament. The creative industries in Britain are still nervous, however, as demonstrated by the coordinated “Make it Fair” campaign organized by leading UK newspapers on February 25. While the courts may provide some guidance, it is unlikely to be dispositive and is almost certain to be somewhat contradictory and lengthy, given the appeal process that will play out. With new applications being rolled out every day, AI appears to be unstoppable. Let’s accept that this is the case. If so, what then will be the rules governing the use of AI training content, particularly content that is protected by copyright, such as books, journalistic output, paintings, musical compositions etc.?

It is already apparent that at least some of the AI output trained on these materials will compete in the marketplace with the original works. If that is the case, then surely some of that additional value should be shared with those who helped create the content initially. The way this will most likely be done is through licensing, in the form of payment and permission for use of the copyrighted creative output that enabled the training to take place. Licensing would also help resolve another potential issue, the possibility that the final product produced by the AI algorithm infringes the copyright of the works on which it was trained. This is unlikely to happen in the case of written works but is certainly possible with graphic or musical works.

While many have called for licensing as a solution, significant challenges must be overcome to make it work effectively. Yet some licensing is already taking place between AI developers and owners of well-delineated data sets. As an example, various newspaper and magazine publishers have already reached licensing agreements with AI providers. OpenAI has signed licensing deals with the Wall Street Journal, Times of London, the Financial Times, Time, Le Monde, Axel Springer and others. This is in marked contrast to OpenAI’s relationship with the New York Times, which has led to one of the most prominent lawsuits in the field, with the Times suing OpenAI for copyright infringement. The reason for this lawsuit, of course, is that licensing negotiations between the two entities broke down. Some photo and image licensing companies have concluded AI deals (Shutterstock is the most prominent example) while others, such as Getty Images, have not. (Getty is suing StabilityAI in the UK). Eventually most of the institutional or corporate holders of valuable content in one form or another will likely reach, or attempt to reach, licensing deals with the major AI developers. But that still leaves out an awful lot of copyright-protected content.

The conundrum is how to deal with the millions of individual creators who produce content in different formats, and tie them into a workable licensing regime. The first challenge is how to even figure out who is producing content that is likely to be used by AI developers. The second is to calculate how much that use is worth. Then there is the challenge of how to administer a collective licensing scheme in a way that is both practical and affordable and where the small amount of royalties for individual works are not swamped by the administrative costs of collection and disbursement. Finally, there is the question of how to resolve the issue of competing licensing organizations in order to provide more or less one-stop-shopping for the AI industry.

It is worth noting that one-stop-shopping currently does not exist in any area of collective licensing. Different collectives represent creators in different fields so music, publishing, art, broadcasting and visual arts licensing are all represented by different organizations, in some cases with more than one collective in a given field. The Copyright Board of Canada lists 36 copyright collectives on its website. I haven’t seen a definitive list for the US but this university website lists about the same number.

Whereas users of music only have to deal with a handful of CMOs (collective management organizations), and users of text-based content (online or offline) need only acquire a reprographic license from the major licensing collectives for published works, such as the Copyright Clearance Center in the US or Access Copyright in Canada, AI developers access the full gamut of content. It will be challenging to make access easy for the AI development industry, a point developed by Dr. Pamela Samuelson of the University of California, Berkeley, a well-known copyright scholar (and skeptic, let it be added). In her recent paper in the UCLA Law Review (“Fair Use Defenses in Disruptive Technology Cases”), Samuelson focuses primarily on the question of fair use—as suggested by the title—but also examines the issue of a collective licensing regime for generative AI development. She manages to raise just about every objection conceivable (see pp. 80-86 of the document for more details):

-generative AI uses all forms of content, therefore the licence would have to be very broad
-an issue would arise as to whether content for training was used just once, or on repeat occasions
-it would be very difficult and costly to administer given that there could be literally billions of creators involved
-creators would get very little revenue; the bulk would go to the administering agencies, the CMOs
-it would be difficult to determine value and to set a price on each transaction
-what about orphan works?
-differing national regimes might create confusion; alternatively, some countries might not require a licence payment, giving them an unfair advantage
-it would be unfair to startups since the incumbents have already scooped up volumes of content without payment.

She notes that creators may lose out, but since AI will affect the livelihoods of so many others, this is not exclusively a copyright problem. Tough luck creators.

Clearly Dr. Samuelson is not in favour of a collective licensing regime for content appropriated by AI developers, yet despite her firehose of cold water, there are a number of promising developments in this area. For example, the Copyright Clearance Center (CCC) in the US recently announced it would provide AI re-use rights within its Annual Copyright Licenses, making the CCC’s licence “the first-ever collective licensing solution for the internal use of copyrighted materials in AI systems.” Note the caveat. While covering re-use of content for AI applications, the CCC makes it clear that:

“The license enables participating rightsholders to fulfill the needs of companies that require an efficient way to legally acquire the rights to use copyrighted materials within AI systems for internal use.”

Not training. The Copyright Agency in Australia has done something very similar.

“Starting from February 2025, Copyright Agency will extend its Annual Business Licence to cover staff of licensed businesses who include third party material in prompts for AI tools (and) copy and share outputs from AI tools with colleagues”.

However, it does not apply to AI training and does not allow capture of the content outside the business, such as by an externally provided AI tool.

Likewise, the Copyright Licensing Agency (CLA) in the UK issues a Text and Data Mining (TDM) Licence. The CLA’s website explains that TDM “is the process of transforming unstructured content into a structured format to analyse, extract and identify meaningful information and insights. By using TDM, organisations can harness the power of vast volumes of information and data, capturing and revealing key concepts, trends, and hidden relationships.” Sounds quite a bit like training generative AI, but it’s not.

CLA’s TDM licence extension includes rights covering use of published content for TDM purposes. This does not cover the use of content in training or prompting Generative AI models.

Canada’s equivalent CMO, Access Copyright, is actively examining the issue, as it notes in its new strategic plan for 2025-2028:

“Like collective rights management organizations around the world, we will actively explore how we might enhance our corporate licence offerings to include uses related to AI, providing Canadian rights holders who wish to participate in the emerging market for AI licensing to do so, either in Canada or by virtue of reciprocal agreements with sister organizations.”

It is clear that these Reproduction Rights Organizations (CMOs by another name) are cautiously feeling their way forward to find the appropriate role for collective licensing. Meanwhile the private sector has not been sitting idly by. Forbes reports that so many content aggregation startups have been established that they have formed a Data Providers Alliance. Recently launched “Created by Humans” is another commercial entrant that is pitching itself to authors.

“Take control of your work’s AI Rights and get compensated for its use by AI companies.”

As these new enterprises enter the market, it threatens to become quite crowded. Just as there are more and more AI companies, including new entrants like DeepSeek, a proliferation of new sector-specific content-aggregators will make licensing more challenging. If the CMOs wait too long, they will face entrenched competition. Not all these new aggregators will survive. In the end, AI developers will not subscribe to multiple content licensors; they will go with the ones that provide the broadest coverage. It will be a Darwinian selection process.

While this is happening, other countries are experimenting with the concept of extended collective licensing for AI content. This allows CMOs to grant licenses on behalf of their members and non-members alike. An extended collective licence is not a compulsory licence, but it could lead to such a system being established. Spain was first out of the gate but has since pulled back after the proposed Royal Decree attracted criticism from many rights holders that it would circumscribe their options. Yet it is one of many solutions being tested.

The recently released US Copyright Office report “Identifying the Economic Implications of Artificial Intelligence for Copyright Policy” includes an extensive discussion of licensing possibilities, including examining the pros and cons of a new statutory blanket licence. This would need to include a provision excluding rightsholders (such as entities that have already reached licensing agreements with AI developers) who have the ability and wish to issue voluntary licences that generate greater remuneration than a statutory payout would earn. This raises thorny opt-in/opt-out issues. Compromises will be required, but the challenges are not insurmountable.

The trick is to devise a system that will capture as much content as possible while allowing some flexibility to rightsholders, allocating payments in a way that is fair and efficient (the USCO paper suggests that revenues associated with a work could serve as a rough proxy for its relative value), while minimizing administrative costs so that expenses do not exceed potential revenues for rightsholders holding limited content inventory. Can it be done?
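To make the revenue-as-proxy idea concrete, a pro-rata allocation could be sketched as follows. This is purely a toy illustration under stated assumptions: the function name, the 15% administrative fee, and the dollar figures are all hypothetical, not drawn from the USCO report.

```python
def allocate_royalties(pool, work_revenues, admin_rate=0.15):
    """Split an AI licensing pool pro rata, using each work's market
    revenue as a rough proxy for its relative value, after deducting
    a hypothetical flat administrative fee."""
    net = pool * (1 - admin_rate)  # amount left for rightsholders
    total = sum(work_revenues.values())
    # each work receives a share proportional to its revenue
    return {work: net * rev / total for work, rev in work_revenues.items()}

# Hypothetical $1,000 pool split across three works
payouts = allocate_royalties(1000.0, {"novel": 600.0, "song": 300.0, "photo": 100.0})
```

Even this toy version makes the tension in the text visible: the lowest-revenue work receives a payout small enough that per-transaction administrative costs could easily swallow it.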

Despite the many obstacles identified by Dr. Samuelson and others, I am convinced that in the end collective licensing for content used in AI development and applications will become as accepted as the collective licensing regimes for use of various forms of copyrighted content today. The way forward won’t be straightforward; there will be zigs and zags. The courts and legislatures will play a role, as will authors, publishers, and the AI developers themselves. But in the end we will get there. Licensing, including some form of collective licensing, is the inevitable bridge that will bring AI developers and copyright holders together.

© Hugh Stephens, 2025. All Rights Reserved.

The Height of Hypocrisy! OpenAI Accuses DeepSeek of Stealing its Content


Image: Shutterstock (edited)

Am I the only one, or did anyone else have just a touch of schadenfreude when they read the story in the New York Times that OpenAI is claiming the Chinese start-up DeepSeek may have “improperly harvested” its data? What irony! DeepSeek caught everyone’s attention earlier this week when it announced a new AI application that appears to outperform or at least match OpenAI’s ChatGPT. Not only that, it is also open source and completely free to download and use. More important, its alleged development costs were but a fraction of the development cost of US models, reported to be in the hundreds of millions, whereas DeepSeek claims that it produced its results with an investment of as little as $6 million. (This clearly does not include the value of earlier R&D, but the question is whether or not DeepSeek covered these costs).

We saw the shock this caused on the NASDAQ, especially with respect to chip-designer Nvidia’s share price, with over $600 billion wiped off its valuation in one day. As often happens, there was a rebound the following day as saner heads digested the news and found a silver lining in the fact that AI development costs could be greatly reduced yet spending would continue. Of course, the spectre of “unfair” Chinese competition was raised, while others wondered how DeepSeek did it in the face of US high-tech embargoes on the sale of advanced Nvidia chips to China. “They must have cheated” was the mantra.

It appears that part of DeepSeek’s success is based on what is called “distillation” in the AI industry. As explained in this tech article, distillation is a technique that “focuses on creating efficient models by transferring knowledge from large, complex models to smaller, deployable ones”. The earlier models do the heavy lifting with respect to research, and as they produce results, those results are incorporated into newer training models that take advantage of the earlier work. To my untrained mind, this sounds like building on knowledge created by others, as happens all the time, or, to look at it negatively, like free riding on the investment of others. The question is, what knowledge is protectable and proprietary? This dichotomy is at the heart of the debate over copyright. You can’t copyright an idea, but the specific expression of an idea is protectable. Likewise, the functionality of software code cannot be copyrighted although a specific software program is considered a “literary work” and is protected.
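For the technically curious, the “knowledge transfer” at the heart of distillation can be sketched in a few lines of Python. This is a toy illustration of the core idea only (a student model trained to match a teacher’s temperature-softened output distribution), not DeepSeek’s or OpenAI’s actual method; all names and numbers are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature softens the
    distribution, exposing more of the teacher's 'dark knowledge'."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between teacher and student soft distributions.
    The student is trained to drive this toward zero, absorbing the
    teacher's learned behaviour without seeing the original data."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

In a real system the student’s parameters are adjusted by gradient descent to minimize this loss over millions of teacher outputs; the sketch shows only the quantity being minimized, which is why querying a deployed model like ChatGPT can, in principle, serve as a training signal.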

There is also the issue of open source. Release of code as open source enables further advancements, pushing the boundaries of knowledge. This is a common feature of the digital revolution and one reason for rapid advancements in Silicon Valley. However, not all content is fully open source. In the case of OpenAI it would seem it considers its content to be proprietary to the extent that it can control the use to which it is put. The accusation is that DeepSeek took and distilled OpenAI’s results to create a competing application without permission. In effect, DeepSeek used ChatGPT to improve its own model.

OpenAI’s position that it can dictate the uses to which ChatGPT can be put is, in my view, contradictory, hypocritical and in the end morally if not legally indefensible. OpenAI has no problem enabling and encouraging people to use ChatGPT to “improve on” or create works in any field, from AI written novels to AI created art or music, resulting in works that directly compete with authors, artists and musicians. Remember that OpenAI has used their original copyrighted works without permission to build the AI machine that now threatens their livelihood and ability to create. Yet when that same AI application, ChatGPT, is used to improve on or create a new and better AI platform, this is declared to be infringement.

While distillation is common across the AI field, OpenAI claims its terms of service prohibit any use of data generated by its systems to build technologies that compete in the same market. This caveat would be similar to that applied to copyrighted content made publicly available on websites, with a disclaimer that it is copyright protected and that potential users should contact the rightsholder. Did that stop OpenAI from helping itself without permission to this protected content to train its AI algorithm? Absolutely not. In fact, while it justified its activities by saying that all it was doing was taking “publicly available” content, not even paywalls and terms of service were allowed to get in its way. This was clearly demonstrated in the case brought against it by the New York Times (When Giants Wrestle, the Earth Moves (NYT v OpenAI/Microsoft)).

It seems that from OpenAI’s perspective, use of other people’s content without permission is okay, but when it’s their content, not so much. OpenAI is partially owned by Microsoft, which is itself engaged in rolling out its own AI application, Copilot, trained in part through the unwitting contribution of hundreds of millions of users of Microsoft software, like MS-Word, as I wrote about last month (Writers! Do You Know your Drafts on MS Word are being Scooped by Microsoft to Build its AI Algorithm? But You Can Stop This From Happening (Read On)).

Given all that has transpired, and the struggle that authors and rightsholders are facing to protect and get paid for the use of their works in AI training, it is hard to have much, if any, sympathy for OpenAI. I certainly don’t. Poetic justice.

© Hugh Stephens, 2025. All Rights Reserved.

Writers! Do You Know your Drafts on MS Word are being Scooped by Microsoft to Build its AI Algorithm? But You Can Stop This From Happening (Read On).

Image: Shutterstock

Although I post my blog content on WordPress, I usually use MS Word to draft my content initially. I am used to it, and it is easy to use. Little did I know that, according to the blogsite and forum nixCraft, Microsoft recently (September Privacy update) switched on a feature that allows it to ingest everything you write in Word to help develop its AI algorithm, called Copilot. The setting is turned on by default in the Privacy settings and must be unchecked manually. Did Microsoft tell you this? Well, kinda, sorta. Microsoft says, “we don’t use your customer data to train Copilot or its AI features unless you provide consent to do so”. Did you provide consent? You no doubt did, unknowingly, when Microsoft updated its Terms of Use, which it does on a regular basis. If you continued to use Office 365, you granted consent.

In the last few days, a pop-up has appeared when I am doing something in Word through Office 365.

“Thank you for using Office! We’ve made some updates to the privacy settings to give you more control”.

If you believe that, I have a bridge to sell you.

If you go to Privacy Settings there is a summary blurb on how the Terms of Use were updated on September 30. If you look hard enough you will find this reference:

“We added a section on AI services to set out certain restrictions, use of Your Content and requirements associated with the use of the AI services.”

You really should read the full Terms but as Microsoft notes, this will take an hour of your time (ESTIMATED READING TIME: 55 Minutes; 14268 words).

Having waded through it, this, I believe, is the relevant wording:

b. To the extent necessary to provide the Services to you and others, to protect you and the Services, and to improve Microsoft products and services, you grant to Microsoft a worldwide and royalty-free intellectual property license to use Your Content, for example, to make copies of, retain, transmit, reformat, display, and distribute via communication tools Your Content on the Services.

There is nothing here about opting out. You have to go to Privacy settings and do some digging to get to that. By masking these changes to make it appear that your privacy has been strengthened (whereas in fact this is just a content grab), Microsoft has turned things on their head by putting the onus on you, the user, to exercise your privacy rights. If you don’t want your creative work used to help train its AI algorithm, which in the end might compete directly or indirectly with your work, you need to opt out (unless you want to stop using MS Word altogether). Microsoft, however, is not suggesting this as a preferred option, or even letting it be widely known that it exists as an option. In fact, when you go into Settings to opt out you are presented with this little gem; “The Trust Center contains security and privacy settings. These settings help keep your computer safe. We recommend that you do not change these settings”.

But that is exactly what you must do if you want to keep your creative content out of the hands of Microsoft’s AI developers. Here is how to do it, based on instructions from nixCraft.

On a Windows computer, when in a Word file, go to File in the top left-hand corner. There is a drop-down menu. You want to go to Options. On my computer the Options choice does not show up unless you hit the arrow at the bottom of the page for “More”. When you get to Options, go to Trust Center (left side menu), then Trust Center Settings. Next up is Privacy Options, which leads you to Privacy Settings. There is a drop-down menu, including Connected Experiences. There is a heading labelled “Experiences that analyze your content”. This box is checked for you. You want to uncheck it. To save the setting you will have to log out of Word and then log back in. (Update: I have just discovered a quicker way to do this. Go File-Options-General (top of list)-Privacy Settings-Connected Experiences-Experiences that analyze your content-Uncheck).

Eliminating this option will come at a price, according to the “Learn More” buttons provided by MS, but it is your choice. For my part, I will forgo the bells and whistles for privacy.

The opt out process is not simple and not intuitive, but worth doing, even if only as a matter of principle. Office365, unlike Google Search or Bing, is not free. We pay to use it through an annual subscription. Even the tired old argument that you are providing your data as a sort of payment for “free” use of a platform’s service does not apply in this case. Microsoft needs more data to feed its AI machine and yours will do just fine, thank you very much. Don’t let them get away with it.

© Hugh Stephens, 2025. All Rights Reserved.

Looking Back at 2024: It’s All About AI and Copyright (And a Few Other Things)

Image: Shutterstock

A retrospective on the year now coming to a close is what one expects at this time of year, so I will try not to disappoint. However, when I look back at the copyright developments I wrote about in 2024, the dominant issues that jump out are AI, AI and AI. You can’t read or think about copyright without Artificial Intelligence, or to be more precise, Generative Artificial Intelligence (GAI), occupying most of the space despite many other issues on the copyright agenda. The mantra of “AI, AI and AI”, as in “Location, Location and Location”, is apt because there are at least three important copyright dimensions related to AI: training of AI models; copyright protection for outputs generated by AI; and infringement of copyright by works created with or by AI. Of the three, the use of copyrighted content for AI training is the most salient.

Last year in my year-ender, I also discussed AI and the numerous lawsuits that were emerging as rightsholders pushed back on having their content vacuumed up by AI developers to train their algorithms. Those lawsuits have only multiplied. At last count, there are more than 30 cases in the US, ranging from big media vs big AI (New York Times v OpenAI/Microsoft) to class action suits brought by artists and authors, as well as litigation in the UK, EU, and now in Canada (see here and here). That is just on the input side.

In terms of output, i.e. whether works produced by an AI can be copyrighted, there are a couple of interesting cases in the US where applications for copyright registration have been refused by the US Copyright Office (USCO) because of a lack of human creativity. A couple of months ago, I discussed two such high-profile cases, one brought by Stephen Thaler and the other by Jason Allen. To date the USCO is not budging, although it is undertaking an extensive study of the issue. Part 1 of its study, on digital replicas, was published in July of this year. The next section, on copyrightability, is expected to be published in January, with the issues of ingestion for training and licensing to follow in Q1 2025.

While the USCO has to date denied applications for copyright registration of AI-generated works, the Canadian copyright office (CIPO-Canadian Intellectual Property Office) has been caught up in a problem of its own making. This is because Canadian copyright registration is granted automatically, so long as tombstone data and the prescribed fee are provided. The work for which registration is sought is not examined. As a result, copyright certificates have been issued for works created by AI, notwithstanding the general presumption that copyright protection is accorded only to human-created works (although this is not explicitly stated in the Act). In July a legal challenge was launched against copyright registrant Ankit Sahni, who successfully registered a work with CIPO claiming an AI as co-author. The case was brought by the Canadian Internet Policy and Public Interest Clinic (CIPPIC) at the University of Ottawa, as I wrote about here. (Canadian Copyright Registration and AI-Created Works: It’s Time to Close the Loophole).

While the courts in the US, UK, Canada and elsewhere are grappling with various issues related to AI and copyright, governments are studying the issue.

In Australia, the Select Committee on Adopting Artificial Intelligence issued its final report in November. While the report was wide-ranging, three of its recommendations related to copyright:

engagement with the creative industry to address unauthorized use of their works by AI developers and tech companies,

transparency in training data, by requiring AI developers to disclose the use of copyrighted works in training datasets and ensure proper licensing and payment for these works, and

remuneration for AI outputs, with an appropriate mechanism to be determined through further consultation.

These are important principles, but how they will be implemented in practice remains to be determined.

In Canada, a consultation on AI and copyright was launched late in 2023 with submissions to be received by January 15, 2024. The Canadian cultural community put forth three key demands:

No weakening of copyright protection for works currently protected (i.e. no exception for text and data mining to use copyrighted works without authorization to train AI systems)

Copyright must continue to protect only works created by humans (AI generated works should not qualify)

AI developers should be required to be transparent and disclose what works have been ingested as part of the training process (transparency and disclosure).

Submissions to the consultation were published in mid-year but since then there has been no apparent action. Given the current political crisis facing the Trudeau government, none is expected in the near term although the issue will inevitably have to be addressed after the general election in 2025.

While the EU has already established some parameters dealing with use of copyrighted materials for AI training, the new UK Labour government is taking another run at the issue after various proposals in Britain to find a modus vivendi between the AI and content industries went nowhere under the Tories. The current UK discussion paper on Copyright and Artificial Intelligence, which seems excessively tilted in favour of the AI industry, has aroused plenty of controversy. While it says some of the right things, such as proclaiming that one of the objectives of the consultation is to “support…right holders’ control of their content and ability to be remunerated for its use”, the thrust of the paper is to find ways to encourage the AI industry to undertake more research in the UK by establishing a more permissive regime with respect to use of copyrighted content. It is based on three self-declared principles (notice how these things always seem to come in threes?):

Control: Right holders should have control over, and be able to license and seek remuneration for, the use of their content by AI models

Access: AI developers should be able to access and use large volumes of online content to train their models easily, lawfully and without infringing copyright, and

Transparency: The copyright framework should be clear and make sense to its users, with greater transparency about works used to train AI models, and their outputs.

These three objectives then lead to what is clearly the preferred solution:

“A data mining exception which allows right holders to reserve their rights, underpinned by supporting measures on transparency”

Fine in principle, but the devil is always in the detail, and the details in this case revolve around transparency (how detailed, what form, what about content already taken?) and, in particular, reservation of rights, aka “opting out”. This is easy to proclaim in principle but difficult to do in practice. British creators are up in arms, led by artists such as Paul McCartney, and supported by the creative industries in the US. The British composer Ed Newton-Rex has penned a brilliant satire explaining how AI development in the UK will work if the current proposal is enacted. The problem with an opt-out solution is essentially twofold: it doesn’t deal with content already absorbed by AI developers, and it would be cumbersome if not impossible for many rightsholders to use.

Other governments have addressed the issue in different ways. Singapore has taken a very loose approach toward copyright protection, putting its thumb firmly on the scale in favour of AI developers. It is currently considering additional proposals that would strip even more protection from rightsholders, who are pushing back strongly. Japan had been widely and incorrectly reported to be on the same path, resulting in a welcome clarification this year from the Agency for Cultural Affairs regarding the limits of Japan’s text and data mining (TDM) exception.

While AI dominated the copyright agenda in 2024, there were other issues relating to copyright and copyright industries that I wrote about. The ongoing question of payment for news content by large digital platforms continued to play out in different ways. In Canada, the struggle between the government and US tech giants Google and META was finally “resolved” (after a fashion) at the end of last year. Google agreed to “voluntarily” pay $100 million annually into a fund for Canadian journalism in return for being exempted from the Online News Act (ONA) while META called the government’s bluff by blocking Canadian news providers from its platform thus, in theory, avoiding being subject to the ONA. However, META has a very subjective interpretation as to what is Canadian news content, allowing some news providers to post to it, while many users have found workarounds, as documented by McGill’s Media Ecosystem Observatory. While the CRTC investigated, the issue is still unresolved.

Meanwhile in Australia, it seems that META intends to go down the same road of blocking news, announcing it will not renew the content deals it initially signed with Australian media in response to Australia’s News Media Bargaining Code, the model upon which Canada’s legislation was based. Unlike in Canada, the Australian government is planning a robust response. (More on this in a future blog post). Finally, on the same topic, California (which was threatening to introduce its own version of legislation to require digital platforms to compensate news content providers) emerged with an outcome very similar to that reached in Canada, with Google offering up some funding (although proportionally less than in Canada) while META appears to have walked away.

Controlled Digital Lending (CDL) was another copyright issue finally settled in 2024 (in the US). The Internet Archive, having lost a lawsuit brought against it by a consortium of publishers who argued that the digital copying of their works constituted copyright infringement (notwithstanding the Archive’s theory that it was simply lending a digital version of a legally obtained physical work held by it, or by someone else associated with it), also lost its appeal. In December, the deadline for further appeals expired, effectively ending this saga. Whether Canadian university libraries, some of which are avid devotees of CDL, will take note remains to be seen.

The issue of circumventing a TPM (“Technological Protection Measure”), commonly referred to as a “digital lock” and often represented by a password allowing access to content behind a paywall, was also front and centre this year in Canada. In the case of Blacklock’s Reporter v Attorney General for Canada, the Federal Court found that an employee of Parks Canada who shared a single subscription to Blacklock’s with a number of other employees, by providing them with the password, did not infringe Blacklock’s copyright, since the employee did not circumvent the TPM within the meaning of the law and the purpose of the sharing was “research”, which is a specified fair dealing purpose. Blacklock’s is a digital research service that sells access to its content and protects that content with a paywall, as is common for many online content providers, like magazines and newspapers.

Despite the hoo-ha of anti-copyright commentators asserting the Court had found that “digital lock rules do not trump fair dealing”, it was equally clear the Court had ruled that fair dealing does not trump digital locks (TPMs). The Court did not undermine the protection afforded to businesses to protect their content through use of TPMs. Rather, it determined that sharing a licitly obtained password did not constitute circumvention as outlined in the Act, as I explained here. (Fair Dealing, Passwords and Technological Protection Measures (TPMs) in Canada: Federal Court Confirms Fair Dealing Does Not Trump TPMs (Digital Lock Rules)). Although the Court did not legitimize circumvention of a TPM for fair dealing purposes, contrary to claims stating the opposite, its acceptance of password sharing is an outcome that legal experts have disagreed with (as do I, for what it is worth). The law is very clear that fair dealing cannot be used as a pretext or a defence against violation of the anti-circumvention provisions of the Copyright Act. The decision is now under appeal by Blacklock’s.

Finally, the last copyright point of note for 2024 is that this year marked the bicentenary of the introduction of the first copyright legislation in Canada, in the Assembly of Lower Canada, in 1824. It also marked the centenary of the entry into force of the first truly Canadian Copyright Act on January 1, 1924. This two hundred years of domestic copyright history is worth celebrating. The first legislation was introduced “for the Encouragement of Learning” so that more local school texts would be written and printed. Given the current standoff between the secondary and post-secondary educational establishment and Canadian authors and their copyright collective over licence payments for use of copyrighted works in teaching, one wonders whether we have really learned anything about the role copyright plays in our society. (Copyright and Education in Canada: Have We Learned Nothing in the Past Two Centuries? (From the “Encouragement of Learning” to the “Great Education Free Ride”)).

Leaving that question with you to ponder, gentle Reader, is probably a good way to end this look back over the past 12 months. Stay tuned for more commentary on copyright developments in 2025.

© Hugh Stephens, 2024. All Rights Reserved.

CanLII v CasewayAI: Defendant Trots Out AI Industry’s Misinformation and Scare Tactics (But Don’t Panic, Canada)

Image: Pixabay

Last month I highlighted the first AI/Copyright case in Canada to reach the courts, CanLII v CasewayAI. CanLII (the Canadian Legal Information Institute), a non-profit established in 2001 by the Federation of Law Societies of Canada, sued Caseway AI, a self-described AI-driven legal research service, for copyright infringement and for violating CanLII’s Terms of Use through a massive downloading of 3.5 million files which Caseway allegedly used to populate its AI-based services. Now the principal of CasewayAI, Alistair Vigier, has responded publicly through an article (Don’t Scare AI Companies Away, Canada – They’re Building the Future) published in Techcouver, trotting out many of the tired and specious arguments put forward by the AI industry to justify the unauthorized “taking” of copyrighted content to use in or to train generative AI models. Let’s have a closer look at these arguments.

Vigier opens by referencing another AI/Copyright case in Canada, in which a consortium of Canadian media companies is suing OpenAI for copyright infringement. He claims this is all based on a misunderstanding of how AI training works, stating that “AI systems like OpenAI rely on publicly available data to learn and improve. This does not equate to stealing content.” Whether data is “publicly available” or not is irrelevant when it comes to determining whether copyright infringement (aka stealing content) has occurred. Books in libraries are publicly available, as is a book that you purchase in a bookstore, or content on the internet that is not behind a paywall. (It is worth noting that the Canadian media companies also claim that OpenAI circumvented their paywalls to access their content when copying it). But in none of these cases is copying permitted unless the copying falls within a fair dealing exception, which is very precise in its definition. Labelling copied material as “publicly available” is a red herring.

Vigier’s next argument is to equate the ingestion of content by various AI development models with a human being reading a book. We know that humans enhance their knowledge through reading and are thus able, presumably, to reason better based on the content they have absorbed. Vigier says, “This is how AI works. The AI ‘reads’ as much as it can, gets really ‘smart,’ and then explains what it knows when you ask it a question. Like a human learns from reading the news, so does an AI.”

Really? A human does not make a copy, not even a temporary copy, of the content, although some elements of the content are no doubt retained in the human brain. But AI operates differently: it makes a copy of the content. This should be beyond dispute, although the AI industry continues to muddy the waters by claiming that when content is “ingested” it is converted to numeric data and is thus not actually copied. This is a fallacious argument. Just because the form changes does not mean there is no reproduction. When you make a digital copy of a book, there is still reproduction even though the digital form is different from the original hard copy version. When a work is converted to data, the content is still represented in the dataset.
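The point that a change of form is not an escape from reproduction can be illustrated with a toy sketch (a deliberately simplified illustration, not a description of how any actual AI system or tokenizer is built): even after text is converted into a sequence of numeric IDs, the original work remains fully recoverable from the numbers plus the mapping used to produce them.

```python
# Toy illustration (hypothetical, simplified): converting text to numeric
# data does not erase the work. A minimal word-level "tokenizer" maps each
# distinct word to an integer ID; because the mapping is reversible, the
# numeric form still contains the original text in its entirety.

text = "It was the best of times, it was the worst of times"

# Build a vocabulary assigning each distinct word an integer ID
vocab = {}
for word in text.split():
    if word not in vocab:
        vocab[word] = len(vocab)

# "Ingest" the text as a sequence of numbers
token_ids = [vocab[word] for word in text.split()]
print(token_ids)  # the numbers alone look nothing like the sentence

# ...but with the vocabulary, the work is reconstructed exactly
id_to_word = {i: w for w, i in vocab.items()}
reconstructed = " ".join(id_to_word[i] for i in token_ids)
assert reconstructed == text
print(reconstructed)
```

Real LLM tokenizers are far more sophisticated than this word-level toy, but the underlying observation is the same one the Gervais paper makes: the numeric representation is a form of the work, not something other than the work.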

Vigier dubiously states, with regard to OpenAI, “OpenAI’s models do not reproduce articles verbatim; they process vast datasets to identify patterns, enabling insights and efficiency.” Apart from the fact that the New York Times in its separate lawsuit in the US has been able to demonstrate that, by typing in the leads of articles, it can prompt OpenAI to reproduce verbatim the rest of the article (OpenAI claimed that the Times “tricked” the algorithm), copying is copying even if the result of the copying is somewhat different from the original. The Copyright Act is crystal clear on this point. Section 3(1) of the Act states that, “For the purposes of this Act, copyright, in relation to a work, means the sole right to produce or reproduce the work or any substantial part thereof in any material form whatever….” If copyright protected content is reproduced in its entirety without permission for a commercial purpose (e.g. for AI training), that is infringement, unless the use qualifies as fair dealing under Canadian law or fair use in the US.

The issue of whether ingestion of content to train an AI application results in copying (reproduction) has been carefully studied and documented. One of the most thorough examples is a recent SSRN (Social Science Research Network) paper entitled “The Heart of the Matter: Copyright, AI Training, and LLMs”, with noted scholar Daniel Gervais (a Canadian, by the way) of Vanderbilt University as lead author. The article goes into a detailed discussion of how copying of content occurs during AI scraping to build a Large Language Model (LLM), including the stages of tokenization and embedding, leading to reward modelling and reinforcement learning. The section of the article explaining how copying occurs (pp. 1-6) is dense, technical text, but the conclusion is clear: “LLMs make copies of the documents on which they are trained, and this copying takes various forms, and as a result, with appropriate prompting, applications that use the LLMs are able to reproduce original works.” A shorter (and earlier) version explaining how copying occurs in the training of LLMs can be found in this article (“Heart of the Matter: Demystifying Copying in the Training of LLMs”), produced by the Copyright Clearance Center in the US. It is also worth noting that these explanations refer only to ingestion of text. AI models that train on images and music are even more likely to produce exact or close-to-exact reproductions of some of the works on which they have been built and trained.

So much for the misinformation in Vigier’s article. Now to the scare tactics. He says that the recent Canadian media lawsuit against OpenAI sends a negative message to innovators that Canada may not be open to AI development.

If Canada wishes to remain relevant in this (AI) sector, it must balance protecting intellectual property and promoting technological progress.

The fact that there are currently more than 30 lawsuits in the US, including the seminal New York Times v OpenAI case, does not seem to have slowed down the AI companies in the US. In the UK, legislation has been introduced that would, according to British media reports, “ensure that operators of web crawlers (internet bots that copy content to train GAI, generative AI) and GAI firms themselves comply with existing UK copyright law. These amendments would provide creators with crucial transparency regarding how their content is copied and used, ensuring tech firms are held to account in cases of copyright infringement.” There is lots of AI innovation ongoing in Britain.

The Australian Senate Select Committee Report on Adopting AI has recommended, among other findings, that there be mandatory transparency requirements and compensation mechanisms for rightsholders. The EU is already way out in front on this issue. Its new AI Act stipulates that providers of AI generative models will be required to provide a detailed summary of content used for training in a way that allows rightsholders to exercise and enforce their rights under EU law. Even India now has its own version of the US and Canadian media cases against OpenAI. (OpenAI’s defence in part is based on the argument that no copying took place in India because no OpenAI servers are located there!)

If that is what the “competition” is doing, who does Vigier cite as being the jurisdictions most likely to attract innovators away from Canada? Why, it is those AI powerhouses of Switzerland, Dubai—and the Bahamas!

The argument that, if legislators and the courts don’t give AI innovators a free pass to help themselves to copyrighted content for AI training purposes, this will either slow down innovation or chase it elsewhere is a common fearmongering strategy of the AI industry. This is a race-to-the-bottom mentality whereby content industries are thrown under the AI bus. Vigier, now himself a defendant in a lawsuit, argues that instead of resorting to litigation, the Canadian media companies should have sought a licensing solution. But the fact that no licensing agreement was reached with OpenAI is undoubtedly the reason for the lawsuit in the first place. That is certainly the reason behind the NYT v OpenAI lawsuit in the US; licensing negotiations broke down. If someone has taken your content without authorization, and then offers you pennies on the dollar in comparison to what that content is actually worth, then the stage for a lawsuit is set.

In explaining CasewayAI’s position in the litigation brought by CanLII, Vigier says that Caseway approached CanLII with an offer to collaborate but was rebuffed. As a result, Caseway developed extensive web crawling technology that, it says, pulled the needed material from elsewhere. (Where exactly the material was downloaded from is the crux of the matter). Regardless, this makes it sound as if it was CanLII’s fault for refusing to share its content. Surely a rightsholder has the right to determine the terms on which its content is to be shared with others, if at all.

The fact that Caseway went to CanLII in the first place suggests that CanLII had developed the content that Caseway wanted. Caseway claims the material it accessed was on the public record, such as court documents and decisions. CanLII, on the other hand, claims that it had reviewed, indexed, analyzed, curated and otherwise enhanced the content in question, thus adding a wrapping of copyright protection to what otherwise would be public documents. Who is right, and whether the material was scraped from CanLII’s website without authorization, will be determined by the BC Supreme Court.

If the material taken by CasewayAI was not copyright protected, they are in the clear, at least with respect to copyright infringement. That is quite different, however, from arguing that no copying takes place during AI training, or that if rightsholders use the courts to protect their rights, Canada will become a laggard when it comes to AI development. Robust AI development needs to go hand in hand with robust copyright protection for creators, with an appropriate sharing of the spoils of the new wealth generated from the creative work of authors, artists, musicians and other rightsholders. To say, as Vigier does in his concluding paragraph, that:

Canada has a choice to make. Will we embrace AI as the transformative force it is, or will we let fear and litigation stifle innovation? The lawsuits against Caseway and OpenAI message tech companies: you’re not welcome here. If this continues, Canada won’t just lose its AI startups; it will lose the future of job creation.

What sheer self-interested nonsense! This is fearmongering of the worst kind, based on an inaccurate and misinformed understanding of how AI is developed and trained, which moreover impugns the legitimate right of a rightsholder to seek the protection of the law for their creativity and investment in content. Vigier may be correct when he says that licensing of content is a win/win for both parties. I agree with that. But licensing negotiations are about money and conditions of use, and they require willing parties on both sides. When licensing discussions break down, or when one party decides to do an end run around licensing because it has been rebuffed, then the way to gain clarity is through the courts, whose job it is to interpret what the legislation means.

Canada still needs to come to grips with the question of how copyrighted content will interface with AI development. As I noted earlier, both sides in the debate made their cases in the public consultation launched a year ago, but since then there has been no movement in Ottawa. The law could be strengthened to ensure adequate protection of rightsholder interests in an age of AI, resulting in facilitating licensing solutions. In the meantime, misinformation and scare tactics need to be called out for what they are.

Adequate protection for rightsholders does not mean the end of AI innovation or investment in Canada. There is no need for panic. We can walk and chew gum at the same time.

© Hugh Stephens, 2024. All Rights Reserved.

AI-Scraping Copyright Litigation Comes to Canada (CANLII v Caseway AI)

Image: Shutterstock (with AI assist)

It was inevitable. After all the lawsuits in the US (and some in the UK) pitting various copyright holders against AI development companies, alleging that the AI platforms were infringing copyright by reproducing and ingesting copyrighted materials without authorization to train their algorithms to produce outputs based on the ingested content–outputs that in some cases compete directly with the original work–AI scraping litigation has finally come to Canada. As reported by the CBC, CanLII (the Canadian Legal Information Institute), a non-profit established in 2001 by the Federation of Law Societies of Canada “to provide efficient and open online access to judicial decisions and legislative documents”, is suing Caseway AI, a self-described AI-driven legal research service, for copyright infringement and for violating CanLII’s Terms of Use through a massive downloading of 3.5 million files.

In its civil claim brought before the Supreme Court of British Columbia, CanLII alleges that the defendants, doing business as Caseway AI, violated its Terms of Use, which prohibit bulk or systematic downloading of CanLII material, and that in doing so the defendants also engaged in copyright infringement by reproducing, publishing and creating a derivative work based on the copied works for the defendants’ own commercial purposes. There is no question that Caseway is providing legal material for commercial gain. Caseway’s services start at $49.99 a month, or $499.99 a year, and offer an AI driven service that “leverages advanced AI to find relevant case law in less than a minute… Designed with a user-friendly chatbot interface powered by proprietary technology, Caseway (is) a robust tool tailored specifically for the legal profession.” Caseway’s Terms of Service have all sorts of disclaimers, however.

In his defence, Caseway’s Canadian principal (and defendant) Alistair Vigier is reported to have said that “court documents are public record, not owned by any organization, including CanLII. Numerous other websites also make these decisions available.” It is true that court documents and decisions are public documents not subject to copyright protection. However, CanLII claims that its database contains more than just the court’s decisions. It says in its claim that it spends significant time to “review, analyze, curate, aggregate, catalogue, annotate, index and otherwise enhance the data” prior to publication. It is this creative effort that turns public documents into copyright-protected works (or so the argument goes). To use another copyright analogy, you cannot copyright a recipe (a “list of ingredients”), but we all know that cookbooks containing recipes are always copyrighted. This is because of the presentation and illustration of the recipes, the layout, the commentary and other editorial touches. Julia Child’s sole amandine recipe is not just any old recipe for fried sole. Is CanLII’s compilation of “judicial ingredients” protectable? We will have to wait to find out.

CanLII’s case is reminiscent of a similar case in the US, Thomson Reuters v Ross Intelligence. Thomson Reuters operates a subscription-based legal research service called Westlaw. One of Westlaw’s employees allegedly copied Westlaw content to enable Ross Intelligence to build a machine learning platform that competed with Westlaw. Part of Ross’ defence was that the judicial decisions themselves are public domain documents, so there could be no infringement. Westlaw maintained that its case headnotes, summaries that described the cases, were copyrightable material. Ross also brought forward a fair use defence, arguing transformation, i.e. that it had produced something new and different that did not compete directly with Westlaw’s product. Here is a good summary of the case. The court determined that Ross had copied the headnotes, but whether Westlaw’s numbering system and headnotes were copyrightable would need to be determined by a jury. While Ross’ anti-trust case against Westlaw has been dismissed, the copyright case is still pending.

Another case that has been cited as a possible precedent is the famous 2004 CCH Canadian Ltd v Law Society of Upper Canada case, in which the Supreme Court of Canada ruled that copies of CCH materials made by the Law Society library for its members did not infringe CCH’s copyright because the library was exercising the fair dealing research exception on behalf of the individuals requesting the copies. I personally don’t see the relevance of this case (but I am not a lawyer), since the Great Library’s users were copying only relevant parts of certain documents for a specified fair dealing purpose. In the CanLII case, Caseway has apparently inhaled the full collection of documents and is doing so for a commercial purpose, with the resultant product (although not identical to the original) competing with it. Moreover, since there is no text and data mining exception in Canadian law, the “transformation” defences available to US-based AI companies (i.e. transforming the original materials to produce something different) are not applicable in Canada. This will be an interesting one for the lawyers.

What the case demonstrates is a crying need for some legislative guidance on the question of AI scraping of copyrighted materials in Canada. It may be that CanLII’s collection cannot be protected by copyright, which would provide Caseway a defence without settling the fundamental issue of whether it is a violation of the Copyright Act to do what Caseway did, assuming the material they used was protectable by copyright. A consultation exercise was launched by the government of Canada (through the Ministry of Innovation, Science and Economic Development, ISED) last October, closing in January with submissions posted in June. Since then, there has been silence on the part of the government. With Parliament at a standstill, and the current government hanging on to power by its fingernails, don’t expect clarity any time soon.

© Hugh Stephens, 2024. All Rights Reserved.