TDM – Hugh Stephens Blog

AI Training and Copyright: Australia Gets it Right—Now it’s Canada’s Turn

Flags of Australia and Canada displayed side by side, showcasing their national colors and symbols.

Image: Shutterstock

In early June Canada issued its national AI strategy paper, “AI for All”. As I noted in a blog post at the time, while the strategy covered many elements of AI in its 50 pages outlining policy objectives and planned actions, it managed to avoid using the word “copyright” even once. Australia has just come out with its own updated AI policy statement “AI in Australia’s interest”, which builds on its own “National AI Plan”, released last December. But whereas the Carney government in its AI strategy managed to completely avoid putting copyright into the AI equation, Prime Minister Albanese, after discussing the importance of developing AI for Australia, had this to say:

“But let me make this crystal clear: not everything produced in Australia is up for grabs.

Not at all.

Australian writers, musicians, artists and journalists must retain ownership and control of their work.

Our laws will spell that out, plain as day.

An artist’s creative endeavour is their work and their property.

No company should use Australian books, music, art or news to build or train AI without the artist’s control.

That includes the artist’s control of the price and value of their work.

Anything less, is theft.”

Blunt, clear and refreshing. If Australia can protect its cultural community while promoting policies for sensible AI adoption and development, then so can Canada.

Neither Canada nor Australia currently has a Text and Data Mining (TDM) exception in its copyright law. Such an exception would be a legal loophole allowing AI developers to appropriate content without permission for training purposes. In both countries there have been calls from the tech community to introduce a TDM exception, a carte blanche that would allow AI companies to ingest copyrighted content without authorization, payment or even acknowledgement. Australia’s December “National AI Plan” is much more analogous to Canada’s “AI for All” than Albanese’s recent short AI policy statement, in that it outlined a range of detailed policy proposals for AI adoption in Australia. Even so, in that plan the Australian government managed to grasp the copyright nettle unambiguously.

Among the issues highlighted under “AI Risks and Harms” was the following:

“Reviewing application of copyright law in AI contexts: The Attorney-General’s Department is engaging with stakeholders through the Copyright and AI Reference Group to consult on possible updates to Australia’s copyright laws as they relate to AI. The government has provided certainty to Australian creators and media workers by ruling out a text and data mining exception in Australian copyright law” (emphasis added)

Just as the Australian government has sensibly ruled out a TDM option, Canada needs to do the same, as called for by Canadian cultural umbrella groups such as the Coalition for Diversity of Cultural Expression (CDCE).

So far Canada has danced around the issue. Heritage and Identity Minister Marc Miller has said that “the current copyright law does and should protect those that have created material, and people need to be compensated properly”, but he is just one minister among several. Evan Solomon, Minister of Artificial Intelligence and Digital Innovation, and Mélanie Joly, Minister of Industry, both have a big piece of this file. One can expect both to be more sympathetic to tech bros than to cultural mavens. What is needed is a prime ministerial pronouncement clarifying that Canada’s creative community (artists, writers, publishers, musicians, filmmakers, photographers, journalists and more) is not going to be thrown under the bus on the pretence of keeping Canada competitive in the global AI game.

In the wake of Australia’s announcement that a TDM exception was off the table, the tech industry tried a new approach by suggesting the creation of a centralized fund that would be used to compensate rightsholders for the permissionless use of their works in AI training. Specifically, AI company Anthropic reportedly tied a proposed $15 billion USD ($21.6 billion AUD) investment in data centres in Australia to the creation of such a fund, which would allow it to access Australian content without licensing or negotiation with rightsholders. Australia’s creative community quickly mobilized. Their concerns were heard. Along with setting clear guardrails ruling out the unauthorized use of copyrighted creative works, Albanese has created a new Office of AI within the Prime Minister’s Office, recognizing the need for policy coordination given the breadth of AI’s policy impact. This is something that Canada might consider. It has Evan Solomon, Minister of Artificial Intelligence and Digital Innovation, but there seem to be very few cultural community voices within Solomon’s hearing range.

Australia has the same goal as Canada of getting its fair share of the AI pie while managing AI adoption and its impact on society. But there is one big difference. In so doing, the Australian government has made it clear it will pursue its AI goals while simultaneously respecting and protecting its culture and its creators. Canada’s cultural and creative community deserves no less consideration.

Like Wasps at a Picnic: (Distracting from the Canadian Heritage Committee Report on AI and Creative Industries)

Close-up of a wasp drinking from a metallic surface with blurred green background.

Image: Pixabay.com

It was as predictable as wasps at a picnic. Within days of the Canadian Parliament’s Heritage Committee releasing its report on “The Impact of Artificial Intelligence on the Creative Industries”, with its lead recommendation being (my highlights)…

That the Government of Canada protect the property rights and interests of artists through the principles of the Copyright Act, in accordance with the ART principle—authorization, remuneration and transparency:

a) The Government of Canada must take the necessary steps and ensure that the scope of the Copyright Act applies to AI-generated content in order to guarantee copyright protection.

b) The Government of Canada must mandate greater transparency from AI developers regarding copyrighted works used to train their models, including disclosure of training data sources, to enable proper authorization and licensing.

c) The Government of Canada must establish a clear opt-in consent requirement for the use of copyrighted works in the training of artificial intelligence systems, ensuring that creators’ works may not be used for text and data mining or model development without their prior authorization.

…prolific tech and copyright commentator Michael Geist of the University of Ottawa was attacking its conclusions, issuing warnings that unless the tech industry is allowed (without authorization from, or compensation to, rightsholders) to help itself to copyrighted content for the purpose of AI training, we will have “AI without Canada”. In other words, unless the tech industry is allowed to plunder Canadian content in the same way that it has been doing to date in the US (although this is meeting legal challenges and is quickly changing as licensing solutions take hold), there will be less Canadian content in the training data. This, apparently, will leave Canada as an “outlier” compared to peer jurisdictions. The AI developers will turn their backs on Canada and rush off elsewhere. (This is a standard threat deployed by the AI industry to play off one country against another). He cites the EU, Japan, Singapore and Israel, as well as the US, in support of this interpretation. Not mentioned as “peer jurisdictions” are the UK and Australia, but then that would not have served the purpose of his narrative. Australia has recently declared it will not be legislating a Text and Data Mining (TDM) exception to its copyright laws to legalize unauthorized ingestion of copyrighted works for AI training, while the UK has just hit the pause button on a series of ill-thought-out and badly received proposals to allow AI developers to freely use copyrighted content to train their AI algorithms unless rightsholders specifically opt out.

Singapore and Israel are among a small minority of countries that, under US pressure, have adopted US-style fair use laws that potentially allow for a weakening of copyright protection through a hodge-podge of court rulings. While many cite Japan as a jurisdiction that has given carte blanche to tech interests and AI developers, the facts are quite different, as I pointed out in this blog post a couple of years ago. Japan has a strong cultural industry that it wants to nourish and protect, and it has defined its TDM exception very narrowly and carefully. The EU has two provisions in its Copyright Directive related to AI training: Article 3, which permits TDM carried out only for non-commercial scientific research purposes, and Article 4, which permits TDM for any purpose, including commercial, as long as rightsholders have not opted out, subject to strict transparency provisions by AI companies. Both impose constraints on AI developers, although there are differing views on opt-out.

Opting out may sound like a compromise that both rightsholders and the AI industry could support, but Britain’s example demonstrates otherwise. In its now-aborted public consultation, the UK government put forward several options including its “preferred” option of opt-out. Fully 97 percent of respondents, from both the tech and creative communities, trashed this option. For creators, opting out stands copyright on its head. Copyright is a property right, so why should holders of that right be required to notify someone who wants to infringe on it that they may not do so? It’s like passing a law allowing anyone to picnic on my front lawn unless I post a “No Trespassing” sign. Opting out is also technically difficult to do, especially for individuals and small-scale rightsholders. The robots.txt protocol is not binding and is in many cases not very effective. The tech industry doesn’t like opt-out because it imposes constraints on their untrammelled ability to access anyone’s copyright-protected content, anywhere, anytime. Instead the Committee recommends “a clear opt-in consent requirement” for the use of copyrighted works in the training of artificial intelligence systems.

Now it’s my turn to quibble. IMHO, there should be no explicit need for a rightsholder to “opt in”. I think that Canada’s copyright laws, properly interpreted, already provide sufficient protection to prevent unauthorized use. A rightsholder can “opt in” to AI training, or to any other use not covered by fair dealing, by granting a license to use their content. If that is an “opt-in” requirement then I am in favour. If yet another opt-in step is required, this would seem to be unnecessary. Licensing is a growing phenomenon. AI developers want reliable, curated content to develop their applications. As long as they are prevented from simply helping themselves, there is incentive for them to reach licensing deals with content owners. However, giving the tech industry a pass by allowing it to take for free whatever it wants in the name of developing AI applications (for its commercial advantage) removes the needed incentive to negotiate with rightsholders. As to whether unauthorized use for AI training constitutes fair dealing, as Dr. Geist claims (“most TDM for AI training purposes would likely qualify as fair dealing under existing law”), this is doubtful to say the least. It is hard to imagine which fair dealing purpose currently applicable in Canadian law (research, private study, education, parody or satire, criticism or review, news reporting) would apply, particularly when there are fair dealing limits to the amount of a work that can be used for such purposes, and specific factors that must be applied as to the effect of the dealing on the work.

The Committee’s lead recommendation is not the only complaint that Dr. Geist has about the Committee’s report. He feels it is unbalanced because the majority of its witnesses represented the cultural industries. It’s true that its lead recommendation is very much in line with the mainstream views of the Canadian cultural community. It was, after all, the Report of the Standing Committee on Canadian Heritage. This reminds me of the conflicting reports on copyright issued a few years ago by the Heritage Committee and its counterpart the INDU Committee. The 2019 Heritage Committee report, titled Shifting Paradigms, was attacked at the time by Dr. Geist as “the most one-sided Canadian copyright report issued in the past 15 years”. He claimed that there was “no attempt to engage with a broad range of stakeholders”, even though he himself appeared along with a number of others who shared his perspective on copyright. Shortly after issuing its own report, the INDU committee then issued a tone-deaf “We’re in charge” press release reminding the world that it had “sole responsibility” for administering the Copyright Act. (This is not strictly accurate). Dr. Geist’s main complaint, whether with “Shifting Paradigms” in 2019 or with the current Heritage Committee report, seems to be that the Committee members, in their wisdom, did not take his expert advice.

What is the function of Parliamentary Committees? It is to hear evidence, draw conclusions and make recommendations. He complains that while there were different points of view, including notably his, on how to tackle the issue under study, the Committee’s conclusions did not reflect these views. Was it because, numerically, there were more pro-copyright witnesses from the creative community than those from the Geist camp? That is theoretically possible if it were just a mathematical exercise of adding up comments in a pro and con column. But that is not the case. While the Report made a conscientious effort to capture the full range of comments, including those of Dr. Geist, in the end the members (from three political parties) made a judgement and reached consensus conclusions. (Although the Conservative Party members provided their own addendum that added to but did not refute the Committee’s conclusions). Presumably the members of the Committee were more convinced by the force of the arguments presented by some witnesses than others. Given the range and similarity of concerns presented by disparate members of the creative community, it is not surprising where they came out in terms of conclusions.

Dr. Geist is entitled to disagree with these conclusions and recommendations. To be fair, his blog commentary echoes the position he presented to the Committee, except for his complaints about process. As I said at the outset, his attack on the Committee’s report is entirely predictable, like wasps at a picnic. And those wasps can be so annoying, distracting from the main event with the occasional bite and annoying buzzing, but as any determined picnic-goer knows, it’s important to not let them become the centre of attention. The Heritage Committee’s report was carefully considered and drafted by an all-party group after hearing from a wide range of experts. It provides important recommendations that the government would be well advised to take into account as it develops a legal framework in which both the AI and creative industries can co-exist and flourish.

We need more Canada in the Training Data, but through Licensing not Loopholes

Canada Has a Choice When it Comes to AI Training Content

Scrabble tiles arranged to display the words 'LOOPHOLES' and 'LICENSING' on a game board.

Michael Geist, Canada Research Chair in Internet and E-Commerce Law at the University of Ottawa has argued, in an appearance before the Heritage Committee of the House of Commons, that “we need more Canada in the training data”. He is absolutely right, but just not in the way he proposes. Dr. Geist is what I would call a well-known skeptic when it comes to the intrinsic value of copyright, a copyright “minimalist” if you will (probably an understatement).

With respect to the unauthorized and uncompensated use of copyrighted content for AI training, he states that “in the context of AI, the application of copyright isn’t clear cut. The outputs of AI systems rarely rise (to) the level of actual infringement given that the expression may be similar or inspired by another source, but it is not a direct copy of the original.” Whether the outputs mirror the inputs is not the sole issue. In some cases, such as when music and images have provided the inputs, they do. This is an infringement of the reproduction right, and likely also an infringement of the distribution right and the right to produce a derivative work (under US law). In Canada the right to create another work from an original work comes from the right of adaptation. However, even without a mirrored output, full reproduction still takes place at the input stage, creating an infringement unless the copies meet a fair dealing purpose and fulfill fair dealing criteria, even if the copies are later deleted. As Keith Kupferschmid, CEO of the Washington, DC-based Copyright Alliance, has pointed out in a recent blog post discussing the copyright principles that apply in AI training cases,

“Some people mistakenly believe that in order to establish an infringement during the input stage, the copyright owner needs to establish substantial similarity between the ingested copyrighted work and AI-generated output and if no substantial similarity exists there is no infringement in this stage. That is incorrect.”

Even without mirrored outputs, full non-transitory copies of copyrighted works are being made at the ingestion stage of AI training. That is an infringement, just as making a photocopy of a complete work, such as a book, would be an infringement unless covered by an explicit exception such as preservation purposes by a library or archive.

Dr. Geist’s second line of argument is that if Canada makes it more difficult or costly to develop large language models, AI development will shift outside the country. This is a tried-and-true but tired pretext frequently employed by those seeking to justify the appropriation of copyright-protected content in the name of “innovation”, as I pointed out in an earlier blog post (CanLII v CasewayAI: Defendant Trots Out AI Industry’s Misinformation and Scare Tactics - But Don’t Panic, Canada). This is a race to the bottom, throwing the content industry under the bus on the pretext that everyone is doing it, even though that is untrue. One provision that has been selectively incorporated into the laws of some jurisdictions, like the UK and the EU, is an exception for “text and data mining” (TDM). Dr. Geist states this is why Canada also needs to introduce a similar statutory exception to promote AI.

However, not everyone is engaged in this race to the bottom. In fact, there are increasing doubts that establishing a statutory TDM exception for AI training is the best way to go. Australia has just firmly rejected the creation of a TDM exception in its copyright law even though it is also grappling with the same issue of how to incentivize AI training and research in that country. The UK’s current TDM exception is limited to non-commercial research purposes and, in the face of strong opposition from its creative sector, Britain has put proposals to expand TDM on hold. Even the EU’s TDM law has guardrails. It has two aspects: one limits data mining to non-commercial scientific research conducted by scientific research organizations or cultural heritage institutions, while the other is a general purpose TDM exception that is open to commercial organizations. The guardrails on the latter include an opt-out provision whereby rightsholders can block ingestion of their content through technical measures, contract provisions or other means, in which case the TDM exception does not apply.

While opting out by rightsholders is one way to limit the damage of unrestricted text and data mining, this is controversial because it places the onus on the rightsholder to take action, whereas normally a party wanting to use someone else’s property would have to obtain permission in advance. Opting out is not a preferred solution for the creative community. It doesn’t work well in practice, as rightsholders often lack the technical means or awareness to apply their opt-out rights. Reflecting these concerns, the European Parliament’s Committee on Legal Affairs has just published a study examining how generative artificial intelligence interacts with European Union copyright law. The study recommends moving from opt-out to opt-in for rightsholders.

Thus, far from a TDM exception being or becoming the norm, it is being rejected or constrained in a number of countries where the AI industry has been pushing it as the ultimate solution. The Canadian creative community, like the creative sector in Australia, has spoken out strongly against introducing a TDM exception into Canadian law. Indeed, there is no need to do so, as licensing solutions allowing AI training and text and data mining are becoming more and more common, including in Canada. For example, the Writers’ Union of Canada is studying a proposed agreement between select nonfiction authors, HarperCollins, and Microsoft to license full texts for the purpose of training artificial intelligence. Licensing agreements have taken off big-time in the US and elsewhere as the AI industry begins to understand this is the safest way to protect their investments. Canadian creators risk being left by the roadside if Canada brings in a TDM exception that would allow AI developers to steam ahead, appropriating content without payment or permission and ignoring licensing requirements. The surest way to kill a nascent and growing licensing market is to give the AI sector a TDM loophole to exploit, removing any incentive to reach licensing agreements with rightsholders. The solution is licensing, not loopholes.

Dr. Geist stated in his testimony to the Heritage Committee that AI developers would take the view that if they had to pay for (i.e. to license) content from Canadian creators, they would simply exclude it. The record of licensing deals being reached elsewhere suggests this is completely off base. Instead, the record shows that when AI developers want reliable, curated content to make their product better than the competition, they are ready to pay for it. But they will never pay for it if they are given a blank cheque through a legislated loophole. He also claims the position of the creative community is “Don’t use my stuff”. Again, the record of licensing deals to date and in the pipeline disproves this characterization in spades. Rather than blocking use of their content, creators are saying, “If you want to use my content, let’s talk”. Finally, Dr. Geist managed to completely mischaracterize the position of the creative community with regard to licensing. He said in his testimony that creators are advocating for a change to copyright law to mandate payments for AI training use. On the contrary, the creative community is simply asking that existing copyright law not be gutted. There is no need to create a mandatory payment requirement; existing copyright law is fit for purpose in dealing with how those wishing to use copyrighted content for purposes that fall outside fair dealing can do so. Negotiate a licence.

If any proof is needed of how the creation of a loophole will kill a licensing market, all one needs to do is look at the sorry state of educational publishing in Canada. The industry has been decimated, and many authors have lost their livelihood because of the ill-conceived educational exception that was introduced into Canada’s Copyright Act in 2012. With that loophole in place, educational institutions across the country, with the notable exception of Quebec, began to tear up the reproduction licenses they had held from Access Copyright, the copyright collective representing authors. The educational exemption as part of fair dealing criteria could still be fixed, but the educational sector, facing severe financial pressures, has a powerful lobby working against it. The financial pressures are real, but taking a free ride on educational publishers and authors is wrong.

What happened with educational publishing is a cautionary tale for Canada. It should not make the same mistake twice. The way to promote a strong AI industry, alongside vibrant content industries, is licensing, not loopholes. Building a robust AI/TDM licensing market is the way to get more Canada into the training data, not giving the AI industry a blank cheque to help itself to the proprietorial content of others. With voluntary licensing everyone benefits. AI developers get secure access to quality content; the creative sector is rewarded for its efforts and becomes a partner in developing responsible AI. It’s a shame that the Canada Research Chair at the University of Ottawa doesn’t understand this.

AI Training and Nurturing Cultural Industries in Asia: Finding the Right Balance

Text and Data Mining (TDM) Exceptions and Compulsory Licensing Solutions Carry Heavy Risks

A scale balancing two labeled blocks, one marked 'TDM' and the other marked '©', representing the debate between Text and Data Mining and copyright.

Text and data mining (TDM) is a hot topic in many countries. In jurisdictions where exceptions to copyright protection are embedded in legislation rather than determined by the courts on a case-by-case basis (as in the US), TDM has become a favoured vehicle of AI developers, although compulsory licensing has also been floated by some as a potential solution. AI developers see TDM as a loophole allowing access to copyright-protected works for algorithm training without payment or permission. Compulsory licensing would establish a statutory regime requiring rights-holders to provide access to their content upon payment by users. While seemingly offering a middle ground, it is fraught with problems. Meanwhile, content industry stakeholders have been vocal on the need to protect their intellectual property, while in some cases resorting to legal action.

TDM has been on the front burner in the UK, Australia and Canada, and Asia is facing many of the same issues. From India to Malaysia to Japan, and from Korea to Hong Kong to Singapore, access to copyrighted content for AI training is front and centre although being played out in different ways. Some countries already have instituted limited TDM exceptions while others are reviewing options. In India, which has long used compulsory licences in the patent field, and which has provision for compulsory licences under certain narrowly specified circumstances in its Copyright Act, both TDM and wider compulsory licensing are being pushed by the AI industry. A common thread in all countries is the concern by rightsholders that their valuable proprietorial content is being or may be taken and reproduced to provide training inputs to a commercial process without authorization or compensation. These concerns are not misplaced.

Compulsory licensing is a “solution” (actually opposed by many in the AI industry who believe that all content should be “free”) that strips away the rights of content owners to determine how their valuable intellectual property will be used. In effect, it is a form of expropriation. While compulsory licences may set a price for use (which may or may not be seen as fair), they don’t address other issues that are normally included in licensing deals such as how the work is to be used, or any specific limitations related to the content. There is also the difficult issue of equitably distributing collected funds.

Voluntary licensing where rights-holders can opt-in is a fairer and more feasible solution, offering mutual benefit to both the content and AI industries. A growing voluntary licensing market exists for print, AV and music content—but AI developers have been slow to respond, a key reason being the mixed signals they are receiving from various governments. Rather than negotiate, the AI industry would rather push for a broad exemption legalizing the practice of helping themselves to protected content owned by others. The pretexts advanced are either a) they are not really copying (just turning content into data tokens is the argument) or, b) if they are, they should be allowed to continue doing so in the name of “innovation”. There is also the implicit threat that if laws and regulations are too protective of the creative sector, AI development funds will go elsewhere, to more compliant jurisdictions.

This argument does not hold water, as many factors go into making investment decisions regarding facilities such as data centres, notably the availability and cost of talent, land, power, etc. It is worth noting that while Malaysia does not have a TDM exception in its copyright law (whereas Singapore does), investment is pouring into Johore Bahru, just across the causeway from Singapore, because of Malaysia’s relative competitive advantage in input costs. The AI industry’s “fear factor” threatens to start a race to the bottom, as governments around the world don’t want to be left behind as the AI race heats up. While it is clear that AI will transform some industries and has the potential to increase productivity in many areas, it may lead to more job losses than gains. The cultural sector, by contrast, is both a key economic driver in all the Asian economies in question and an important pillar of national identity.

India

India is a good case in point. It is a well-known cultural and technological powerhouse with a creative economy that was estimated by WIPO to be valued at over $30 billion (USD) in 2023, with 20% growth in creative exports generating over $11 billion. Prime Minister Modi has called on the creative sector to further increase its share of GDP. Yet the TDM issue has raised its head in India, especially after OpenAI was sued by several Indian media entities for copyright infringement. In May Reuters reported that the Ministry of Commerce had set up an expert panel to examine the AI training issue. Both domestic and international content industries in India are concerned that creation of a TDM copyright exception or widening of compulsory licensing in India’s copyright law will undermine the incentive to create new content and stall the development of a voluntary licensing market for AI training. Careless implementation of TDM or bringing in a misplaced compulsory licensing regime risks throwing out the baby with the bathwater.

Malaysia

As in India, AI industry lobbyists in Malaysia have called for implementation of a TDM exception. The case of neighbouring Singapore is often cited, but Singapore is a particularly poor example to follow. Singapore’s overly broad TDM exceptions, referred to locally as exceptions to facilitate “computational data analysis”, combined with severe limitations on use of contract law to control access to copyright-protected works, have weakened Singapore’s creative sector and held back the development of licensing options. There is no need for introduction of a TDM exception in Malaysia. Kuala Lumpur can distinguish itself by offering an appropriate balance between AI development and fostering important cultural industries, encouraging the development of a mutually beneficial licensing market. Its other attributes have helped it to successfully attract significant high-tech investment without undermining its investment in content creation.

Japan

Japan, which has a TDM exception in its copyright law, is often held out by AI developers as a model for the kind of copyright law they would like to see replicated elsewhere, but the impression that anything goes in Japan with respect to use of copyrighted content is mistaken. As I outlined in a blog post last year (Japan’s Text and Data Mining (TDM) Copyright Exception for AI Training: A Needed and Welcome Clarification from the Responsible Agency), Japan’s TDM exception does not apply if the user of the copyrighted data “enjoys” the content. As an example, this means that if a user derives benefit through using the copied material to create outputs based on the reproduced content, the TDM exception does not apply. As this website succinctly puts it, “Expressive intent invalidates the safe harbour.” As is the case elsewhere, the limits of the law are being tested in court. Yomiuri Shimbun, Japan’s largest paper, as well as Nikkei and Asahi Shimbun, are suing Perplexity AI for copyright infringement in Tokyo District Court. Meanwhile a market for licensing content is beginning to develop.

Korea

Korean content companies are also turning to the courts for redress against unrestricted copying by digital platforms. Korea’s three terrestrial broadcasters, KBS, MBC, and SBS filed suit in January against Korean tech giant Naver claiming the platform used their news content to train its AI application. The broadcasters had earlier put Naver on notice not to use their content without permission. Naver is, broadly speaking, the Korean version of Google. It has recently been reported that more lawsuits are pending against Naver, this time from the Korean Newspaper Association.

Korea does not have a TDM exception in its copyright law, but it has (at least in theory) adopted the US fair use doctrine as a result of the US-Korea Free Trade Agreement. However, although fair use was incorporated into Korean law in 2011, it has seldom been used. The Korean courts have been very reluctant to apply it, and where they have, the application has been very narrow, essentially limited to non-commercial use. To date there have been no fair use cases brought to the Supreme Court, and lower courts tend to rely on the specified exceptions that apply in Korean law. Because of this there have been attempts to introduce a TDM exception, and more are expected in the current National Assembly. Various versions have been proposed that are of concern to rightsholders, including broad interpretations that would not distinguish between commercial and non-commercial use. Korea, one of the cultural giants in Asia, needs to tread carefully if it wants to maintain this leading cultural export, while encouraging development of content licensing.

Hong Kong

Hong Kong does not have a TDM exception in its current copyright law but, under pressure from the AI sector, is considering the idea. The Intellectual Property Department launched a public consultation late last year, receiving input from stakeholders representing both sides of the argument, and has come forth with recommendations to the legislature (Legco). It has proposed a TDM exception for both commercial and non-commercial use but with a number of limitations: 1) access to content must be lawful (i.e. no use of pirated content); 2) a public record must be kept of copyrighted works used in AI training (transparency requirement); 3) the TDM exception will not apply where licensing schemes (i.e. licences that have been issued by the Copyright Tribunal) exist; and 4) rightsholders can reserve their rights by opting out.

There are problems with this proposal, despite the limitations. Requiring rightsholders to opt out stands the existing basis of copyright on its head, as it has in the EU (normally, users need to obtain permission from rightsholders in advance), while the licensing provision provides only limited relief. While not as potentially destructive as some proposed TDM exceptions elsewhere, it is questionable if Hong Kong needs a TDM exception given that voluntary licensing alternatives are increasingly available. At present, the recommendations are with the Legco; given public skepticism about the proposal, legislation is not expected until 2026 at the earliest.

Conclusion

Lawmakers and regulators in Asia are grappling with a common problem; how to incentivize the development of responsible AI while continuing to encourage and promote all-important content industries. Cultural expression is particularly important in Asia as an expression of values, and throwing the cultural sector under the bus in the hopes of attracting some ephemeral hi-tech AI jobs is a false bargain. It’s like eating the seed grain from which the bounty of cultural creativity springs. Undermining the nurturing environment provided by sound copyright protection, whether through compulsory licensing or creation of TDM exceptions, is bad public policy.

Strong cultural industries enable the development of strong content licensing markets for AI development, enabling a virtuous circle of further creativity. A strong cultural sector and strong, sustainable digital industries, especially those powered by AI, go hand-in-hand. Asian regulators need to exercise prudence and weigh the consequences of rash action. The winners will be those that find the right balance between encouraging innovation and fostering creativity.

Australia Stands Up for its Creative Sector: A Useful Lesson for Canada and Others

Two coffee mugs side by side, one featuring the Australian flag and the other featuring the Canadian flag.

Image: Shutterstock

Australia just took an important stand in the tug-of-war being waged in many countries over whether, how and to what extent tech companies can use copyrighted content (text, music, images and so on) to train AI platforms by reproducing the content and extracting its essence without permission or compensation to rightsholders. Attorney-General Michelle Rowland has announced that while Australia will be undertaking consultations on revisions to its copyright laws to help address the needs of the AI industry, a Text and Data Mining (TDM) exception has been ruled out. Some countries, like the UK, have TDM exceptions for limited purposes (such as research and non-commercial use) in their laws while several other countries have TDM under review. Existing TDM exceptions allow reproduction of copyrighted content without the authorization of the rightsholder for research, data analysis, and in some cases for AI training purposes.

There is currently no TDM exception in Canadian law but, as I noted in a recent blog post (“Canada’s Creative Sector Uneasily Awaits the Carney Government’s Next Steps on AI Training”), pressure is building from the AI sector to incorporate TDM into Canada’s Copyright Act. The government currently has yet another consultation paper on AI out for public comment and the Canadian cultural sector is organizing to protect creators’ rights, specifically calling on the Canadian government to “ensure that the Copyright Act is not modified through an exception permitting Text and Data Mining (TDM) or any other exception allowing technology developers or users to use protected works…to train generative AI systems without authorization or compensation…”. In doing so, it is taking a leaf from the book of Australian creators who mounted strong opposition to a proposal from the Productivity Commission (PC), an independent research and advisory body created by an Act of Parliament some 25 years ago, which proposed in a report in August that Australia adopt a TDM exception. To say that this proposal put the cat amongst the pigeons would be an understatement.

The Commission has a reputation for denigrating the value of intellectual property and seeing it as an obstacle to industrial development rather than as an essential partner. In 2015 it proposed shortening the term of copyright protection from the current life of the author plus seventy years (“life plus 70”), a generally accepted international standard, to just “life plus 15” (far lower than the Berne Convention minimum and a standard not adopted anywhere), while introducing a US-style fair use regime into Australia. There was strong pushback then (it didn’t happen), and there was strong pushback this year (see here and here, for example) when the PC proposed introducing a TDM exception. It was particularly criticized for its lack of consultation with the creative industries in developing this proposal.

Now the Australian government has put its foot down, ruling out TDM but indicating that it will look at alternative solutions. These include examining whether to establish a new “paid collective licensing framework” under the Copyright Act for AI, or whether to maintain the status quo through voluntary licensing, clarifying how copyright law applies to material generated through the use of AI (i.e. whether there should be copyright protection for outputs produced by or with AI) and looking at the establishment of a new small claims forum to address lower-value copyright infringement matters.

It is generally accepted that AI is here to stay and will continue to need vast amounts of content for training. In most cases, copyrighted content is the kind of curated, high value work that AI developers need but until now, have preferred to appropriate without permission rather than pay for through licensing. In effect they have decided to ask for forgiveness after rather than permission beforehand. This has led to a plethora of lawsuits globally, including the recent $1.5 billion settlement that Anthropic has agreed to pay out to settle a class action suit brought by authors in the US. “Forgiveness” can be expensive. Inside the US, AI developers are arguing their copying is fair use, although at the same time they are beginning to hedge their bets by licensing content from a number of sources, ranging from media to music to image companies. Outside the US, AI companies have been beating the TDM drum, hoping that creation of wide TDM exceptions will obviate the need to negotiate with content owners. Nonetheless, voluntary licensing is growing globally. However, the surest way to kill a nascent licensing market is to give the tech industry a “get out of jail free” card by introducing a broad TDM exception. Australia has just rejected that option. Canada and others considering introducing new, or broadening existing, TDM loopholes should do the same.

It is not clear where Australia’s AI and Copyright review will end up, other than to note that it will not include TDM. As I have noted above, among other things it will be considering “collective licensing”. Collective licensing could help address the problem of remunerating individual rightsholders, in contrast to licence agreements signed between AI developers and corporate entities like media companies. However, Australia needs to steer clear of compulsory licensing which strips away the rights of copyright owners. Compulsory licences authorize use upon payment of a statutory or negotiated fee but remove the right of a copyright holder to withhold consent for use, or to impose specific limitations. A voluntary licence framework is fair to everyone. Compulsory licensing is not.

Canada and Australia have many things in common (as well as a number of differences of course, beyond poutine vs vegemite). Among their commonalities is the desire to protect and foster a unique cultural identity in the face of global cultural homogenization. This is even more important in Canada given the realities of the struggle faced by 6 or 7 million Francophones to preserve their culture in a sea of 375 million Anglophones. Canada followed Australia’s lead (although less successfully) in requiring major online platforms to contribute financially to (i.e. pay for the use of) news media content. It should do the same by putting the idea of a TDM exception firmly to one side and instead focus on encouraging the development of a voluntary licensing market for copyrighted content when used in AI training.

Canada’s Creative Sector Uneasily Awaits the Carney Government’s Next Steps on AI Training

Blasting a Wide TDM Hole in the Structure of Copyright is Not the Answer

A cartoon-style illustration showing a fist breaking through a brick wall labeled 'COPYRIGHT', with the fist wearing a band labeled 'TDM', surrounded by explosive graphical effects. Image: Author (via DALL-E)

The ongoing wrestling match-cum-dance between the creative sector and AI developers over the uncompensated and unauthorized use of copyrighted content for AI training is being played out in different ways in different countries. In the US it is largely a legal play in the courts at the moment, with mixed results for both sides. However, President Trump has made concerning public comments siding with the AI industry, saying it is impractical for AI developers to pay copyright holders for AI training (and besides, China doesn’t do it). Congress is still considering its options. In Australia, the Productivity Commission, never a friend of intellectual property, has just issued an interim report recommending the adoption of a Text and Data Mining (TDM) exception in Australia to boost development of the AI industry locally. The Australian creative sector mobilized quickly and has pushed back hard against this proposal, with the government now saying that it has no plans to amend the Copyright Act. In the UK, where there is a TDM exception but only for non-commercial purposes, the Starmer government quickly adopted a pro-AI strategy, part of which was to propose an expansion of TDM to include commercial purposes, although subject to an opt-out for rights-holders. That ignited a major storm among leading British creatives from Paul McCartney and Elton John on down. Through a unified campaign, British creators were able to gain support in the Upper Chamber (House of Lords) to slow down the legislation. As a result, the TDM issue has now been earmarked for further consultation and study. One thing is certain: the creation of a wide TDM exception is a sure way to stifle a nascent but rapidly developing licensing market for copyrighted content used for AI training.

It seems as if TDM, or more permissive TDM, is testing the boundaries of copyright just about everywhere. So, what about Canada? Canada has no TDM exception in its copyright law and, unlike the US, has clearly defined fair dealing exceptions that do not lend themselves to expansive court interpretation. Like other countries, it is trying to figure out how to not get left behind as the AI race accelerates. Canada initially had a first mover advantage in terms of AI research, given the work of Geoffrey Hinton, Yoshua Bengio and others, but recently it has been falling behind, notably lacking native startups. The cluster effect is not happening, with Canadian innovation going elsewhere for commercialization. To address these challenges, the new Carney government has appointed a dedicated Minister of Artificial Intelligence and Digital Innovation, former journalist Evan Solomon. This is the first time such a position has existed. One of Solomon’s first acts was to accelerate launch of an AI strategy beginning with a new consultation released on October 1 (closing at the end of this month), in the form of a survey to “help define the next chapter of Canada’s AI leadership”. This survey asks many relevant questions regarding AI and how it could be best developed in Canada but manages to mostly steer clear of the thorny question of AI training and copyright. The only question tangentially related to this issue is the following;

“Which infrastructure gaps (compute, data, connectivity) are holding back AI innovation in Canada, and what is stopping Canadian firms from building sovereign infrastructure to address them?”

Clearly this consultation is not going to turn over the TDM rock, at least not directly.

In the past couple of years, the government has issued two consultation papers on AI, one in 2021 and another last year, as well as a “What We Heard” report. This report, issued earlier this year, summarizes the “great divide” between AI developers and the content industry. Its first observation was that “Creators oppose the use of their content in AI without consent and compensation”, but it then goes on to say that “User groups support clarifications that TDM does not infringe copyright”.

After a couple of other observations about the centrality of human authorship and the need for transparency surrounding the use of copyright-protected works in the training of AI, the paper observed that there is “no consensus about whether existing legal tests and remedies are adequate”. That is the nub of the issue. There is no consensus, and while the courts are struggling with this issue (including in Canada, as I wrote about here and here), what Canadian creators fear is the introduction of a wide TDM exception in the name of maintaining “Canadian competitiveness”.

The launch of the new AI strategy and the evolution of the way in which copyrighted content is described in government consultation documents is indicative of the pressures on the government to shore up Canada’s AI strategy. It is interesting to note the shift in the definition of TDM from 2021 to today.

The definition provided in the 2021 consultation document described TDM as follows;

“The process of conducting TDM may require the making of reproductions of large quantities of works or other copyright subject matter to extract particular data and information from them. This process may be carried out using scientific or text-based data, as well as images, sounds, or other creative works.”

In the most recent consultative document, that definition has evolved;

“Text and data mining (TDM) consists of the reproduction and analysis of large quantities of data and information, including those extracted from copyright-protected content, to identify patterns and make predictions.”

Note the shift from “works” to “data”.[i] It’s a subtle difference but is hugely significant because data and facts are not protectable under copyright whereas the creative elements of original works are. The cultural sector is rightly concerned.

The Coalition for the Diversity of Cultural Expressions (CDCE), a major arts and creatives lobby group, is currently pressing Ottawa on a number of cultural issues, including AI. Among its AI asks are to;

Ensure that the Copyright Act is not modified through an exception permitting Text and Data Mining (TDM) or any other exception allowing technology developers or users to use protected works…to train generative AI systems without authorization or compensation;
Adopt national legislation on generative AI that requires developers of generative AI systems to disclose the training data they use; and
Adopt legislative provisions requiring public identification of content that is purely AI-generated.

Against these demands is the pressure coming from AI advocates who will argue that if the US loosens restrictions on use of copyrighted content for AI training, Canada will have no recourse but to follow. In other words, as goes the US, so goes Canada (or for that matter, the UK, Australia and others). Thus, what is happening in the US courts, and perhaps in Congress, is of critical importance for the creative sector everywhere including, in particular, Canada.

The issue of AI training on copyrighted content will need to be resolved sooner or later. Licensing solutions are developing quickly and if Canada can wait a bit longer it may be able to adopt licensing as the preferred solution (although the “What We Heard” report noted that “Some (intervenors) argued that licensing is an unnecessary burden because it may not be clear that copyright is engaged or that works used in TDM are being reproduced in the first place.”). There is pressure on the Carney government to take early action since AI industry developments are moving at lightning speed. With the TDM train gaining momentum in Canada and elsewhere, Canadian creators are understandably uneasy about what is likely to happen next.

As the CDCE notes, culture is a major economic and social pillar in Canada. In 2023, it generated $63.2 billion in value added and employed 669,600 people. Throwing all that under the bus in the name of remaining competitive on AI is a flawed choice, a point also made by the creative sectors in the UK, Australia and elsewhere. However, with the AI horse well out of the barn, copyright cannot be seen as an obstacle to innovation, an accusation freely levelled at it by some in the AI industry. Rather, it must be seen as a partner in innovation, which is where licensing comes in.

Blasting a wide TDM hole in the protection and incentive structure that copyright provides the creative sector is not the answer. The creative sector is watching and waiting anxiously.

[i] I am indebted to Erin Finlay, partner at Stohn Hay Cafazzo Heim Finlay LLP, for drawing these changing definitions to my attention.

Singapore Inhibits Rightsholders Ability to Use Contracts to Prevent Unlicensed Text and Data Mining of Content

Image: Shutterstock (AI modified)

Singapore already has one of the most permissive text and data mining (TDM) exceptions in copyright law found anywhere, allowing AI developers to ingest copyrighted content for AI training purposes subject only to a very few limitations, all of which are pretty minimal. These provisions were introduced back in 2021 when several changes were made to the Copyright Act, some, frankly, better than others from the perspective of rightsholders. The TDM exception is not among the more positive outcomes. It is made worse by a provision in Singapore law that, for all practical purposes, prevents rightsholders from exercising contract terms to prevent the unlicensed appropriation of their content for the commercial benefit of third parties, namely, AI developers.

The term used to describe text and data mining in Singaporean law is “computational data analysis”. As explained in this law firm blog post, this is defined as including;

“using a computer program to identify, extract and analyse information or data from the work; (or) using the work as an example of a type of information or data to improve the functioning of a computer program in relation to that type of information or data – a specific example being the use of images to train a computer program to recognise images”

The exception also permits supplying the works to other persons, provided this is for the purpose of “(i) verifying the results of the computational data analysis carried out by the latter; or (ii) collaborative research or study relating to the purpose of such analysis carried out by the latter.”

This is simply commercial, unlicensed web scraping of copyrighted content for AI development by another name, with very few limitations.

Singapore is now contemplating opening that door even wider by permitting the circumvention of digital locks (aka Technological Protection Measures, or TPMs) to allow text and data mining for AI training, as I wrote about here (Singapore’s New Copyright Act Three Years On: There’s No Need to Open the AI Exception Door Even Wider). This proposal needs serious reconsideration, as it would tilt the copyright balance in favour of AI platforms to the detriment of rightsholders.

Another example of Singapore’s increasingly permissive approach to copyright protection is its undermining of the sanctity of contracts by limiting contractual terms that prevent unlicensed and unauthorized use of copyrighted content. In other words, the Act limits the ability of contractual terms to protect against, or override, the TDM exceptions. Rightsholders cannot “contract out” of the exceptions.

This provision was included when the Copyright Act was last updated (2021). The list of exceptions that cannot be restricted by contract was expanded to include the exception for computational data analysis use, as well as a couple of other uses: use of a work in judicial proceedings or for legal advice, and functions of galleries, libraries, archives, and museums, the so-called GLAM sector. The computational data analysis exception is the one of concern. It requires that, to be enforceable, a contract that limits the ability of a user to scrape content must be individually negotiated. In other words, standard contract terms on websites that limit use (so-called shrink-wrap or clickwrap agreements) cannot be used to override the TDM exception. This has the effect of rendering standard contractual terms virtually unenforceable. They become the exception rather than the norm.

The term “shrink-wrap agreement” was originally applied to the preprinted agreement included as part of the packaging containing a software program. By opening the packaging, the user agreed to comply with the licence terms of the software. The term has since been expanded to include “clickwrap agreements” that take effect when a user accepts the terms and conditions of a website. This can be used to specify that the content about to be accessed can only be used for certain purposes or under certain conditions. One of these conditions could be a restriction on use of unlicensed content for AI development. The Singapore legislation eliminates what is a standard practice used by rights-holders in many parts of the world to protect and control use of their content. It also means that robots.txt files used by rightsholders to signal that their content should not be freely scraped (compliance is voluntary) are unlikely to be respected in Singapore. Robots.txt limitations are often included in clickwrap agreements.
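
For readers unfamiliar with how this signal works in practice, here is a minimal sketch in Python, using the standard library's robots.txt parser. The crawler name "ExampleAIBot", the file contents and the example.com URL are all hypothetical, chosen only for illustration. It shows a rightsholder's file asking one AI crawler to stay away while admitting everyone else, and how a well-behaved crawler would check that file before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve: it asks one named
# AI crawler to stay out entirely and leaves the site open to others.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from memory rather than downloading them.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/story.html"
    print("ExampleAIBot allowed:", may_crawl("ExampleAIBot", page))
    print("SomeBrowser allowed:", may_crawl("SomeBrowser", page))
```

Nothing in this check stops a crawler that simply never runs it. The file expresses the rightsholder's wishes and relies on the crawler to honour them, which is why enforceable contract terms matter as a backstop.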

Not only does the Singapore law broadly undermine contractual terms and prevent “contracting out”, but its TDM exception is also very wide in application. In the UK, contractual terms likewise cannot override the TDM exception, but unlike in Singapore, allowable TDM use is much narrower. The exception in the UK can be used only “for the sole purpose of research for a non-commercial purpose”. No such restriction exists in Singapore. In the EU, contractual terms can override the general TDM exception (Article 4), unless the unlicensed access is “conducted by research organisations and cultural heritage organisations” or is “for the purposes of scientific research” (Article 3). In these limited cases only, the contractual override does not apply. This still provides broader protection for rights-holders, and where the contractual override is disallowed, it is for very limited purposes. This is a much more nuanced approach than the one adopted by Singapore.

Contract law is generally seen as the oil that lubricates the wheels of business. In the digital age, shortcuts in the form of clickwrap agreements have been used to convey contractual terms to users. In some jurisdictions, explicit consent is required by clicking “I Agree”. Singapore’s current copyright legislation undermines the sanctity of contracts by imposing unrealistic conditions, particularly with respect to limiting the rights of rightsholders to prevent web-crawlers from ingesting copyrighted content without licence or permission. To say this is problematic is an understatement.

Singapore can do better. As an exemplar of rule of law in the region, it should be as assiduous in protecting the rights of copyright owners as it seems to be in advancing the interests of AI developers. The motivation, apparently, is to promote “innovation”. This is a misread of what brings about innovation. True innovation comes with a partnership between rights-holders and users that protects and compensates rights-holders for the time, effort and investment they have put into developing content that is clearly of value to the AI community. That content should be licensed, or at the very least, rights-holders should be given the option to opt-out through the ability to enforce contract terms, including overriding text and data mining exceptions when necessary.
