Britain’s Proposed Approach to Text and Data Mining (TDM) for AI: How Not to do It (A Lesson for Canada and Others).


Last month I wrote about the emerging phenomenon of AI-generated art through widely available programs such as DALL-E 2, Stable Diffusion and others, and of the threat they pose to artists, designers, photographers and all those who depend on the protection of copyright to earn their livelihood. This also includes musicians and writers. Artificial Intelligence (AI) is now being used to create “sound-alike” music where users can sing in their favourite performer’s voice, and commercial AI programs can be purchased to write marketing copy, and even novels. Allowing the widespread misappropriation of copyrighted content to produce AI-generated products that compete on a commercial basis with the original creation, while at the same time using unauthorized inputs from the original works to help produce the competing product, is a misuse of AI. It is also a counterproductive policy that threatens to undermine the fundamental basis of economically significant cultural industries. Furthermore, it is just plain wrong, stealing the work of creators in a misguided attempt to boost “innovation” and development of AI.

The UK’s Proposed Limitless TDM Exception

This issue is playing out right now in the UK and poses an immediate threat not only to creators and copyright industries in Blighty but globally, given the negative precedent this will set if it becomes law in Britain. The UK government recently unveiled the results of its public consultation on AI and Intellectual Property. Among the more startling recommendations was the proposal on text and data mining. As stated in the discussion paper (Section 58-61):

“The Government has decided to introduce a new copyright and database right exception which allows TDM for any purpose (emphasis added)…Introducing an exception which applies to commercial TDM will bring benefits to a wide range of stakeholders in the UK…The benefits will be reducing the time needed to obtain permission from multiple rights holders and no licence fee to pay…Rights holders will no longer be able to charge for UK licences for TDM and will not be able to contract or opt-out of the exception.”

Frankly, this is nothing short of outrageous. It reeks of the British government’s desperate desire to prove that Brexit was not the colossal mistake it clearly was by trying to out-manoeuvre the EU. This proposal would make the British TDM exception broader and less restrictive than the existing EU law (which allows TDM only for non-commercial research purposes), all in the name of promoting innovation. What it is really doing is trying to steal data mining jobs from other jurisdictions on the backs of creators and copyright industries. If this misguided policy comes into effect, the vibrant British cultural sector will pay the primary price, although respect for copyright will be weakened generally. Not surprisingly, British artists have spoken up. Equity’s Audio Committee wrote to the Minister responsible pointing out the catastrophic effect the exemption could have for UK-based performers and their professional work if implemented. As an example, the Committee’s letter stated that any video or sound recording that is publicly available could be mined for free by third parties, without the consent of the copyright owner, to generate new AI content.

DACS (the Design and Arts Copyright Society) stated that the new TDM exception will “drastically weaken copyright protections for copyright holders in the UK, which supports the livelihoods of workers and businesses across the creative and cultural industries.”  Stressing that it was not opposing the development of AI, DACS noted that licensing copyright-protected works is a vital revenue source for visual artists at all stages of their careers. The UK government’s paper admits that if implemented the new TDM exception will put out of business those who have built business models around data licensing, but apparently that’s not the government’s concern. It’s all about incentivizing AI research, you see, and damn the consequences. While TDM is often undertaken for purposes other than development of AI, the wording of this proposal would allow an exception for any purpose, in other words specifically enabling TDM to feed AI algorithms.

Arts and culture are major economic drivers in most developed economies and the UK is no exception. The creative economy in the UK is a major provider of exports, employs over 2 million people and contributed over £115 billion in 2021 to the British economy, according to the report “Creative Industries: Trade challenges and opportunities post pandemic” prepared for the UK’s Department for International Trade. Are these benefits and these jobs to be sacrificed on the altar of “AI innovation”? It is a remarkably short-sighted proposal and probably contravenes Britain’s international obligations under the Berne Convention and TRIPS[i]. To date, no law has been introduced to give effect to this recommendation and, given the current disarray in which the British government finds itself domestically, its implementation will hopefully be delayed, allowing for sober second thought.

Developments in Canada Regarding TDM

While hopefully this ill-conceived recommendation will be reconsidered, it is a cautionary tale that should be borne in mind by Canada and other states that may be contemplating legislating a TDM copyright exception. (In the US, TDM is governed by fair use under Section 107 of the Copyright Act. While each case is decided on its merits, it is unlikely that a court would make a determination of fair use for unauthorized use of copyrighted material for TDM purposes if a TDM licence was available and/or if the final work produced from the inputs harmed the market for the original work by competing with it commercially). It is all too easy to trample creators’ rights—in the process gutting a thriving industry that contributes immensely to national economic and cultural well-being—in the misguided rush to clamber aboard the AI train. If a TDM exception is considered necessary, it should be narrowly tailored to deal with the specific needs of academic research, while staying consistent with international treaty commitments, and not be used to create a product from text mining that will unfairly compete with the work of the original creator.

In Canada, an update to the Copyright Act is overdue by several years. As part of that process, two Parliamentary Committees were struck in 2019 to review the Act. One of those committees, the INDU Committee, recommended that what it called “Informational Analysis” (TDM by another name) be permitted under the Act (Recommendation 23), either by adding it to the list of specified fair dealing purposes or by creating a specific exception. A consultation paper on a “Modern Copyright Framework for Artificial Intelligence and the Internet of Things” was released by the Department of Innovation, Science and Economic Development in July of 2021, inviting public comment. TDM was one of the issues raised. No recommendations have as yet come forth as a result of this consultation.

There is pressure from academic circles to circumvent the licensing conditions that many publishers maintain when permitting use of their materials even though publishers often grant rights to use materials for research purposes without charge, or with minimal conditions such as attribution or on the condition that the new product does not create a substitute for the original. In other cases, rights-holders may exercise their right to require a licence fee, which can be an important revenue source in some copyright-related industries, such as publishing. I can understand that from an academic point of view, it may be frustrating to have to contact multiple rights-holders to get permission to use content, or to find that access to some materials is blocked or constrained by licensing requirements. Reasonable arguments can be made that some forms of TDM are necessary for research and innovation, which could include AI. Could these not be restricted solely to public domain materials? Ideally, but not always.

Dr. Lucie Guibault of the law faculty of Dalhousie University (Halifax, NS) has examined the issue of TDM in Canadian law and concludes that it is not currently legal under either the exception for temporary reproduction (Copyright Act, Section 30.71) or fair dealing, notwithstanding that it could meet the enumerated purpose of “research”. This is primarily because of the amount of reproduction that takes place: basically all of a work. With respect to the argument that researchers should restrict themselves to public domain works, she points out this is not always practical, referring as an example to the work of academics such as Professor Andrew Piper of McGill University who, as part of his research, analyzed the plots and popularity of contemporary novels, all of which are copyright protected. (Note, however, that Piper’s work was not to create a database for AI purposes, but rather to analyze a range of works and draw conclusions from that data analysis).

Prof. Guibault’s solution is to introduce a text and data mining exception into Canadian law, along the lines of what has already been done in the EU (restricted to non-commercial research purposes), although she believes that the non-commercial limitation is too restrictive. She bases her argument on the fact that a TDM exception will have no impact on the economic interests of the rights-holder nor in any way affect the normal exploitation of the work. That would suggest a narrow exception and would rule out the kind of data scraping that is producing AI-generated art through DALL-E 2 and other programs. It would also rule out the kind of limitless TDM exception proposed by the British government.

One really important issue to bear in mind when promoting an academic, “non-commercial use” exception is that of data laundering. This occurs when data is collected by a non-commercial entity, such as the German non-profit LAION, which in turn then provides the data to a platform that exploits it commercially. In this case it was LAION that provided the data on which Stability.AI has built its AI-generated art tool Stable Diffusion. The dangers of data laundering are well covered in this article.

As countries come to grips with the need to promote AI research and development, they need to avoid a “race to the bottom” when it comes to shedding protection for copyrighted works. Allowing holus-bolus unauthorized copying and scraping of copyrighted material to feed AI research destroys any market for licensing content while in some cases producing AI-generated works that unfairly compete with the original work of creators. This erodes economic incentives for authors and artists and undermines copyright-based industries that provide substantial employment and contribution to GDP, in addition to nourishing the cultural soul of nations.

AI is here to stay. It is a legitimate research tool and can bring new products and services to market. Creators themselves often use it. In the rush to embrace AI, policy makers must not ignore the critical role that copyright has played for over two centuries in spurring creativity and creating economic and cultural welfare. As technology has developed, copyright has adapted. It has not been thrown under the bus. Respect for copyright must remain one of the cardinal principles taken into account when designing and implementing policies to promote AI.

© Hugh Stephens, 2022. All Rights Reserved.

[i] TRIPS: the Agreement on Trade-Related Aspects of Intellectual Property Rights, an agreement among members of the World Trade Organization.

This paper has been updated to add a reference to the Government of Canada discussion paper on Modern Copyright and Artificial Intelligence, and the Internet of Things, published in July 2021.

Author: hughstephensblog

I am a former Canadian foreign service officer and a retired executive with Time Warner. In both capacities I worked for many years in Asia. I have been writing this copyright blog since 2016, and recently published a book "In Defence of Copyright" to raise awareness of the importance of good copyright protection in Canada and globally. It is written from and for the layman's perspective (not a legal text or scholarly work), illustrated with some of the unusual copyright stories drawn from the blog. Available on Amazon and local book stores.

4 thoughts on “Britain’s Proposed Approach to Text and Data Mining (TDM) for AI: How Not to do It (A Lesson for Canada and Others).”

  1. One lesson that Canada could provide, about how NOT to do something, is in failing to practically and effectively define the boundaries and uses of “fair dealing”. By failing to answer the question “how much is too much?”, the Canadian approach has triggered a period of commercial devastation for Canadian publishers, with no practical remedy proposed to date. It would be pretty to think that Canadian legislators would apply such learnings to the uses and appropriate limitations for TDM… but they show no real signs of having learned the original lesson yet. I’m not optimistic.

    Bruce Madole

