Copyright and AI Training: The UK Rethinks its Blatant Content Giveaway Scheme—But What Comes Next?

A robot holding a British flag with the text 'GIVE ME YOUR CONTENT FOR FREE' and 'OPT OUT IF YOU CAN' displayed prominently.

Image: Shutterstock (modified)

On March 18 the UK’s Labour government finally confirmed what everyone has known for months, that its preferred policy option of allowing AI developers to freely use copyrighted content to train their AI algorithms unless rightsholders specifically opt out, was a shambles. The government had been backpedalling for a while but the issuance of the Parliamentary-mandated Report on Copyright and Artificial Intelligence, all 125 pages of it, finally drew a line under all the prevarications. The paper was issued by the two Departments that represent the opposite ends of the spectrum when it comes to the copyright/AI question, the Department for Science, Innovation and Technology and the Department for Culture, Media and Sport. The inconclusive results reflect this duality.

The government’s initial Consultation Paper had proposed four options to deal with the issue of unauthorized use of copyrighted content to train AI platforms;

  • Do nothing (status quo). Copyright and related laws would remain as they are. It would be left to the courts to settle disputes between rightsholders and unauthorized users;
  • Strengthen copyright by requiring licensing in all cases of use of copyrighted content;
  • Legislate a broad text and data mining (TDM) exception to copyright law. (Britain already has a TDM exception but it is limited to non-commercial research purposes);
  • Create a data mining exception with opt-out and transparency measures. Rightsholders would be required to notify when their works were not to be accessed, and AI developers would be required to disclose what works they had used.

The Paper indicated that the last option, Option 3, was the government’s preferred choice. Unfortunately–for the government–fully 97% of the 11,520 respondents to the consultation survey disagreed. To put it another way, only 3% of respondents endorsed the government’s preferred option. (Note to UK government: Wipe egg off face). Rightsholders did not want the onus to be on them to opt out, both because copyright law is based on the user (subject to fair dealing and other statutory exceptions) needing to obtain permission in advance from a rightsholder, and because opting out is not necessarily easy to do. AI developers objected to the transparency provisions. They preferred Option 2, a broad TDM exception allowing them to help themselves to whatever content they deemed useful for AI training without let or hindrance. The last thing the tech industry wants is to be required to document what content it has appropriated without authorization.

To oppose Option 3, rightsholders mounted a widespread and very effective campaign to remind legislators of the value of Britain’s creative industries, both economically and culturally. Everyone from those two noble knights, Sir Elton John and Sir Paul McCartney, to Ed Sheeran, Dua Lipa, Kazuo Ishiguro, Andrew Lloyd Webber, Cat Stevens, Sting, Tom Stoppard, and on and on—maybe everyone who is anyone in the cultural world in Blighty, except Banksy, spoke out. Now the government has thrown up its hands and admitted it got it all wrong. The blatant content giveaway cum confiscation is not going to happen, at least not in the form initially proposed.

The Starmer government’s particular slice of humble pie was worded as follows;

We must take the time needed to get this right. We will not introduce reforms to copyright law until we are confident that they will meet our objectives for the economy and UK citizens. This means protecting the UK’s position as a creative powerhouse, while unlocking the extraordinary potential of AI to grow the economy and improve lives. Any reform must ensure that right holders can be fairly rewarded for the economic value their work creates, and that they are protected against unlawful and unfair use of their work. It must also ensure that AI developers can access high quality content. It is clear through the consultation and our subsequent engagement that there is no consensus on how these objectives should be achieved.

Truer words were never spoken. And there will never be consensus as long as one of the parties is an entitled tech industry that feels it can freely confiscate valuable content produced and owned by others, and which has no qualms about accessing pirate websites to do so. Legal consequences are required to bring it to the negotiating table.

Although we now know the UK government will not forge ahead to adopt the Option 3 opt-out proposal, we still really don’t know what it will do. Amidst all the relief from the creative community that Option 3 has been dropped as the chosen way forward, there is still a sense of unease about what might happen instead. Composer Ed Newton-Rex, one of the leading activists amongst the creators, has voiced his concerns on X. While the government has withdrawn its preference for Option 3, it is still on the table as is the possibility of some other form of TDM exception. He is worried that while there is talk of compensation for rightsholders, permission does not seem to be part of the equation. With the dropping (for now at least) of Option 3, transparency requirements for the tech industry are also once again in doubt.

What will happen next on this issue in the UK? More study, more research, more monitoring of developments. In other words, back to Square One. The one new development coming out of the government’s Report is an acknowledgement that something needs to be done with respect to digital replicas. As described in the Report, this involves the use of AI to replicate or mimic the appearance or voice of individuals. Current copyright law is not well adapted to protect against such misuse. The result could be the introduction of a right of personality in the UK.

When it comes to the intersection of copyright and AI development, the UK government is trying to resolve an issue with which many countries are grappling. The question is how to get a slice of the AI investment pie by incentivizing the tech industry without throwing the creative community under the bus. In the US, finding the balance has largely been driven by the courts. The White House has just issued its National Policy Statement on Artificial Intelligence. Among its many policy positions, the Statement indicates this question should continue to be left to the courts to decide, even though the White House believes that unauthorized and uncompensated use of copyrighted material for AI development is fair use under US law. In Australia the government looked at the same issue and in the face of vocal and organized opposition from the creative sector, explicitly ruled out introducing a TDM exception. India tried to find the balance by proposing the establishment of an unworkable compulsory licence regime, a scheme that I described in a recent blog post as the “worst of all worlds”. It was as widely condemned by both creators and tech developers as the UK’s Option 3. Canada, which like Australia does not have a TDM exception in its copyright law but which is being pushed by the tech industry to loosen restrictions in the face of current and anticipated lawsuits, is facing similar questions.

While this may seem to some (like the UK government for example) as an almost intractable problem that will require much more work, study and consultation to resolve, in fact it is not all that difficult. Cultural industries generally are not opposed to the development of AI. In fact, many creators use it to assist their work. But they want to have some say over whether or how their work is being used and want to receive compensation when it is used. It is true that some individuals may wish to have nothing to do with AI and would like to see it go away. That, however, is not going to happen although their right to have their work not used to develop applications that may unfairly compete with the content they produce must be respected. The best outcome is a win/win scenario where creators voluntarily opt in to the AI development process and share in its benefits. This is already happening on an increasingly large scale with respect to commercial licensing deals, with news media outlets, studios, music publishers and book publishers all signing licensing agreements with AI developers. Left out at this stage are most individual authors and artists and small publishers, but voluntary collective licensing can fill the gap. As a recent example, at the recent London Book Fair in March of this year, the first stage of an opt-in collective licensing initiative was launched by Publishers Licensing Service to supplement direct agreements between publishers and AI companies.

The “magic formula”–the win/win solution–which so many countries are finding so hard to unlock is based on three principles, well articulated by Canada’s Coalition for the Diversity of Cultural Expressions (CDCE);

1. Authorization (Permission)

2. Licensing (Remuneration)

3. Transparency (Disclosure)

Acceptance of these three cardinal principles by AI developers would cut through what the UK government seems to see as a Gordian Knot. The solution is not all that difficult, but it requires courage to stand up to the tech industry’s threat of taking its ball and going elsewhere to play. Let’s hope that as the UK and other governments go back to the drawing board, they use these principles as guideposts to arrive at a solution that, at the end of the day, best serves everyone.

© Hugh Stephens, 2026. All Rights Reserved

Deloitte’s AI Nightmare: Top Global Firm Caught Using AI-Fabricated Sources to Support its Policy Recommendations

Close-up of the Deloitte building sign, with greenery in the foreground and a clear sky in the background.

Image: Shutterstock

As we start a new year, 2026, it is a given that artificial intelligence (AI) is going to be the big issue for authors, publishers and the copyright industries generally. The issue of whether it is legal to use copyrighted materials to train AI platforms without the consent of rightsholders will continue to be fought out in courts and legislatures. The use of AI to create content will also continue as an ongoing issue, both the extent to which assistance from AI renders outputs non-protectable by copyright and whether content produced using AI is reliable and trustworthy. Deloitte, by revenue the world’s largest consulting firm, has just learned that lesson in spades. As well it should.

Would you hire an expensive consulting firm that used AI to supplement its research, didn’t inform you it was doing so, and when it got caught serving up AI-fabricated citations claimed that the false documentation in no way invalidated its policy recommendations and conclusions? I wouldn’t but apparently the Government of Canada has no such qualms, according to a Canadian Press story. We are talking about Deloitte, one of the world’s four largest consulting and accounting firms (the others being PwC, EY and KPMG), a company that charges a premium for its specialized services, and which ought to know better. It’s not as if Deloitte was caught just once with its hand firmly embedded in the AI cookie jar. First it happened in Australia, where the company was forced to reimburse the client, Australia’s Department of Employment and Workplace Relations, for a report that was reportedly “riddled with fake citations, phantom footnotes, and even a made-up quote from a Federal Court judgment.” It also pulled the same stunt in Canada where it produced a report on health care for the Newfoundland and Labrador provincial government. The 500-page report contained at least four citations of research papers that do not exist. These were used, with others, to support recommendations related to recruitment strategies, monetary incentives, virtual care, and the impact of the COVID-19 pandemic on healthcare workers. However, as the Independent, a local digital news outlet, reported, Deloitte, having been caught red-handed, stated it is “revising the report to make a small number of citation corrections, which do not impact the report findings.”

What a joke! Of course the fabricated citations impact the report’s findings, as any first year university student knows. Frankly, this is a disgrace and I think Deloitte should be put in the penalty box for six months to a year as punishment, i.e. no government should contract with them until they learn to clean up their act. The irony is that Deloitte advertises itself as providing consulting services to governments to enable them to use AI effectively. Consulting firms are supposed to bring new expertise and perspectives to management problems that governments, in this era of cutbacks, no longer have the resources to solve. They are expensive but provide a quick turnaround for public service managers who don’t have the in-house resources to deal with emerging issues, and who often don’t have the luxury of time to staff up to meet immediate needs. But the dirty little secret is that in many cases the consulting firms apply their cookie-cutter templates to inform their findings whether the template suits or not. They also employ junior staff to do much of the grunt work without, apparently, providing them with adequate supervision or guidance. But even if the labour-intensive task of finding citations to justify the “researched conclusions” of the commissioned report was subcontracted to an AI bot, someone senior at Deloitte signed off on the final product. It didn’t take the client or journalists very long to track down the fabricated citations, so why couldn’t Deloitte have run the same quality check? Because they couldn’t be bothered, I guess.

Despite having been caught, Deloitte may be big enough to shrug this one off, but I sincerely hope they have learned a lesson. Even one false or fabricated citation undermines the credibility of research. As I noted in a recent blog post (Delegating Research to AI is a Risky Proposition: The “Hallucination” Phenomenon-User Beware), “In our rush to embrace AI, many seem to have forgotten the value of human creativity and judgement”. Deloitte has egg on its face, and needs to wear this. A consulting firm is only as good as its reputation, and as far as I am concerned, Deloitte has just put its reputation through the shredder.

This is a cautionary tale, one that has enmeshed not only the world’s largest consulting firm, but various law firms that have been caught citing fabricated precedents. Students would be sanctioned for using AI this way (if they were caught) and academics would suffer major hits to their reputation. Research results and qualifications might be invalidated. If these are the sanctions for misuse of AI, then we should expect no less from entities like Deloitte and its ilk. Let’s hope there are no further AI fabrication horror stories in 2026. (A vain hope, I am sure).

© Hugh Stephens, 2026. All Rights Reserved

Canada’s Creative Sector Uneasily Awaits the Carney Government’s Next Steps on AI Training

Blasting a Wide TDM Hole in the Structure of Copyright is Not the Answer

A cartoon-style illustration showing a fist breaking through a brick wall labeled 'COPYRIGHT', with the fist wearing a band labeled 'TDM', surrounded by explosive graphical effects.
Image: Author (via DALL-E)

The ongoing wrestling match-cum-dance between the creative sector and AI developers over the uncompensated and unauthorized use of copyrighted content for AI training is being played out in different ways in different countries. In the US it is largely a legal play in the courts at the moment, with mixed results for both sides. However, President Trump has made concerning public comments siding with the AI industry, saying it is impractical for AI developers to pay copyright holders for AI training (and besides, China doesn’t do it). Congress is still considering its options. In Australia, the Productivity Commission, never a friend of intellectual property, has just issued an interim report recommending the adoption of a Text and Data Mining (TDM) exception in Australia to boost development of the AI industry locally. The Australian creative sector mobilized quickly and has pushed back hard against this proposal, with the government now saying that it has no plans to amend the Copyright Act. In the UK, where there is a TDM exception but only for non-commercial purposes, the Starmer government quickly adopted a pro-AI strategy, part of which was to propose an expansion of TDM to include commercial purposes, although subject to an opt-out for rights-holders. That ignited a major storm among leading British creatives from Paul McCartney and Elton John on down. Through a unified campaign, British creators were able to gain support in the Upper Chamber (House of Lords) to slow down the legislation. As a result, the TDM issue has now been earmarked for further consultation and study. One thing is certain: the creation of a wide TDM exception is a sure way to stifle a nascent but rapidly developing licensing market for copyrighted content used for AI training.

It seems as if TDM, or more permissive TDM, is testing the boundaries of copyright just about everywhere. So, what about Canada? Canada has no TDM exception in its copyright law and, unlike the US, has clearly defined fair dealing exceptions that do not lend themselves to expansive court interpretation. Like other countries, it is trying to figure out how to not get left behind as the AI race accelerates. Canada initially had a first mover advantage in terms of AI research, given the work of Geoffrey Hinton, Yoshua Bengio and others, but recently it has been falling behind, notably lacking native startups. The cluster effect is not happening, with Canadian innovation going elsewhere for commercialization. To address these challenges, the new Carney government has appointed a dedicated Minister of Artificial Intelligence and Digital Innovation, former journalist Evan Solomon. This is the first time such a position has existed. One of Solomon’s first acts was to accelerate launch of an AI strategy beginning with a new consultation released on October 1 (closing at the end of this month), in the form of a survey to “help define the next chapter of Canada’s AI leadership”. This survey asks many relevant questions regarding AI and how it could be best developed in Canada but manages to mostly steer clear of the thorny question of AI training and copyright. The only question tangentially related to this issue is the following;

“Which infrastructure gaps (compute, data, connectivity) are holding back AI innovation in Canada, and what is stopping Canadian firms from building sovereign infrastructure to address them?”

Clearly this consultation is not going to turn over the TDM rock, at least not directly.

In the past couple of years, the government has issued two consultation papers on AI, one in 2021 and another last year, as well as a “What We Heard” report. This report, issued earlier this year, summarizes the “great divide” between AI developers and the content industry. Its first observation was that “Creators oppose the use of their content in AI without consent and compensation” but it then goes on to say that “User groups support clarifications that TDM does not infringe copyright”.

After a couple of other observations about the centrality of human authorship and the need for transparency surrounding the use of copyright-protected works in the training of AI, the paper observed that there is “no consensus about whether existing legal tests and remedies are adequate”. That is the nub of the issue. There is no consensus, and while the courts are struggling with this issue (including in Canada, as I wrote about here and here), what Canadian creators fear is the introduction of a wide TDM exception in the name of maintaining “Canadian competitiveness”.

The launch of the new AI strategy and the evolution of the way in which copyrighted content is described in government consultation documents is indicative of the pressures on the government to shore up Canada’s AI strategy. It is interesting to note the shift in the definition of TDM from 2021 to today.

The definition provided in the 2021 consultation document described TDM as follows;

“The process of conducting TDM may require the making of reproductions of large quantities of works or other copyright subject matter to extract particular data and information from them. This process may be carried out using scientific or text-based data, as well as images, sounds, or other creative works.”

In the most recent consultative document, that definition has evolved;

“Text and data mining (TDM) consists of the reproduction and analysis of large quantities of data and information, including those extracted from copyright-protected content, to identify patterns and make predictions.”

Note the shift from “works” to “data”.[i] It’s a subtle difference but is hugely significant because data and facts are not protectable under copyright whereas the creative elements of original works are. The cultural sector is rightly concerned.

The Coalition for the Diversity of Cultural Expressions (CDCE), a major arts and creatives lobby group, is currently pressing Ottawa on a number of cultural issues, including AI. Among its AI asks are to;

  1. Ensure that the Copyright Act is not modified through an exception permitting Text and Data Mining (TDM) or any other exception allowing technology developers or users to use protected works…to train generative AI systems without authorization or compensation;
  2. Adopt national legislation on generative AI that requires developers of generative AI systems to disclose the training data they use; and
  3. Adopt legislative provisions requiring public identification of content that is purely AI-generated.

Against these demands is the pressure coming from AI advocates who will argue that if the US loosens restrictions on use of copyrighted content for AI training, Canada will have no recourse but to follow. In other words, as goes the US, so goes Canada (or for that matter, the UK, Australia and others). Thus, what is happening in the US courts, and perhaps in Congress, is of critical importance for the creative sector everywhere including, in particular, Canada.

The issue of AI training on copyrighted content will need to be resolved sooner or later. Licensing solutions are developing quickly and if Canada can wait a bit longer it may be able to adopt licensing as the preferred solution (although the “What We Heard” report noted that “Some (intervenors) argued that licensing is an unnecessary burden because it may not be clear that copyright is engaged or that works used in TDM are being reproduced in the first place.”). There is pressure on the Carney government to take early action since AI industry developments are moving at lightning speed. With the TDM train gaining momentum in Canada and elsewhere, Canadian creators are understandably uneasy about what is likely to happen next.  

As the CDCE notes, culture is a major economic and social pillar in Canada. In 2023, it generated $63.2 billion in value added and employed 669,600 people. Throwing all that under the bus in the name of remaining competitive on AI is a flawed choice, a point also made by the creative sectors in the UK, Australia and elsewhere. However, with the AI horse well out of the barn, copyright cannot be seen as an obstacle to innovation, an accusation freely levelled at it by some in the AI industry. Rather, it must be seen as a partner in innovation, which is where licensing comes in.

Blasting a wide TDM hole in the protection and incentive structure that copyright provides the creative sector is not the answer. The creative sector is watching and waiting anxiously.

© Hugh Stephens, 2025. All Rights Reserved


[i] I am indebted to Erin Finlay, partner at Stohn Hay Cafazzo Heim Finlay LLP for drawing these changing definitions to my attention

When Will AI Developers Take Responsibility for the Products They Provide Their Subscribers?

What They are Doing Really Bugs Me

Illustration comparing a Midjourney-generated image of Bugs Bunny on the left with Warner Bros. copyrighted images of Bugs Bunny on the right, featuring different styles and settings.

Image: US District Court Filing

I confess to having been a lifetime fan of Bugs Bunny, that “Wascally Wabbit”, and not just because I worked for Time Warner at one point in my career. His insouciance, his ingenuity and his cultural achievements (have you seen Bugs perform opera or conduct a symphony orchestra?) are legend. Thus it was with some interest that I read the headline in my morning newspaper “Warner Bros. sues AI Company over Images of Bugs Bunny and other characters”. It was based on a generic AP report that appeared in many journals across North America. The AI company in question is Midjourney. Hollywood Reporter has done a deeper dive comparing images produced with Midjourney’s AI program to copyrighted Warner Bros. (WB) images, drawn from the lawsuit submission. This is not the first confrontation between Hollywood and Midjourney. In June Disney and Universal brought a similar suit alleging that the AI company’s image generator produces near replicas of their copyrighted characters.

Just in case you forget what Bugs looks like, Warner Bros. (technically now known as Warner Bros. Discovery) has a complete description in its lawsuit;

Many of the Looney Tunes characters are ubiquitous household names, and these characters have expressive conceptual and physical qualities that make them distinctive and immediately recognizable. Bugs Bunny, for example, is a playfully irreverent anthropomorphic gray and white rabbit, who has a star on the Hollywood Walk of Fame. Bugs Bunny has an overbite that showcases his two long front teeth, oversized feet with white fur, and is often depicted eating a carrot.

Does this description look anything like the image produced by Midjourney, as reproduced in the filing (see paragraphs 85 and 86), which I have pinched as the image for this blogpost? Scroll up or down to see the full range of characters at issue, ranging from Tweety to Batman.

Midjourney’s response to the earlier lawsuit, and now to Warner Bros. is that they are not responsible for any copyright infringement that may occur. You see, it is the users of their service who are to blame. Not them. According to their court filing:

“The Midjourney platform is an instrument for user expression. It assists with the creation of images only at the direction of its users, guided by their instructions, in what is often an elaborate and time-consuming process of experimentation, iteration, and discovery.

Midjourney users are required by Midjourney’s Terms of Service to refrain from infringing the intellectual property rights of others, including Plaintiffs’ rights, Midjourney does not presuppose and cannot know whether any particular image is infringing absent notice from a copyright owner and information regarding how the image is used.”

Warner Bros. points out that Midjourney could easily control infringing outputs by (1) excluding WB content from training its AI system (2) rejecting prompts from users requesting WB characters and (3) using technical means to screen images. But instead, Midjourney has become a vending machine for WB content, selling a commercial service powered by AI that was developed using infringing copies of WB works and then allows users to reproduce or download infringing images or videos. These outputs directly compete with WB copyrighted content.

Midjourney’s defence strikes me as similar to arguments used by the manufacturers of guns. “Guns don’t kill people. People kill people.” Except that there are a number of limitations on the kind of gun you can sell, and its capabilities. While the law varies from jurisdiction to jurisdiction, what is common are restrictions on selling automatic and semi-automatic weapons. The reasons are obvious. While there is a use for some kinds of guns (gun clubs, hunting etc.) there is no legitimate need for unlimited lethality. Not all gun purchasers (like AI software users) can be trusted so limitations are placed on what gun manufacturers are allowed to make available to the public. Similarly, while there are many uses for image-generating AI platforms, there are also legal limits to what is acceptable, such as when AI is used to create child porn. AI companies have agreed to set and enforce guardrails against this, and are clearly capable of doing so. Since, regrettably, not all users can be trusted, simply asking them to acknowledge and abide by Terms of Service is inadequate. So, if AI companies can stop some categories of use, they are equally capable of marketing a service that avoids copyright infringement. Enough of the “blame the user” nonsense. Design and market a service that conforms to the law.

Another good example of the “blame the user” excuse is META’s creation of “flirty chatbots” using virtual images of celebrities such as Taylor Swift, Scarlett Johansson, etc. According to Variety, quoting a report from Reuters, which researched the issue, the celebrity AI chatbots “routinely made sexual advances, often inviting a test user for meet-ups.” In some cases, when they were asked for “intimate pictures,” the chatbots “produced photorealistic images of their namesakes posing in bathtubs or dressed in lingerie with their legs spread.” Many of these chatbots, which clearly violate the right of publicity of the subjects, were user produced. But users could not produce these images, and cross the line into illegality, unless they were enabled to do so by the program produced by META. META claims that the production of such images violates its rules, which prohibit the direct impersonation of public figures. But of what use is a rule if it is not enforced?

If AI developers design products that are easily misused and then enable (even encourage) their users to do so, it is high time for them to accept responsibility. They are able to establish guardrails; they just don’t want to as it is easier to free ride, while attracting as many users as possible. Midjourney is a good case in point, but Warner Bros has just fired a shot across their bow. Just as Bugs did in Captain Hareblower.

© Hugh Stephens 2025. All Rights Reserved.

AI’s Habit of Information Fabrication (“Hallucination”): Where’s the Human Factor?

An illustration of a cartoonish robot face on a computer screen with the text 'THE WORLD IS FLAT' above it.

Image: Shutterstock (with AI assist)

It is well known that when AI applications can’t respond to a query, instead of admitting they don’t know the answer, they often resort to “making stuff up”—a phenomenon commonly called “hallucination” but which should more accurately be called what it is, total fabrication. This was one of the legal issues raised by the New York Times in its lawsuit against OpenAI, with the Times complaining, among other things, that false information attributed to the journal by OpenAI’s bot undermined the credibility of Times journalism and diminished its value, leading to trademark dilution. According to a recent article in the Times, the incidence of hallucination is growing, not shrinking, as AI models develop. One would have thought that as the models ingest more material, including huge swathes of copyrighted and curated material such as content from reputable journals like the Times (without permission in most instances), their accuracy would improve. That doesn’t seem to be the case. Given AI’s hit and miss record of accuracy, it should be evident that AI output cannot be trusted or, at the very least, can only be trusted if verified. Not only is AI built on the backs of human creativity (with a potentially disastrous impact on creators unless the proper balance is struck between AI training and development, and the rights of creators to authorize and benefit from the use of their work), but human oversight and judgement are required to make it a useful and reliable tool. AI on auto-pilot can be downright dangerous.

The most recent outrageous example of AI going astray is the publication by the Chicago Sun-Times and Philadelphia Inquirer, both reputable papers (or at least they used to be), of a summer reading list in which only five of fifteen books listed were real. The authors were real but most of the book titles and plots were just made up. Pure bullshit produced by AI. The publishers did a lot of backing and filling, pointing to a freelancer who had produced the insert on behalf of King Features, a unit of Hearst. Believe it or not, it was actually licensed content! That freelancer, reported to be one Marco Buscaglia, a Chicago “writer”, admitted that he had used AI to create the piece and had not checked it. “It’s 100% on me”, he is reported to have said. No kidding. Pathetic. Readers used to have an expectation that when a paper or magazine published a feature recommending something, like a summer reading list, the recommendation represented the intellectual output of someone who had done some research, exercised some judgement, and had presumably even read or at least heard about the books on the list. How could anyone recommend non-existent works? The readers trusted the newspaper, the paper trusted the licensor, the licensor trusted the freelancer, the so-called author. Nobody checked. Where was the human element? The list wasn’t worth the paper it was printed on.

The same problem of irresponsible dependence on unverified information produced by AI is growing in the legal field. Prominent lawyer and blogger Barry Sookman has just published a cautionary tale about the consequences of using hallucinatory AI legal references. Counsel for an applicant in a divorce proceeding in Ontario cited several legal references using the CanLII database (for more information on CanLII see “AI-Scraping Copyright Litigation Comes to Canada (CANLII v Caseway AI)”) that the presiding judge could not locate—because they did not exist. He suspected the factum had been prepared using Generative AI and threatened to cite the lawyer in question for contempt of court, noting that putting forward fake cases in court filings is an abuse of process, and a waste of the court’s time. The lawyer in question has now confirmed that AI was used by her law clerk and that the citations were unchecked, and has apologized, thus avoiding a contempt citation. Again, nobody checked (until the judge went to the references cited).

This is not even the first case in Canada where legal precedents fabricated by AI were presented to a court. Last year in a child custody case in the BC Supreme Court, the lawyer for the applicant was reprimanded by the presiding judge for presenting false cases as precedents. The fabricated information was discovered by the defence attorneys when they went to check the applicant’s lawyer’s arguments. As a result, the applicant’s lawyer was ordered to personally compensate the defence lawyers for the time they took to track down the truth. The perils of using AI to argue legal cases first came to prominence in the US in 2023 when a New York federal judge fined two lawyers and their firm a total of $5,000 for submitting legal briefs written by ChatGPT, which included citations of non-existent court opinions and fake quotes.

Another area fraught with consequences for using unacknowledged AI generated references is academia. The issue extends well beyond undergraduate student essays being researched and written by AI to include graduate students, PhD candidates and professors taking shortcuts. This university library website, in its guide to students on use of AI generated content, notes that LLMs (Large Language Models used in AI) can hallucinate as much as 27% of the time and that factual errors are found in 46% of the output. The solution is pretty simple. When writing a research paper, don’t cite sources that you didn’t consult.

This brings up the question of “you don’t know what you don’t know”. If your critical faculties are so weak that you cannot detect a fabricated response, you are in trouble. Of course, some hallucinations are easier to spot than others. Some of the checking is simply to verify that a fact stated in an AI response is accurate or that a cited reference actually exists (but then it should be read to determine relevance). In other cases, it may be more subtle, with the judgement and creativity of the human mind being brought into play to detect a hallucination. That requires experience, knowledge, context—all of which may be lacking in a junior clerk or student intern assigned the task of compiling information. This is all the more reason why it is important for those using AI to check sources, and to exercise quality control. Part of the process is to ensure transparency. If AI is used as an assist, that should be disclosed.
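
The simplest of these checks, confirming that a cited reference actually exists, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the set of known cases stands in for a lookup that a person would do in a real database such as CanLII, and the case names and function names are invented for the example.

```python
# Hypothetical stand-in for a trusted source that a person has actually consulted.
KNOWN_CASES = {
    "smith v. jones, 2019 example 101",
    "doe v. roe, 2021 example 202",
}


def normalize(citation):
    """Lower-case a citation and collapse whitespace so trivial differences don't matter."""
    return " ".join(citation.lower().split())


def unverified_citations(cited, known=KNOWN_CASES):
    """Return the citations that cannot be found in the trusted source."""
    return [c for c in cited if normalize(c) not in known]


draft = [
    "Smith v. Jones, 2019 EXAMPLE 101",
    "Acme v. Nobody, 2023 EXAMPLE 999",  # invented, as a hallucinated citation would be
]
print(unverified_citations(draft))
```

Anything such a check flags still needs a human to look it up, and anything it passes still has to be read to determine relevance.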

At the end of the day, AI depends on human creativity and accurate information produced by humans. Without these inputs, it is nothing. This brings us to the fundamental issue of whether and how copyright protected content should be used in AI training to produce AI generated outputs.

The US Copyright Office has just released a highly anticipated study on the use of copyrighted content in generative AI training. Here is a good summary produced by Roanie Levy for the Copyright Clearance Center. The USCO report is clear in stating that the training process for AI implicates the right of reproduction. That is not in doubt. It then examines fair use arguments under the four factors used in the US. Notably, with respect to the purpose and character of the use, USCO notes that the use of copyrighted content for AI training may not be transformative if the resulting model is used to generate expressive content or potentially reproduce copyrighted expression. It notes that the copying involved in AI training can threaten significant harm to the market for, or value of, copyrighted works, especially where a model can produce substantially similar outputs that directly substitute for works used in the training data. This report is not binding on the courts but is a considered and well researched opinion by a key player.

It is interesting to note that the report was released quickly in a pre-publication version on May 9, just a day before the Register of Copyrights (the Head of the Office) Shira Perlmutter was dismissed by the Trump Administration and a day after the Librarian of Congress, Carla Hayden (to whom Perlmutter reported), was fired. Washington is rife with speculation on the causes for, and the legality of, the dismissals. We will no doubt hear more on this. With respect to fair use in general, the study concludes that “making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets…goes beyond established fair use boundaries”. The anti-copyright Electronic Frontier Foundation (EFF), of course, disagrees. (Which probably further validates the USCO’s conclusions.)

The USCO study is about infringement, not hallucination or fabrication, yet both stem from the indiscriminate application and use of AI where the human factor is largely ignored and devalued. Human creativity and judgement are needed to set guardrails on both. Transparency as to what content has been used to train an AI model, along with licensing reputable and reliable content for training purposes, are important factors in helping AI to get its outputs right. Not taking an AI output as gospel but applying a degree of diligence, common sense, fact verification or experienced judgement are other important factors in deploying AI as it should be used, as an aid and assist to make human creativity and human directed output more efficient but not as a substitute for thinking or original research. Generative AI must be the servant, not the master. Human creativity and judgement are needed to ensure it stays that way.

© Hugh Stephens, 2025. All Rights Reserved.

How AI Can Destroy Local Journalism

Image: Shutterstock.com

We all know that local journalism is under extreme pressure. Long established regional newspapers are closing or are being turned into little more than franchise operations where a bare bones local newsroom contributes a modicum of local news to a newspaper fleshed out with filler from national wire services or mother publications. The regional titles in Canada owned by Postmedia are a prime example of this phenomenon. Some digital startups have helped fill the gap, but they too are struggling. There seems to be a reluctance on the part of many to pay for news through subscriptions, while small online publications are forced to compete with everyone from Google and Facebook on down for ad dollars.

One of the supposed remedies, at least in Canada, has been to create various funds to support local journalism. There is a range of programs. These include the government funded Local Journalism Initiative, launched in 2019 to encourage local news production in “news deserts” and underserved communities and administered by News Media Canada (an industry group); tax credits to offset journalist salaries if the organization is a “Qualified Canadian Journalism Organization” (QCJO); and most recently the Google-funded $100 million fund for media outlets. This last fund was established as a result of Google’s deal with the government to get a five year exemption from being designated under the Online News Act. According to recent reports, the Google fund’s annual contribution to a journalist’s salary will be in the range of C$13,000 to C$20,000 depending on the number of applying journalistic enterprises deemed to qualify.

All these Band-aid measures are designed to staunch the loss of print and digital publications, which has created news deserts in many parts of Canada and the US. In a 2023 report from Northwestern University’s School of Journalism, as reported by Forbes, it was estimated that almost 3000 print newspapers out of approximately 9000 in the US had ceased publication since 2005. The average loss of newspapers in 2023 was 2.5 per week. In my home province of British Columbia alone, Global News reported that the number of daily papers dropped from 36 to just 13 in the six years from 2010 to 2016. According to the Local News Research Project housed at Toronto Metropolitan University’s School of Journalism, over 500 local radio, TV, print and online news operations have shuttered in 345 communities across Canada since 2008, although during that time around 200 new local news outlets, many of them exclusively digital offerings, launched in 152 communities. However, just one opened in 2023. The decline of local news in Canada was well documented in a study released in February of this year by the Public Policy Forum, “The Lost Estate”. The study examines a number of possible remedies, including philanthropic engagement, community foundations, better targeting of government support programs and increased government advertising in local media.

The reasons for the decline of local news are many: the rise of social media, the migration of ad revenues to giant online platforms like Google and META, the reluctance of a younger generation of consumers to pay for news (especially digitally provided content), unauthorized password sharing (even by government!), rising costs of print, and so on. Newly launched digital outlets have tried to fill the gap, but they are facing challenges in getting sufficient ad revenue or subscriptions. In Canada, some will qualify for tax credits or funding from Google, but it will depend on whether they employ at least two full-time journalists. For many of these digital outlets, their principal modus operandi is to aggregate content from other sources, provide a quick rewrite summary and then insert a link to the original source, thus avoiding copyright infringement. There is some but not a lot of original local journalism.

I have no doubt that with the growing use of AI, some of this initial screening is done through use of artificial intelligence. AI may even be used to create summaries and rewrites. This saves time and money—but unfortunately cuts down on the need for real journalists. The evaluation of AI’s utility in local journalism is typical of its use in many other areas, from screen writing to auditing to medical diagnoses. It can be a useful tool to enhance productivity, but often at a cost somewhere else, such as with respect to employment. Even if an experienced employee can employ AI to enhance what they are working on, AI could eliminate the beginner or training jobs that help develop the required experience to do this. These challenges are not new and not unique to local journalism.

What is new is the use of AI to take the aggregation model followed by many small local online journals a step further and go nation-wide, a development discovered and highlighted by the Nieman Lab. The Nieman Lab is part of the Nieman Foundation for Journalism, established in 1938 at Harvard. It administers what is proclaimed to be the oldest fellowship program in the world for journalists. The Foundation also publishes a quarterly magazine, Nieman Reports, dedicated to a critical examination of journalism, and runs other journalism-related programs. In an excellent piece of investigative sleuthing, the Lab discovered that “Good Daily”, which operates in 47 states and 355 towns and cities across the US, targeting small town America, is run by just one person, Matthew Henderson, armed with an AI program. If this seems hard to believe, read on.

According to Nieman’s investigation, Henderson operates his “media empire” out of New York. He uses an AI bot to scan the news daily in each local market. The AI program curates the most relevant stories, summarizes them, edits and approves the copy, formats it into a newsletter, and publishes it. The same day! Readers in these 355 towns are led to believe that this is a local publication. It has local testimonials, although the same testimonials, slightly tweaked, appear in various editions around the country. The publications also share the same “About” information and the same mission, which is “to make local news more accessible and highlight extraordinary people in our community.” Henderson claims his automated newsletter is actually helping local publications by driving traffic to them. This is the same argument put forward by META when refusing to pay for news content that it uses on its platform to attract and retain viewers (and sell ads).

Henderson’s business model is to sell advertising and solicit readers for donations. The advertising pie is not infinite, so it is obvious that his “local” newsletters are just one more source of competition for local media chasing ad dollars. Just to be clear, there is nothing illegal about what Henderson is doing. Aggregating content and linking to it is not a violation of copyright law. The lack of full disclosure is a bit disconcerting but I am doubtful if there is anything illegal about having a corporate veil. What is problematic is the cannibalistic nature of his business model, enabled by AI.

This kind of operation, fuelled by AI, can only operate if there is local content to aggregate. The Good Daily publications contribute nothing, absolutely zero, to content creation. They, like Facebook, are the ultimate cannibalistic free riders. They can continue to operate successfully, free riding on content created by others, so long as those “others” remain in the business of producing content. However, the more successful businesses like Good Daily become, the less viable will be the local journalism sector that produces the content its free-riding competitor subsists on. In the end, the result will be an ouroboros. (Great Scrabble word by the way). The AI driven aggregator will in the end devour the very source of its content. That may be a few years down the road, but logically that is what will happen. It’s like eating your seed grain.

Is there a solution? Many remedies have been proposed, but if there is a silver bullet, I don’t know what it is. For many in the media, finding a way to get the big platforms that benefit from media content to contribute financially to journalism is one avenue, but as we have seen in Canada, Australia, and California, the platforms will pull out all the stops to prevent being required to do so, especially META, which has thumbed its nose at Canada and, now, Australia.

Government subsidies, such as Canada’s Local Journalism Initiative, are resisted by many journalists lest the industry become beholden to government handouts. Holding government to account is one of the key functions of the media, the so-called “Fourth Estate”. Can a subsidized media be trusted to do so? On the other hand, without subsidies will there be any viable local media left? Maintaining its independence is one reason why the small Ottawa-based online publication, Blacklock’s Reporter, which is locked in a David and Goliath struggle with the Government of Canada over the government’s abuse of password-sharing, is opposed to initiatives like the Online News Act. (For the record, that is their view, not mine, but it is a position I respect). Potentially another option could be a tweaking of tax laws to encourage businesses to place ads on local media rather than with the giant international platforms, but in the end business will and should be able to spend its ad dollars where it believes it will get the best results.

For every potential remedy to the problem of keeping local journalism alive, there is a potential downside, just as there is with AI generally. For all its potential advantages AI also has the potential to destroy journalism. Good Daily may be the canary in the coalmine, a fully legal but particularly egregious use of AI putting yet another nail in the coffin of local journalism.

© Hugh Stephens, 2025. All Rights Reserved.

Hijacking a Musician’s Identity to Promote AI-generated Music Isn’t Copyright Infringement: It’s Outright Fraud

Image: Shutterstock.com

Early last week there was a flurry of articles, including one on Billboard, reporting on the strange case of Nova Scotia musician Ian Janes. Janes discovered that his Spotify artist profile included music that wasn’t his and which he hadn’t recorded. Someone had apparently created “Muzak-like” AI-generated tracks and had added an album, named Street Alone, to Janes’ artist profile as a way of boosting the album’s take-up. How they did this is not clear although they must have hacked the system in some way. Janes had Street Alone removed from his profile, although it reportedly remains up on Spotify but not under his profile.

One assumes that if the AI album was surreptitiously posted to Janes’ account, it could also have been added to the profiles of other artists. Janes stated that the person or entity doing the posting could actually be getting any royalties the album is earning because payments are normally made through a distributor who could be directing royalty payments, such as they are, to the fake artist. (Not that they are likely to get rich. A track must reach a minimum of 1000 streams on Spotify before any royalties are paid, and given Spotify’s track record of paying approximately $3.00 per 1000 streams, it takes over 33,000 streams for an artist to earn the princely sum of $100). However, the fake listing could also boost the stream count of the album, raising its profile in the algorithm. In effect, it appears that someone is hijacking the profile of a known artist to promote low grade AI-generated music. Is this legal? No, it’s not, but why not? Does copyright law help?
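
The payout arithmetic in the parenthesis above can be checked with a short calculation. This is only a sketch: the rate of $3.00 per 1000 streams is the approximate figure cited above, real payouts vary, and the function name is made up for illustration.

```python
import math

# Approximate figure cited in the post; actual payouts vary.
RATE_PER_1000_STREAMS = 3.00  # US dollars


def streams_needed(target_dollars, rate_per_1000=RATE_PER_1000_STREAMS):
    """Return the smallest whole number of streams that earns at least target_dollars."""
    return math.ceil(target_dollars * 1000 / rate_per_1000)


print(streams_needed(100))
```

At that rate, $100 works out to 33,334 streams, which is consistent with the “over 33,000” figure.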

Billboard reports that Janes’ lawyer lamented it’s not technically a copyright violation unless the music uses Janes’ likeness or his actual compositions. No one has copied his music; they have just claimed he wrote and performed a piece which he had nothing to do with. Another publication (allaboutai.com) claims this case has revealed significant gaps in Canadian copyright laws, which primarily address human-created works, noting that “Current statutes do not fully account for the complexities of AI-generated content”. This is true, as I pointed out recently in a blog post on the issue of the loophole in Canadian copyright practices that allows registration of AI-generated works even though it is generally accepted that such works do not receive copyright protection under Canadian law. Here we may have an example of an anonymously created AI-generated work trying to find a recognized human artist to associate with. While there is a tangential relationship to copyright, this story is not about copying a specific work or illegally generating income from a copy. It is about the droit d’auteur, the integrity of an author’s or artist’s work and their reputation. In this case, the key issue is the potential damage to Janes’ reputation if inferior work is passed off under his name.

However, copyright law (in Canada or elsewhere) does not specifically include the ability to protect one’s identity or name. A name or identity cannot be copyrighted–although a work by that person can be. Nonetheless, there are some forms of legal protection available. A related concept known as the right of publicity (or right of personality) affords some protection. In Canada, this either falls under provincial privacy laws (BC, SK, MB and NL) or common law principles of defamation. In the US there are laws protecting publicity rights in some states. Another way to protect one’s identity and image is to register it as a trademark, although this is usually done by well-known personalities. There are various requirements such as demonstrating a record of marketing products that use the trademark designation. (Michael Jordan shoes, Frida Kahlo dolls, even Fred Perry tennis shorts—remember him?). But in my view Janes should not have to trademark his name and identity, nor should he have to resort to defamation or privacy laws. To me, perhaps simplistically, this case is about fraud, producing a fake product and passing it off as the real thing.

Under the Criminal Code of Canada, fraud is defined as depriving someone of “any property, money, or valuable security or any service…by deceit, falsehood or other fraudulent means…”. Recently I wrote about the scandal of the Norval Morrisseau art fraud. The principal perpetrator, David Voss, was sentenced to five years in jail for the fraud (technically he pleaded guilty to forgery) while his co-accused Gary Lamont was convicted of forgery and defrauding the public. In sentencing, the judge in the case focused as much on the damage to Morrisseau’s legacy and reputation caused by the fraud as to the harm suffered by those who had purchased fake paintings. Surely the same is true of Janes, or any other musician, whose work is tainted by false claims that a shoddy piece of music was produced by them. The fraud per se would have been perpetrated against a Spotify user who thought they were playing Janes’ music, but the real fraud was perpetrated against the artist, their work and their reputation.

There are examples of fraudulent works where the false attribution of author is also a copyright violation, as in the case of the plethora of Chinese knockoffs of J.K. Rowling’s Harry Potter works. For example, Harry Potter and the Walk-up Leopard Dragon and Harry Potter and the Chinese Porcelain Doll, both published in Chinese but labelled as being written by Rowling, were fraudulent works, total fakes, but under US law they also violated Rowling’s copyright so that is what Warner Bros. used to shut them down. They were unauthorized derivative works. But the definition of a derivative work is much narrower in Canada, as explained here, in this post by Carson Law. In the case of Janes, none of his works was copied or infringed; only his identity and name were misused so, as noted by his lawyer, there are apparently no grounds for bringing a copyright infringement case in Canada. But claiming that a random work was produced by someone who had nothing to do with it is clearly misrepresentation and deceit, in short, fraudulent activity.

Will anyone do anything about it? Of course not. It took two decades and a documentary film that fully exposed what was going on to shame the Canadian authorities into doing something about the Morrisseau fraud. It was only through the dogged determination of one detective in the Thunder Bay Police Service (Sgt. Jason Rybek) that action was finally taken.

The conjunction of AI-generated music (which has no copyright protection), online platforms that market tracks from literally millions of artists (it is estimated there are 11 million artists on Spotify), and the ingenuity of hackers who have been able to successfully post ersatz music to legitimate Spotify accounts as a way to promote machine generated tracks has created a perfect storm allowing for this kind of shenanigans. The damage is done to legitimate artists, who have a tough enough time as it is profiling their music without having to carry the burden of inadvertently promoting someone else’s musical garbage. To knowingly assert that your work is the creation of someone else, whether it is music generated by AI, or paintings produced from a sophisticated “paint by numbers” scheme as in the case of the Morrisseau forgeries, is surely outright fraud. The real artist pays the price. And that’s not fair.

© Hugh Stephens, 2025. All Rights Reserved

This blog post was updated to include reference to Spotify’s payouts in the second paragraph.

Writers! Do You Know your Drafts on MS Word are being Scooped by Microsoft to Build its AI Algorithm? But You Can Stop This From Happening (Read On).

Image: Shutterstock

Although I post my blog content on WordPress, I usually use MS Word to draft my content initially. I am used to it, and it is easy to use. Little did I know that, according to the blogsite and forum nixCraft, Microsoft recently (September Privacy update) switched on a feature that allows them to ingest everything you write on Word to help develop their AI Algorithm, called Copilot. The setting is turned on by default in the Privacy settings and must be unchecked manually. Did Microsoft tell you this? Well, kinda, sorta. Microsoft says, “we don’t use your customer data to train Copilot or its AI features unless you provide consent to do so”. Did you provide consent? You no doubt did, unknowingly, when Microsoft updated its Terms of Use, which it does on a regular basis. If you continued to use Office 365, you granted consent.

In the last few days, a pop up has appeared when I am doing something on Word through Office 365.

“Thank you for using Office! We’ve made some updates to the privacy settings to give you more control”.

If you believe that, I have a bridge to sell you.

If you go to Privacy Settings there is a summary blurb on how the Terms of Use were updated on September 30. If you look hard enough you will find this reference:

“We added a section on AI services to set out certain restrictions, use of Your Content and requirements associated with the use of the AI services.”

You really should read the full Terms but as Microsoft notes, this will take an hour of your time (ESTIMATED READING TIME: 55 Minutes; 14268 words).

Having waded through it, this I believe is the relevant wording:

b. To the extent necessary to provide the Services to you and others, to protect you and the Services, and to improve Microsoft products and services, you grant to Microsoft a worldwide and royalty-free intellectual property license to use Your Content, for example, to make copies of, retain, transmit, reformat, display, and distribute via communication tools Your Content on the Services.

There is nothing here about opting out. You have to go to Privacy settings and do some digging to get to that. By masking these changes to make it appear that your privacy has been strengthened (whereas in fact it is just a content grab), Microsoft has stood things on their head by putting the onus on you, the user, to exercise your privacy rights. If you don’t want your creative work used to help train its AI algorithm, which in the end might compete directly or indirectly with your work, you need to opt out (unless you want to stop using MS Word altogether). Microsoft, however, is not suggesting this as a preferred option, or even letting it be widely known that it exists as an option. In fact, when you go into Settings to opt out you are presented with this little gem: “The Trust Center contains security and privacy settings. These settings help keep your computer safe. We recommend that you do not change these settings”.

But that is exactly what you must do if you want to keep your creative content out of the hands of Microsoft’s AI developers. Here is how to do it, based on instructions from nixCraft.

On a Windows computer, when in a Word file, go to File in the top left-hand corner. There is a drop-down menu. You want to go to Options. On my computer the Options choice does not show up unless you hit the arrow at the bottom of the page for “More”. When you get to Options, go to Trust Center (left side menu), then Trust Center Settings. Next up is Privacy Options which leads you to Privacy Settings. There is a drop-down menu, including Connected Experiences. There is a heading labelled “Experiences that analyze your content”. This box is checked for you. You want to uncheck it. To save the setting you will have to log out of Word and then log back in. (Update: I have just discovered a quicker way to do this. Go File-Options-General (top of list)-Privacy Settings-Connected Experiences-Experiences that analyze your content-Uncheck).

Eliminating this option will come at a price, according to the “Learn More” button provided by MS, but it is your choice. For my part, I will forgo the bells and whistles for privacy.

The opt out process is not simple and not intuitive, but worth doing, even if only as a matter of principle. Office 365, unlike Google Search or Bing, is not free. We pay to use it through an annual subscription. Even the tired old argument that you are providing your data as a sort of payment for “free” use of a platform’s service does not apply in this case. Microsoft needs more data to feed its AI machine and yours will do just fine, thank you very much. Don’t let them get away with it.

© Hugh Stephens, 2025. All Rights Reserved.

Looking Back at 2024: It’s All About AI and Copyright (And a Few Other Things)

Image: Shutterstock

A retrospective on the year now coming to a close is what one expects this time of year, so I will try not to disappoint. However, when I look back at the copyright developments I wrote about in 2024, the dominant issues that jump out are AI, AI and AI. You can’t read or think about copyright without Artificial Intelligence, or to be more correct, Generative Artificial Intelligence (GAI), occupying most of the space despite many other issues on the copyright agenda. The mantra of “AI, AI and AI”, as in “Location, Location and Location”, is apt because there are at least three important copyright dimensions related to AI: training of AI models; copyright protection for outputs generated by AI; and infringement of copyright by works created with or by AI. Of the three, the use of copyrighted content for AI training is the most salient.

Last year in my year-ender, I also discussed AI and the numerous lawsuits that were emerging as rightsholders pushed back on having their content vacuumed up by AI developers to train their algorithms. Those lawsuits have only multiplied. At last count, there are more than 30 cases in the US, ranging from big media vs big AI (New York Times v OpenAI/Microsoft) to class action suits brought by artists and authors, as well as litigation in the UK, EU, and now in Canada (see here and here). That is just on the input side.

In terms of output, i.e. whether works produced by an AI can be copyrighted, there are a couple of interesting cases in the US where applications for copyright registration have been refused by the US Copyright Office (USCO) because of a lack of human creativity. A couple of months ago, I discussed two such high-profile cases, one brought by Stephen Thaler, and the other by Jason Allen. To date the USCO is not budging, although it is undertaking an extensive study of the issue. Part 1 of its study, on digital replicas, was published in July of this year. The next part, on copyrightability, is expected to be published in January, with the issues of ingestion for training and licensing to follow in Q1 2025.

While the USCO has to date denied applications for copyright registration of AI-generated works, the Canadian copyright office (CIPO-Canadian Intellectual Property Office) has been caught up in a problem of its own making. This is because Canadian copyright registration is granted automatically, so long as tombstone data and the prescribed fee are provided. The work for which registration is sought is not examined. As a result, copyright certificates have been issued to works created by AI, notwithstanding the general presumption that copyright protection is only accorded to human-created work (although this is not explicitly stated in the Act). In July a legal challenge was launched against copyright registrant Ankit Sahni, who successfully registered a work with CIPO claiming an AI as co-author. The case was brought by the Canadian Internet Policy and Public Interest Clinic (CIPPIC) at the University of Ottawa, as I wrote about here. (Canadian Copyright Registration and AI-Created Works: It’s Time to Close the Loophole).

While the courts in the US, UK, Canada and elsewhere are grappling with various issues related to AI and copyright, governments are studying the issue.

In Australia, the Select Committee on Adopting Artificial Intelligence issued its final report in November. While the report was wide-ranging, three of its recommendations related to copyright:

  • engagement with the creative industry to address unauthorized use of their works by AI developers and tech companies;
  • transparency in training data, by requiring AI developers to disclose the use of copyrighted works in training datasets and ensure proper licensing and payment for these works; and
  • remuneration for AI outputs, with an appropriate mechanism to be determined through further consultation.

These are important principles, but how they will be implemented in practice remains to be determined.

In Canada, a consultation on AI and copyright was launched late in 2023 with submissions to be received by January 15, 2024. The Canadian cultural community put forth three key demands:

  • No weakening of copyright protection for works currently protected (i.e. no exception for text and data mining to use copyrighted works without authorization to train AI systems);
  • Copyright must continue to protect only works created by humans (AI-generated works should not qualify);
  • AI developers should be required to be transparent and disclose what works have been ingested as part of the training process (transparency and disclosure).

Submissions to the consultation were published in mid-year but since then there has been no apparent action. Given the current political crisis facing the Trudeau government, none is expected in the near term although the issue will inevitably have to be addressed after the general election in 2025.

While the EU has already established some parameters dealing with use of copyrighted materials for AI training, the new UK Labour government is taking another run at the issue after various proposals in Britain to find a modus vivendi between the AI and content industries under the Tories went nowhere. The current UK discussion paper on Copyright and Artificial Intelligence, which seems excessively tilted in favour of the AI industry, has aroused plenty of controversy. While it says some of the right things, such as proclaiming that one of the objectives of the consultation is to “support…right holders’ control of their content and ability to be remunerated for its use”, the thrust of the paper is to find ways to encourage the AI industry to undertake more research in the UK by establishing a more permissive regime with respect to use of copyrighted content. It is based on three self-declared principles (notice how these things always seem to come in threes?):

  • Control: Right holders should have control over, and be able to license and seek remuneration for, the use of their content by AI models;
  • Access: AI developers should be able to access and use large volumes of online content to train their models easily, lawfully and without infringing copyright; and
  • Transparency: The copyright framework should be clear and make sense to its users, with greater transparency about works used to train AI models, and their outputs.

These three objectives then lead to what is clearly the preferred solution:

“A data mining exception which allows right holders to reserve their rights, underpinned by supporting measures on transparency”

Fine in principle, but the devil is always in the detail and the details in this case revolve around transparency (how detailed, what form, what about content already taken?) and, in particular, reservation of rights, aka “opting out”. This is easy to proclaim in principle but difficult to do in practice. British creators are up in arms, led by artists such as Paul McCartney, and supported by the creative industries in the US. The British composer Ed Newton-Rex has penned a brilliant satire explaining how AI development in the UK will work if the current proposal is enacted. The problem with an opt-out solution is essentially twofold: it doesn’t deal with content already absorbed by AI developers and it would be cumbersome if not impossible for many rightsholders to use.

Other governments have addressed the issue in different ways. Singapore has taken a very loose approach toward copyright protection, putting its thumb firmly on the scale in favour of AI developers. It is currently considering additional proposals that would strip even more protection from rights-holders, who are pushing back strongly. Japan had been widely and incorrectly reported to have been on the same path, resulting in a welcome clarification this year from the Agency for Cultural Affairs regarding the limits of Japan’s text and data mining (TDM) exception.

While AI dominated the copyright agenda in 2024, there were other issues relating to copyright and copyright industries that I wrote about. The ongoing question of payment for news content by large digital platforms continued to play out in different ways. In Canada, the struggle between the government and US tech giants Google and META was finally “resolved” (after a fashion) at the end of last year. Google agreed to “voluntarily” pay $100 million annually into a fund for Canadian journalism in return for being exempted from the Online News Act (ONA) while META called the government’s bluff by blocking Canadian news providers from its platform thus, in theory, avoiding being subject to the ONA. However, META has a very subjective interpretation as to what is Canadian news content, allowing some news providers to post to it, while many users have found workarounds, as documented by McGill’s Media Ecosystem Observatory. While the CRTC investigated, the issue is still unresolved.

Meanwhile in Australia, it seems that META intends to go down the same road of blocking news, announcing it will not renew the content deals it initially signed with Australian media in response to Australia’s News Media Bargaining Code, the model upon which Canada’s legislation was based. Unlike in Canada, the Australian government is planning a robust response. (More on this in a future blog post). Finally, on the same topic, California (which was threatening to introduce its own version of legislation to require digital platforms to compensate news content providers) emerged with an outcome very similar to that reached in Canada, with Google offering up some funding (although proportionally less than in Canada) while META appears to have walked away.

Controlled Digital Lending (CDL) was another copyright issue finally settled in 2024 (in the US). The Internet Archive had lost a lawsuit brought against it by a consortium of publishers, who argued that the digital copying of their works constituted copyright infringement, notwithstanding the Archive’s theory that it was simply lending a digital version of a legally obtained physical work held by it (or someone else associated with it). The Archive then lost its appeal. In December, the deadline for further appeals expired, thus effectively ending this saga. Whether Canadian university libraries, some of which are avid devotees of CDL, will take note remains to be seen.

The issue of circumventing a TPM (“Technological Protection Measure”), commonly referred to as a “digital lock” and often represented by a password allowing access to content behind a paywall, was also front and centre this year in Canada. In the case of Blacklock’s Reporter v Attorney General for Canada, the Federal Court found that an employee of Parks Canada, who shared a single subscription to Blacklock’s with a number of other employees by providing them with the password, did not infringe Blacklock’s copyright since the employee did not circumvent (in the meaning of the law) the TPM and the purpose of the sharing was for “research”, which is a specified fair dealing purpose. Blacklock’s is a digital research service that sells access to its content and protects its content with a paywall, as is common for many online content providers, like magazines and newspapers.

Despite the hoo-ha of anti-copyright commentators asserting the Court had found that “digital lock rules do not trump fair dealing”, it was equally clear the Court had ruled that fair dealing does not trump digital locks (TPMs). The Court did not undermine the protection afforded to businesses to protect their content through use of TPMs. Rather, it determined that sharing a licitly obtained password did not constitute circumvention as outlined in the Act, as I explained here (Fair Dealing, Passwords and Technological Protection Measures (TPMs) in Canada: Federal Court Confirms Fair Dealing Does Not Trump TPMs (Digital Lock Rules)). Although the Court did not legitimize circumvention of a TPM for fair dealing purposes, contrary to claims stating the opposite, its acceptance of password sharing is an outcome that legal experts have disagreed with (as do I, for what it is worth). The law is very clear that fair dealing cannot be used as a pretext or a defence against violation of the anti-circumvention provisions of the Copyright Act. The decision is now under appeal by Blacklock’s.

Finally, the last copyright point of note for 2024 is that this year marked the bicentenary of the introduction of the first copyright legislation in Canada, in the Assembly of Lower Canada, in 1824. It also marked the centenary of the entry into force of the first truly Canadian Copyright Act on January 1, 1924. These two hundred years of domestic copyright history are worth celebrating. The first legislation was introduced “for the Encouragement of Learning” so that more local school texts would be written and printed. Given the current standoff between the secondary and post-secondary educational establishment and Canadian authors and their copyright collective over license payments for use of copyrighted works in teaching, one wonders whether we have really learned anything about the role copyright plays in our society. (Copyright and Education in Canada: Have We Learned Nothing in the Past Two Centuries? (From the “Encouragement of Learning” to the “Great Education Free Ride”)).

Leaving that question with you to ponder, gentle Reader, is probably a good way to end this look back over the past 12 months. Stay tuned for more commentary on copyright developments in 2025.

© Hugh Stephens, 2024. All Rights Reserved.

CanLII v CasewayAI: Defendant Trots Out AI Industry’s Misinformation and Scare Tactics (But Don’t Panic, Canada)

Image: Pixabay

Last month I highlighted the first AI/Copyright case in Canada to reach the courts, CanLII v CasewayAI. CanLII (the Canadian Legal Information Institute), a non-profit established in 2001 by the Federation of Law Societies of Canada, sued CasewayAI, a self-described AI-driven legal research service, for copyright infringement and for violating CanLII’s Terms of Use through a massive downloading of 3.5 million files which Caseway allegedly used to populate its AI-based services. Now the principal of CasewayAI, Alistair Vigier, through an article (Don’t Scare AI Companies Away, Canada – They’re Building the Future) published in Techcouver, has responded publicly by trotting out many of the tired and specious arguments put forward by the AI industry to justify the unauthorized “taking” of copyrighted content to use in or to train generative AI models. Let’s have a closer look at these arguments.

Vigier opens by referencing another AI/Copyright case in Canada where a consortium of Canadian media companies is suing OpenAI for copyright infringement. He claims this is all based on a misunderstanding of how AI training works, stating that “AI systems like OpenAI rely on publicly available data to learn and improve. This does not equate to stealing content.” Whether data is “publicly available” or not is irrelevant when it comes to determining whether copyright infringement (aka stealing content) has occurred. Books in libraries are publicly available, as is a book that you purchase in a bookstore, or content on the internet that is not behind a paywall. (It is worth noting that the Canadian media companies also claim that OpenAI circumvented their paywalls to access their content when copying it). But in none of these cases is copying permitted unless the copying falls within a fair dealing exception, which is very precise in its definition. Labelling copied material as “publicly available” is a red herring.

Vigier’s next argument is to equate the ingestion of content by various AI development models with a human being reading a book. We know that humans enhance their knowledge through reading and are thus able, presumably, to better reason based on the content they have absorbed. Vigier says, “This is how AI works. The AI “reads” as much as it can, gets really “smart,” and then explains what it knows when you ask it a question. Like a human learns from reading the news, so does an AI.”

Really? A human does not make a copy, not even a temporary copy, of the content although some elements of the content are no doubt retained in the human brain. But AI operates differently. It makes a copy of the content. This should be beyond dispute although the AI industry continues to muddy the waters by claiming that when content is “ingested” it is converted to numeric data and is thus not actually copied. This is a fallacious argument. Just because the form changes, this does not mean there is no reproduction. When you make a digital copy of a book, there is still reproduction even though the digital form is different from the original hard copy version. When a work is converted to data, the content is still represented in the dataset.
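To make the point about form versus content concrete, here is a deliberately simplified sketch. It is a toy illustration only, assuming a hypothetical word-to-number scheme of my own devising (the function names `build_vocab`, `encode` and `decode` are invented for this example and are not how any particular AI system works). It shows only that turning text into numbers does not, by itself, remove the text: the numbers can be turned straight back into the original words. It says nothing about what a trained model retains.

```python
def build_vocab(text):
    """Assign each distinct word an integer ID (hypothetical toy scheme)."""
    vocab = {}
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Convert the text into a list of integer IDs."""
    return [vocab[word] for word in text.split()]


def decode(ids, vocab):
    """Convert a list of integer IDs back into text."""
    id_to_word = {i: word for word, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)


passage = "the quick brown fox jumps over the lazy dog"
vocab = build_vocab(passage)
ids = encode(passage, vocab)      # the passage expressed purely as numbers
restored = decode(ids, vocab)     # the same passage recovered from the numbers

print(ids)
print(restored == passage)
```

The list of numbers looks nothing like the passage, yet the passage is fully recoverable from it, which is the sense in which a change of form is not the absence of reproduction.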

Vigier dubiously states, with regard to OpenAI, “OpenAI’s models do not reproduce articles verbatim; they process vast datasets to identify patterns, enabling insights and efficiency.” Apart from the fact that the New York Times in its separate lawsuit in the US has been able to demonstrate that by typing in leads of articles, it can prompt OpenAI to reproduce verbatim the rest of the article (OpenAI claimed that the Times “tricked” the algorithm), copying is copying even if the result of the copying is somewhat different from the original. The Copyright Act is crystal clear on this point. Section 3(1) of the Act states that, “For the purposes of this Act, copyright, in relation to a work, means the sole right to produce or reproduce the work or any substantial part thereof in any material form whatever….” If copyright protected content is reproduced in its entirety without permission for a commercial purpose (e.g. for AI training), that is infringement, unless the use qualifies as a fair dealing under Canadian law or fair use in the US.

The issue of whether ingestion of content to train an AI application results in copying (reproduction) has been carefully studied and documented. One of the most thorough examples is a recent SSRN (Social Science Research Network) paper entitled “The Heart of the Matter: Copyright, AI Training, and LLMs”, with noted scholar Daniel Gervais (a Canadian by the way) of Vanderbilt University as lead author. The article goes into a detailed discussion on how copying of content occurs during AI scraping to build a Large Language Model (LLM), including the stages of tokenization and embedding, leading to reward modelling and reinforcement learning. The section of the article explaining how copying occurs (pp. 1-6) is dense, technical text but the conclusion is clear, “LLMs make copies of the documents on which they are trained, and this copying takes various forms, and as a result, with appropriate prompting, applications that use the LLMs are able to reproduce original works.” A shorter (and earlier) version explaining how the LLM copying process works can be found in this article (“Heart of the Matter: Demystifying Copying in the Training of LLMs”), produced by the Copyright Clearance Center in the US. It is also worth noting that these explanations refer only to ingestion of text. AI models that train on images and music are even more likely to produce exact or close-to-exact reproductions of some of the works they have been built and trained on.

So much for the misinformation in Vigier’s article. Now to the scare tactics. He says that the recent Canadian media lawsuit against OpenAI sends a negative message to innovators that Canada may not be open to AI development.

If Canada wishes to remain relevant in this (AI) sector, it must balance protecting intellectual property and promoting technological progress.

The fact that there are currently more than 30 lawsuits in the US, including the seminal New York Times v OpenAI case, does not seem to have slowed down the AI companies in the US. In the UK, legislation has been introduced that would, according to British media reports, “ensure that operators of web crawlers (internet bots that copy content to train GAI, generative AI) and GAI firms themselves comply with existing UK copyright law. These amendments would provide creators with crucial transparency regarding how their content is copied and used, ensuring tech firms are held to account in cases of copyright infringement.” There is lots of AI innovation ongoing in Britain.

The Australian Senate Select Committee Report on Adopting AI has recommended, among other findings, that there be mandatory transparency requirements and compensation mechanisms for rightsholders. The EU is already way out in front on this issue. Its new AI Act stipulates that providers of AI generative models will be required to provide a detailed summary of content used for training in a way that allows rightsholders to exercise and enforce their rights under EU law. Even India now has its own version of the US and Canadian media cases against OpenAI. (OpenAI’s defence in part is based on the argument that no copying took place in India because no OpenAI servers are located there!)

If that is what the “competition” is doing, who does Vigier cite as being the jurisdictions most likely to attract innovators away from Canada? Why, it is those AI powerhouses of Switzerland, Dubai—and the Bahamas!

The argument that if legislators and the courts don’t give AI innovators a free pass on helping themselves to copyrighted content for AI training purposes, this will either slow down innovation or chase it elsewhere is a common fearmongering strategy of the AI industry. This is a race-to-the-bottom mentality whereby content industries are thrown under the AI bus. Vigier, having been the subject of his own lawsuit, argues that instead of resorting to litigation, the Canadian media companies should have sought a licensing solution. But the fact that no licensing agreement was reached with OpenAI is undoubtedly the reason for the lawsuit in the first place. That is certainly the reason behind the NYT v OpenAI lawsuit in the US; licensing negotiations broke down. If someone has taken your content without authorization, and then offers you pennies on the dollar in comparison to what that content is actually worth, then the stage for a lawsuit is set.

In explaining CasewayAI’s position in the litigation brought by CanLII, Vigier says that Caseway approached CanLII with an offer to collaborate but was rebuffed. As a result they developed other extensive web crawling technology that pulled the needed material from elsewhere. (Where exactly the material was downloaded from is the crux of the matter). Regardless, this makes it sound as if it was CanLII’s fault for refusing to share their content. Surely a rightsholder has the right to determine the terms on which their content is to be shared with others, if at all.

The fact that Caseway went to CanLII in the first place suggests that CanLII had developed the content that Caseway wanted. Caseway claims the material it accessed was on the public record, such as court documents and decisions. CanLII, on the other hand, claims that it had reviewed, indexed, analyzed, curated and otherwise enhanced the content in question, thus adding a wrapping of copyright protection to what otherwise would be public documents. Who is right, and whether the material was scraped from CanLII’s website without authorization, will be determined by the BC Supreme Court.

If the material taken by CasewayAI was not copyright protected, they are in the clear, at least with respect to copyright infringement. That is quite different, however, from arguing that no copying takes place during AI training or that if rightsholders use the courts to protect their rights, Canada will be a laggard when it comes to AI development. Robust AI development needs to go hand in hand with robust copyright protection for creators, with an appropriate sharing of the spoils of the new wealth generated from the creative work of authors, artists, musicians and other rightsholders. Yet Vigier says in his concluding paragraph:

Canada has a choice to make. Will we embrace AI as the transformative force it is, or will we let fear and litigation stifle innovation? The lawsuits against Caseway and OpenAI message tech companies: you’re not welcome here. If this continues, Canada won’t just lose its AI startups; it will lose the future of job creation.

What sheer self-interested nonsense! This is fearmongering of the worst kind, based on an inaccurate and misinformed understanding of how AI is developed and trained, that moreover impugns the legitimate right of a rightsholder to seek the protection of the law to protect their creativity and investment in content. Vigier might be correct when he says that licensing of content is a win/win for both parties. I agree with that. But licensing negotiations are about money and conditions of use and require willing parties on both sides. When licensing discussions break down, or when one party decides to do an end run on licensing because they have been rebuffed, then the way to gain clarity is through the courts, whose job it is to interpret what the legislation means.

Canada still needs to come to grips with the question of how copyrighted content will interface with AI development. As I noted earlier, both sides in the debate made their cases in the public consultation launched a year ago, but since then there has been no movement in Ottawa. The law could be strengthened to ensure adequate protection of rightsholder interests in an age of AI, thereby facilitating licensing solutions. In the meantime, misinformation and scare tactics need to be called out for what they are.

Adequate protection for rightsholders does not mean the end of AI innovation or investment in Canada. There is no need for panic. We can walk and chew gum at the same time.

© Hugh Stephens, 2024. All Rights Reserved.