An AI Bot Rewrote my Blog Post—And then Gave Me a Failing Grade for Credibility!

A humanoid robot sitting at a desk, using a typewriter while looking at a sheet of paper in a cozy, modern interior with soft lighting.
Image: Shutterstock.com

I don’t know whether to feel offended or flattered, but I’ve been scraped–by AI. And I can prove it. The first intimation I had of this signal occurrence was a notice from WordPress asking me to approve a comment on my recent blog post, “Copyright, AI and the Legal Profession: Who Blinked?”. I logged in to find it wasn’t a comment but rather a link to this website.

A quick click took me to the article “CanLII settling with Caseway signals shift in legal-tech power dynamics”, dated April 20, the same day I had posted my blog. It was under a byline “London News”, which initially I naively assumed referred to London, Ont, (shows how parochial I can be) but quickly realized that this was some kind of online journal for commuters heading toward Picadilly Circus. London News appears to be written by a bot called Noah News Service, managed by the company HBM Advisory, based in London (England). There was no direct reference or link to my blog post in the article, but when I read it, it seemed eerily similar. The words were all different but the thread (with one exception that I will come to later) was the same. When I searched further, I found a footnote indicating the London News story was “inspired by” my blog post. What does this mean in reality?

My original post is protected by copyright, but anyone (even a bot I suppose) can take “inspiration” from a copyrighted work and produce something new. However, the “inspiration” I provided the bot is substantially different, in my view, from the sort of inspiration I would get from reading, say, an Agatha Christie mystery and then deciding to write my own mystery novel. In the case of my blog post, the bot did not really take “inspiration” from the content to create a new original work but rather engaged in rewriting the story using AI analysis of its key points to recreate what I had said using different words. That’s not true inspiration; it’s paraphrasing. Moreover, I’ll wager that an unauthorized copy of my work was made in order to feed the content to the bot to undertake its rewrite. While facts cannot be copyrighted (only someone’s expression of the facts), this rewrite was not based on the facts of the case. It was based on my blog post. Although the bot has not hijacked my precise words (i.e. my expression) it has nevertheless replicated the structure of my work, its flow and its arguments. It’s sailing very close to the wind, but probably still legal. This is not dissimilar to the challenge faced by news organizations who find their expensively created content being scraped and repackaged by online platforms such as Google, META, and others. According to the National Post, in a recent survey commissioned by News Media Canada, more than seven in 10 Canadians (of those surveyed) think the federal government should prevent artificial intelligence companies from taking and repackaging news content without permission or compensation.

But back to the London News article. Scrolling down to the end, I found an analysis of my blog post, produced by Noah. The post was rated according to various categories. It earned a “Freshness Check” score of 8/10 (i.e. the story was relevant), a “Quotes” check of 7/10; a “Source Reliability” score of just 6/10, a “Plausibility” rating of 8/10 but, sadly, an Overall Assessment for credibility of “Fail”, based on a “Medium” degree of confidence in this assessment. OMG, where did I fail to make the bot happy? How did I not meet its standards?

The Source reliability score would have been higher, according to the bot, if it had been published by an “established news organisation”, rather than on a personal blog;

While the author, Hugh Stephens, has expertise in international copyright issues, (thanks, bot) the blog’s content is not subject to the same scrutiny as mainstream media.”

Well, I can live with that. The whole point of a personal blog is to offer a different perspective from Fox News, the BBC or the Globe and Mail.

The bot’s analysis continued:

“The article references reputable sources, but the lack of direct links to these sources raises concerns about transparency and verifiability.”

In other words, stuff your blog post with direct links to “mainstream media” and you might improve your report card. I could do that, but it might not be appreciated by my readers. The need for more direct links is repeated in the Quotes section (Score: 7/10) as well.

As for my failing grade, the bot’s summary says;

“The article provides a speculative analysis of the CanLII-Caseway AI settlement, referencing reputable sources but lacking direct links for independent verification. Its opinion-based nature and the author’s personal blog platform contribute to concerns about reliability and independence. Given these factors, the content does not meet the standards for publication under our editorial indemnity.”

But they published it anyway, as they do all kinds of content scraped from the web. I am not sure what the editorial indemnity policy is, but I suppose it is some sort of guaranteed reliability indicator, designed to separate the loony conspiracy theories (alternate facts?) from “real news”.

I wondered who would pass the bot’s scrutiny. Of the ten AI related stories posted on the front page of London News on the day I selected, 5 passed, 4 failed, and one was Conditional. The sources were all specialized but non-mainstream tech publications, or informed blogs, but certainly not conspiracy-theory outlets. Yet about half failed to gain Noah’s approval. I started to feel a bit better. Perhaps I’m not such an outlier.

I wonder if could write a blog post that would get an “A” from the bot. First, I would have to catch its attention, which I guess I could do by making sure there were lots of references to “AI” in the text, and then I would have to suppress my instinct to offer views on the topic. I would also have to stuff in lots of links to mainstream sources, like the Guardian and its ilk. But what is the fun in that? And what is the point? If people want to read “just the facts”, they can turn over the screening of content (and thinking) to their mainstream media subscriptions. However, I will say that the idea of assessing the reliability of a story on any topic, whether it’s on AI or the war in the Middle East, is not a bad thing. In the case of HBM, the assessment is used as a teaser to convince users (individuals, but more likely businesses) to sign up for more comprehensive, paid analysis. Part of the problem is that the assessment is done by an AI bot, and we know that AI is far from perfect.

HBM claims it uses AI and statistical modelling blended with human expertise and oversight to do its assessments. There is a thin but cursory layer of human involvement; fact-checking, source verification, style refinement etc. I think this is borne out by one missing key paragraph from HBM’s rewrite of my blog post. I had taken aim at Deloitte as an example of a large multinational company, that should know better, having been caught red-handed using unattributed AI that produced inaccurate, “hallucinated” results in a consulting report it prepared for the Newfoundland government. (“Deloitte’s AI Nightmare: Top Global Firm Caught Using AI-Fabricated Sources to Support its Policy Recommendations”). While HBM’s rewrite included almost all the key points in my post, there was zero reference to Deloitte. I am sure that “human expertise” decided that there was no point in gratuitously antagonizing an actual or potential client. Can I prove it? No, I guess its just another conspiracy theory.

I wonder if this blog post will be picked up and analyzed by Noah and if so, whether I would get a “Pass” this time. After all, it is “Fresh” and I have used lots of quotes from Noah. Having referred to the London News, I should get a 10/10 for Source Reliability (although I am not mainstream media, but neither is Noah). As for Plausibility what could be more plausible than an AI bot ripping off an author’s work through an unauthorized rewrite?  Would all that land me a “Pass” from Noah? I will probably never know.

© Hugh Stephens, 2026. All Rights Reserved.

Update: Noah picked up and summarized (using much more direct language this time) the blog post above and then (drumroll) gave me a “Pass”.

Copyright, AI and the Legal Profession (Who Blinked?)

A close-up of a stack of vintage books on a wooden shelf, with a prominent 'AI' logo overlay.

Image: Shutterstock

I wonder what really happened? Maybe we’ll never know. On March 23 it was announced that Caseway AI and CanLII (The Canadian Legal Information Institute) had reached a settlement in the copyright infringement case brought by CanLII against Caseway in 2024. As the saying goes, “Somebody knows something”, but they aren’t saying. The settlement is confidential and both sides are very tight-lipped, although Caseway is willing to riff a bit on social media. The CanLII announcement that each party will move forward independently, and that both consider the matter fully and finally resolved with no further comment, is particularly buttoned-up leading one (the “one” being me) to suspect it was maybe CanLII that blinked, not Caseway. But I could be wrong. There is no announcement that Caseway will be licensing CanLII content, or any hints that money has changed hands. Maybe Caseway agreed to stop what they were doing even though they denied doing it.

The facts of the case are as follows. According to its website, CanLII is “a non-profit organization founded in 2001 by the Federation of Law Societies of Canada on behalf of its 14-member law societies. Its mandate is to provide efficient and open online access to judicial decisions and legislative documents.”

Not only that but,

CanLII supports members of the legal profession in the performance of their duties while providing the public with permanent open access to laws and legal decisions from all Canadian jurisdictions.”

Caseway AI says it is a company that is applying AI techniques to the legal profession “to make legal knowledge accessible, affordable, and usable for everyone.”

This being the case, you might think that CanLII would be delighted when an AI company like Caseway came along to use CanLII’s “free” resources to develop an AI-based legal platform, which would arguably improve access to legal information on the part of the public, plus simplify the research function for legal firms. You would be wrong. Part of the problem, no doubt, was that the AI company, Caseway, charges for its services while not being part of the profession.(i.e. take but not give).

Caseway’s sales pitch also might not endear it to the legal profession;

We believe the justice system should not feel closed off to those without deep pockets or institutional power…By combining trusted legal sources with modern technology, Caseway levels the playing field—empowering solo lawyers, small firms, businesses, and individuals navigating legal challenges on their own…

Oh oh. The self representation bogey. Maybe the real reason for CanLII’s suit was that Caseway AI and others like it were setting themselves up as a direct threat to the legal profession. Apart from the threat of more self representation, AI is a two-edged sword for many lawyers. Yes, it simplifies a number of routine duties and research functions, but at the end of the day it could also result in a lot fewer lawyers. The threat is no different than the threat posed to accountants, radiologists, stock market analysts and soothsayers, but needs to be taken seriously.

The nub of the CanLII case was that while it provides public, non-copyrightable judicial decisions, these public documents are compiled in a proprietary database. CanLII argued that it spends considerable time, effort and money to “review, analyze, curate, aggregate, catalogue, annotate, index and otherwise enhance the data” prior to publication and that this creative effort converts public information into copyright protected content. CanLII might be right, based on the US case of Thomson Reuters v Ross where a US court found that Ross Intelligence, an AI research firm, had infringed on the copyrighted legal materials, indexing system and case headnotes (summaries of judicial cases) of Westlaw, a legal research platform owned by Thomson Reuters. Notably Ross had tried to license the Westlaw content, but Thomson Reuters had refused, viewing Ross as a competitor to Westlaw. Ross then helped itself to the material. In both the CanLII and Reuters/Ross cases, the foundational content, (judicial decisions) were in the public domain, but the issue revolved around the secondary, interpretive materials and processes. In presenting its defence, Caseway did not argue that it was entitled to use CanLII’s content under fair dealing or because it was in the public domain. Instead, it argued that it didn’t access CanLII’s content at all. It got its content from other public sources. CanLII had to prove the contrary.

When I asked Google’s AI mode “How strong was the CanLII case against Caseway” I got a summary of various Canadian Lawyer Magazine articles which discussed the pros and cons of the case, and an unsubstantiated assertion that “Caseway agreed to respect CanLII’s terms of service and cease any unauthorized automated data extraction.” Whether that is true or not I cannot say, but it is clear that both CanLII and Caseway will continue on their respective paths. Indeed, Caseway has just burnished its image a bit by cutting a deal with UBC (University of British Columbia, in Vancouver)  to research ways to improve the accuracy of AI legal research tools. This is an ongoing problem for legal researchers and more than one lawyer has been sanctioned by the courts for presenting supposed legal precedents that were in fact non-existent, having been hallucinated by AI.

Apart from the AI hallucination problem, which does not limit itself to the legal profession (Deloitte Consulting being a prominent example of a major company being caught with its hand in the AI error-ridden cookie jar, without disclosure to the client), there is also the question of whether an AI platform should be allowed to provide legal advice. It is not licensed to do so and as a regulated profession, lawyers are jealous of their prerogatives. The profession is regulated for good reasons; to ensure competence and integrity to protect clients and the public. There are strict regulations against unlicensed practitioners providing legal advice, with severe penalties. In March of this year, ChatGPT’s parent company, OpenAI, was sued for engaging in the unauthorized practice of law, in this case by providing legal advice through a consumer‑facing chatbot. The seriousness of unlicensed persons or entities providing legal advice explains the many warnings posted on websites and blogs when discussing legal issues. “The foregoing does not constitute legal advice”. The case is pending.

Back to Caseway AI. Do I think that if you have a legal problem, you can solve it with a $49.99 a month subscription to Caseway instead of engaging a lawyer? Well, if you are determined to self-represent, it might be better to try it out rather than heading to the library to borrow a copy of the Highways Act, or Criminal Code, or searching for legal precedents that might be relevant to your case. On the other hand, remember the old saying, often attributed to Abraham Lincoln, that “A man who is his own lawyer has a fool for a client”.

The above does not constitute legal advice. 😊

© Hugh Stephens, 2026. All Rights Reserved.

Korea’s AI Action Plan: Declaring War on Creators?

A young woman in a sparkling silver outfit poses next to a large robotic figure, adorned with South Korean symbols and colors, in a vibrant city background.

Image: Shutterstock

In the scramble to jump on the global AI bandwagon, Korea has floated a proposal that would supposedly remove “legal uncertainty” for AI developers who use copyrighted content to train AI platforms. Unfortunately, the proposed “solution” threatens to throw Korea’s globally renowned creative sector under the bus. Nor does it remove the uncertainty.

As part of President Lee Jae-Myung’s National Artificial Intelligence Strategy, its Presidential Council has put forward a 98 point “Action Plan”, a blueprint for implementation. There are many aspects to an AI strategy, but a key element is to ensure legal clarity with respect to the use of content for AI training, especially copyrighted content. The Action Plan purports to do this. Its Point 32 proposes the introduction of “explicit exceptions under the Copyright Act to allow copyrighted works to be used without legal uncertainty (emphasis added) in the processes of collecting and analyzing data available on the web”. In other words, introduction of a copyright exception for text and data mining (TDM), subject to certain conditions such as some form of remuneration, transparency, and opt-out features for rightsholders.

If the goal is to remove “legal uncertainty” regarding the use of copyrighted works, this proposal falls short of the mark. No exception can provide 100% certainty given the Berne Convention requirement that any exception meet the so-called “three step test”, meaning that an exception is permitted only in certain special cases, provided that it does not conflict with normal exploitation of the work and does not unreasonably prejudice the author’s legitimate interests. While there will always be a degree of uncertainty regarding exceptions, the good news is there is a ready alternative. The surest way to ensure legal certainty is to encourage licensing of content from rightsholders. The problem with the introduction of a TDM exception—or even the discussion of a possible TDM loophole–is that it diminishes the likelihood of reaching licensing solutions by reducing the pressure on AI developers to open their wallets to reach licensing deals.

It is even more bizarre that the tech industry is pressing for a TDM exception given that Korea is one of the few countries, alongside the United States, that has adopted a fair use provision in its Copyright law. This was done in 2011 as part of the implementation of the Korea-US Free Trade Agreement after heavy lobbying by the tech sector. Fair use allows courts to make case-by-case judgements as to whether a given use meets fair use criteria, thus potentially allowing reproduction of copyrighted material without advance permission from the rightsholder. If free use of copyrighted material for AI training can be shown to be “fair”, why is a TDM exception needed? Even in countries where fair use does not apply (which is most of the world), there is no convincing case or consensus on the need for a TDM exception; there is even less reason for one in a state that has already adopted fair use.

Point 32 of the Action Plan takes note of the existence of fair use in Korea, commenting that the Ministry of Culture, Sports and Tourism is preparing fair use guidelines. These are to provide interpretive guidance on the exemption provisions of the Copyright Act to enable companies to utilize copyrighted data “with greater confidence”. Despite this, the Action Plan claims these guidelines alone are unlikely to fully eliminate uncertainty and judicial risks. Voluntary licensing, however, would eliminate both.

It is well established that AI developers need vast amounts of data to improve the performance of their AI platforms. To date they have largely employed a “take first, ask later” policy. This has led to numerous lawsuits pitting rightsholders against the tech industry, mostly but not exclusively in the US. AI developers in the US have argued that what they are doing amounts to “fair use” because the final AI product is used for a different purpose from the original and thus does not compete with it. That is highly debatable, especially with image and music-based AI works. To date, the results from the US courts have been mixed.

The legality of the tech industry’s unauthorized use of copyrighted content is an issue that a number of countries, in Asia and around the world, are looking at. Various solutions have been proposed to eliminate the uncertainties that arise from leaving the decision to the courts. Among these are TDM exceptions which have been introduced, albeit with strict limitations, in the UK, the EU and Japan. In the UK for example, use is limited to non-commercial purposes. In the EU, it must be accompanied by transparency requirements and opt-out provisions for authors. In Japan, if the unauthorized user derives commercial benefit from the content, the safe harbour does not apply. Australia has explicitly ruled out introducing a TDM exception in order to protect its creative sector, while many others (eg. India, Canada) have no TDM provision in their copyright law. As noted above, the clearest way to remove any uncertainty about the legality of using copyrighted works is to incentivize and recognize voluntary licensing as the solution. This ensures that rightsholders receive appropriate compensation for the work they have put into creating content, while guaranteeing legal certainty for licensees.

The Korean strategy paper argues that AI companies are required to obtain individual consent from each copyright holder “leading to significant costs and time burdens in securing high-quality training data”. But large amounts of high-quality content can be accessed through voluntary licensing agreements with major content creation companies such as studios, publishers, broadcasters, music labels and so on. As for individual authors and artists, one possibility is to look at the model currently used for licensing print and music content through Collective Rights Management Organizations as a supplement to voluntary licenses signed with major rightsholders.

In addition to being instructed to prepare the necessary amendment to the Copyright Law for presentation to the National Assembly by Q2 of this year, the Culture, Sports and Tourism Ministry, in cooperation with the Ministry of Science and Technology, is to “promulgate standard contract templates for the licensing and transfer of copyrights for AI training”. This is the kind of heavy-handed market intervention that is guaranteed to stifle voluntary licensing. Not only that, it amounts to expropriating the rights of Korean creators to manage their works.

The Action Plan gives a nod to the importance of compensating rightsholders and claims it wants to establish a system that respects the rights of creators. However, given the size and importance of Korea’s cultural industries, from film to K-Pop to literature, it is surprising there isn’t greater recognition of what an important strategic and economic asset this sector represents for Korea. Although the Strategy acknowledges that content industries should be able to share the benefits of growth in the AI industry, the proposed solution is unbalanced and biased toward clearing any so-called “obstacles” to unimpeded use of content. As a result,  just days after the extremely brief (20 day) consultation period on the Strategy had closed, in mid-January sixteen creator and rightsholder groups issued a strong statement condemning the Action Plan, labelling it “an attempt to fundamentally undermine copyright as a private property right”.

While paying lip service to creator’s rights, the Plan does not address how creators can enforce these rights (other than through the creation of opt-out protocols, which stands the normal copyright procedure of seeking permission prior to usage on its head). The Strategy seems to lead to what has been described by many as a “use now, pay later” system, with little information on how payment would be calculated or implemented. On the other hand, prior, voluntary licensing of content for AI training is a solution that would respect the rights of Korea’s creators while providing the welcome revenue sharing and income stream for which the Strategy advocates. Strong content industries benefit AI development in Korea by encouraging continued creation of the valuable Korean language content so necessary to refine and improve AI models. Conversely, providing the tech industry with an escape hatch to avoid licensing by instituting a TDM exception is the surest way to kill a licensing market for AI content. It will only continue the legal uncertainty that the Presidential Council seems to feel is hindering AI development in Korea.

The one-sided formulation of the Strategy to date has provoked an inevitable negative reaction from Korea’s cultural industries. This is not surprising since the strategy of the tech industry, in Korea and elsewhere, is to avoid dealing with ministries directly responsible for culture and copyright and instead lobby industry, technology and science ministries to bring pressure for changes to copyright law. This adversarial stance is unfortunate as the content and tech industries need and can help each other. The Strategy needs to be amended so rather than throwing the cultural and copyright industries under the bus in the name of facilitating AI development, Korea provides the framework for a mutually beneficial and legally certain relationship. This is best done by upholding longstanding copyright principles and encouraging the growth of a voluntary licensing market for content used in AI training.

© Hugh Stephens, 2026. All Rights Reserved

Delegating Research to AI is a Risky Proposition: The “Hallucination” Phenomenon (User Beware)

A graphic showing a cartoon robot head on a computer monitor with the text 'The World is Flat Because I Say So' in a playful font.

Image: Shutterstock (author modified)

It seems everyday new applications and new threats emerge from the AI world. This applies in particular to creators who see growing AI challenges to their livelihoods; graphic art and album covers spat out by AI generators; voice actors replaced by AI clones; authors struggling to make their works known in a sea of AI-generated slop; now AI artists are even making the Billboard charts. At the same time, AI has many other functions and produces a host of products that have little to do with artistic creation. In particular, it can be used as a crutch to assist and enable research in a wide range of fields. Today it is routinely used by everyone from school kids to law firms to health care researchers. And that is where the risks of mainlining AI are the most evident, because of the propensity of AI platforms to fabricate plausible sounding misinformation.

In a blog post earlier this year (AI’s Habit of Information Fabrication (“Hallucination”): Where’s the Human Factor?) I discussed some examples of law firms caught submitting non-existent case precedents in court as a result of sloppy legal research using AI. Judges have very limited tolerance for this practice, which wastes valuable court time, and they are increasingly imposing significant penalties—that is, if the fabricated information is actually spotted. The problem is not going away. This website maintained by Paris-based legal scholar Damien Charlotin has compiled a database of more than 550 legal cases in 25 countries where generative AI has produced hallucinated content.  These are typically fake citations, but also include other types of AI-generated arguments. The US wins the lottery at 373 cases, but Canada is second with 39. Even Papua-New Guinea has one case.

As you can well imagine, AI hallucinated results in health care could be fatal. As Dr. Peter Bonis, Chief Medical Officer at Wolters Kluwer Health points out, hallucination in the health care field has led to various consequences such as recommending surgery when it was not needed, advising that a specific drug could be safely stopped abruptly when this was known to be dangerous, avoiding recommending vaccinations based on known allergies even though it was safe to do so, proposing wrong starting treatments for patients with rheumatoid arthritis and so on. You get the picture. You don’t want your family doc using AI search for the remedy for whatever ails you. The fact that the models present incorrect information with such confidence, and that potentially dangerous incorrect information is embedded with a lot of correct information, makes proper use of AI outputs particularly challenging.

How is it that AI platforms consistently produce unreliable results? This MIT Sloan article identifies three elements;

  • Training data sources (the uneven quality of inputs, including pirated, biased and otherwise unreliable content)
  • Limitations of generative models (generative AI models are designed to predict the next word or sequence based on observed patterns and to generate plausible content, not to verify its accuracy)
  • Inherent Challenges in AI Design (The technology isn’t designed to differentiate between what’s true and what’s not true)

This is all pretty concerning if people are going to surrender personal judgement to AI and use it to cut corners without verification. One way to address part of the problem is to ensure the training data used is reliable and of high quality. That is where licensing of accurate, curated data and content as training inputs is important and that is why a licensing market is developing as AI companies seek out better quality data to distinguish their product from that of their competitors. This can be very helpful where the AI platform is limited to discrete areas of knowledge, such as in the medical field for example, where usage can be limited to professionals who are prepared to pay for a bespoke AI product and who are qualified to interpret the results properly. AI for the general public is another matter, and this is where most of the problems arise. Unfortunately, while improving the quality of training data helps reduce hallucinations, it does not completely eliminate them. As the New York Times has reported,

“Because the internet is filled with untruthful information, the technology learns to repeat the same untruths. And sometimes the chatbots make things up. They produce new text, combining billions of patterns in unexpected ways. This means even if they learned solely from text that is accurate, they may still generate something that is not.”

User beware. Nonetheless, better inputs lead to better outputs. As AI developers work to take their products to the next level by refining their training processes and making outputs more predictable and trustworthy, they will need access to curated, proprietorial content and closer collaboration with content owners. Dr. Bonis noted that for specialized areas like health care, AI companies will get better quality feedstock while creators of the content will receive funding allowing them to continue research. A virtuous circle.

Users bear a big responsibility to ensure AI is employed effectively. The mindless, unjudgemental use of AI to reach conclusions in areas where the user has little knowledge can be dangerous. By all means use AI as a tool to sort and categorize, but don’t rely on it to produce the answers on which substantive decisions will be based. Any sensible user of AI has a pretty good idea of the answer to the question before it is even asked. It is also a good idea to refine the question, so you narrow the range of possibilities.

Some proprietary AI models offer RAG (Retrieval Augmented Generation) where the AI will retrieve relevant information from trusted sources to supplement its preliminary analysis. This can increase reliability. However RAG, where the AI goes after specific inputs to bolster its results, can also expose AI developers to charges of copyright infringement, as is currently the case with Canadian AI company, Cohere, which is being sued by a number of newspaper publishers, including the Toronto Star, for copyright infringement. As Canadian lawyer Barry Sookman has  pointed out in a recent blog, use of RAG can create risk for the AI platform. In the case of Cohere, when its RAG feature was switched on, it reproduced large amounts of almost verbatim text pulled directly from the litigating news sources. But if the RAG function was switched off, it produced fabricated information (hallucinations) yet still identified this false information as coming from an identified reliable news source, leading to charges of trademark dilution. The value of the brand was diminished by the attribution of false information to it. This trademark dilution issue is also part of the New York Times case against OpenAI.

At the end of the day, it is a case of user beware, as a recent case in Newfoundland demonstrates well. The Government of Newfoundland commissioned an in-depth study on the future of education in the province. The 410-page report containing over 110 recommendations, authored by two university professors, was released with great fanfare at the end of August. No doubt a great deal of careful research had gone into producing the study over the 18-month production period. But then cracks started appearing in the edifice. It was chock-full of made-up citations. The more people checked, the more they found. The Department of Education and Early Childhood Development tried to whitewash the issue by saying it was aware of a “small number of potential errors in citations” in the report. But even one fabricated citation is one too many! If you search for the report online now, you get the classic “404 Not Found” message. A lot of work has potentially gone down the drain, and possibly the credibility of two academics has been destroyed by careless use of AI. This is a cautionary tale that I have no doubt will be repeated.

In fact, it was repeated just a few days later. It seems Newfoundland is particularly prone to victimization by hallucinating AI platforms. After the education report debacle, new reports have surfaced that a $1.5 million study on the health care system, conducted by none other than Deloitte, also contains fabricated information, including made-up references. The opposition party is demanding the government insist on a refund.

In our rush to embrace AI, many seem to have forgotten the value of human creativity and judgement. Coming back to the creative industries and AI, some of those whose livelihoods may be threatened by this new phenomenon are bravely trying to find a silver lining. Some voice actors are generating an additional revenue stream by licensing their voice clips for AI training, and many graphic artists use AI as an assist. Are they putting themselves out of work in the long run, or are they simply adapting? The jury is still out, but the generally low quality of AI-produced art, music and literature, as well as the ongoing problem of hallucination, suggests that there will always be a need for real human input. Anyone planning on substituting AI for “real work” had better think again.

© Hugh Stephens, 2025. All Rights Reserved.

CanLII v CasewayAI: Defendant Trots Out AI Industry’s Misinformation and Scare Tactics (But Don’t Panic, Canada)

Image: Pixabay

Last month I highlighted the first AI/Copyright case in Canada to reach the courts, CanLII v CasewayAI. CanLII (the Canadian Legal Information Institute), a non-profit established in 2001 by the Federation of Law Societies of Canada, sued Caseway AI, a self-described AI-driven legal research service, for copyright infringement and for violating CanLII’s Terms of Use through a massive downloading of 3.5 million files that Caseway allegedly used to populate its AI-based services. Now the principal of CasewayAI, Alistair Vigier, through an article (Don’t Scare AI Companies Away, Canada – They’re Building the Future) published in Techcouver, has responded publicly by trotting out many of the tired and specious arguments put forward by the AI industry to justify the unauthorized “taking” of copyrighted content to use in or to train generative AI models. Let’s have a closer look at these arguments.

Vigier opens by referencing another AI/Copyright case in Canada where a consortium of Canadian media companies is suing OpenAI for copyright infringement. He claims this is all based on a misunderstanding of how AI training works, stating that “AI systems like OpenAI rely on publicly available data to learn and improve. This does not equate to stealing content.” Whether data is “publicly available” or not is irrelevant when it comes to determining whether copyright infringement (aka stealing content) has occurred. Books in libraries are publicly available, as is a book that you purchase in a bookstore, or content on the internet that is not behind a paywall. (It is worth noting that the Canadian media companies also claim that OpenAI circumvented their paywalls to access their content when copying it). But in none of these cases is copying permitted unless the copying falls within a fair dealing exception, which is very precise in its definition. Labelling copied material as “publicly available” is a red herring.

Vigier’s next argument is to equate the ingestion of content by various AI development models with a human being reading a book. We know that humans enhance their knowledge through reading and are thus able, presumably, to better reason based on the content they have absorbed. Vigier says, “This is how AI works. The AI “reads” as much as it can, gets really “smart,” and then explains what it knows when you ask it a question. Like a human learns from reading the news, so does an AI.”

Really? A human does not make a copy, not even a temporary copy, of the content, although some elements of the content are no doubt retained in the human brain. But AI operates differently. It makes a copy of the content. This should be beyond dispute, although the AI industry continues to muddy the waters by claiming that when content is “ingested” it is converted to numeric data and is thus not actually copied. This is a fallacious argument. Just because the form changes does not mean there is no reproduction. When you make a digital copy of a book, there is still reproduction even though the digital form is different from the original hard copy version. When a work is converted to data, the content is still represented in the dataset.
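The point that a change of form is not an absence of copying can be shown with a toy illustration. This is not a real tokenizer (LLMs use subword tokens, not character codes), but it makes the principle concrete: text converted to numbers still contains the expression, and the original can be recovered from the data.

```python
def encode(text):
    """Represent text as a list of integers, one per character."""
    return [ord(c) for c in text]

def decode(token_ids):
    """Recover the original text from its numeric representation."""
    return "".join(chr(i) for i in token_ids)

original = "the sole right to produce or reproduce the work"
numeric = encode(original)   # a change of form, not an absence of the work
assert decode(numeric) == original
```

The digital book analogy in the paragraph above is the same idea: the representation changed, the work did not disappear.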

Vigier dubiously states, with regard to OpenAI, “OpenAI’s models do not reproduce articles verbatim; they process vast datasets to identify patterns, enabling insights and efficiency.” Apart from the fact that the New York Times in its separate lawsuit in the US has been able to demonstrate that by typing in leads of articles, it can prompt OpenAI to reproduce verbatim the rest of the article (OpenAI claimed that the Times “tricked” the algorithm), copying is copying even if the result of the copying is somewhat different from the original. The Copyright Act is crystal clear on this point. Section 3 (1) of the Act states that, “For the purposes of this Act, copyright, in relation to a work, means the sole right to produce or reproduce the work or any substantial part thereof in any material form whatever….” If copyright-protected content is reproduced in its entirety without permission for a commercial purpose (e.g., for AI training), that is infringement, unless the use qualifies as fair dealing under Canadian law or fair use in the US.

The issue of whether ingestion of content to train an AI application results in copying (reproduction) has been carefully studied and documented. One of the most thorough examples is a recent SSRN (Social Science Research Network) paper, entitled “The Heart of the Matter: Copyright, AI Training, and LLMs”, with noted scholar Daniel Gervais (a Canadian by the way) of Vanderbilt University as lead author. The article goes into a detailed discussion of how copying of content occurs during AI scraping to build a Large Language Model (LLM), including the stages of tokenization and embedding, leading to reward modelling and reinforcement learning. The section of the article explaining how copying occurs (pp. 1-6) is dense, technical text, but the conclusion is clear: “LLMs make copies of the documents on which they are trained, and this copying takes various forms, and as a result, with appropriate prompting, applications that use the LLMs are able to reproduce original works.” A shorter (and earlier) version explaining how the LLM copying process works can be found in this article (“Heart of the Matter: Demystifying Copying in the Training of LLMs“), produced by the Copyright Clearance Center in the US. It is also worth noting that these explanations refer only to ingestion of text. AI models that train on images and music are even more likely to produce exact or close-to-exact reproductions of some of the works they have been built and trained on.
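For the non-technical reader, the first two stages named above can be sketched in a few lines. This is a hypothetical illustration only: the vocabulary, vector size, and random embeddings are invented, and real LLMs operate at vastly larger scale. But even in miniature it shows why the “it’s just numbers” defence fails: the token ids remain a faithful representation of the ingested text.

```python
import random

random.seed(0)
text = "copyright protects the expression of ideas"
corpus_words = text.split()

# Tokenization stage: map each distinct word to an integer id.
vocab = {word: i for i, word in enumerate(dict.fromkeys(corpus_words))}
token_ids = [vocab[w] for w in corpus_words]

# Embedding stage: look up a numeric vector for each token id.
embedding_table = {i: [random.random() for _ in range(4)] for i in vocab.values()}
vectors = [embedding_table[t] for t in token_ids]

# The numeric form is reversible at the token level:
inverse = {i: w for w, i in vocab.items()}
assert " ".join(inverse[t] for t in token_ids) == text
```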

So much for the misinformation in Vigier’s article. Now to the scare tactics. He says that the recent Canadian media lawsuit against OpenAI sends a negative message to innovators that Canada may not be open to AI development.

If Canada wishes to remain relevant in this (AI) sector, it must balance protecting intellectual property and promoting technological progress.

The fact that there are currently more than 30 lawsuits in the US, including the seminal New York Times v OpenAI case, does not seem to have slowed down the AI companies in the US. In the UK, legislation has been introduced that would, according to British media reports, “ensure that operators of web crawlers (internet bots that copy content to train GAI, generative AI) and GAI firms themselves comply with existing UK copyright law. These amendments would provide creators with crucial transparency regarding how their content is copied and used, ensuring tech firms are held to account in cases of copyright infringement.” There is lots of AI innovation ongoing in Britain.

The Australian Senate Select Committee Report on Adopting AI has recommended, among other findings, that there be mandatory transparency requirements and compensation mechanisms for rightsholders. The EU is already way out in front on this issue. Its new AI Act stipulates that providers of AI generative models will be required to provide a detailed summary of content used for training in a way that allows rightsholders to exercise and enforce their rights under EU law. Even India now has its own version of the US and Canadian media cases against OpenAI. (OpenAI’s defence in part is based on the argument that no copying took place in India because no OpenAI servers are located there!)

If that is what the “competition” is doing, who does Vigier cite as being the jurisdictions most likely to attract innovators away from Canada? Why, it is those AI powerhouses of Switzerland, Dubai—and the Bahamas!

The argument that if legislators and the courts don’t give AI innovators a free pass on helping themselves to copyrighted content for AI training purposes, this will either slow down innovation or chase it elsewhere is a common fearmongering strategy of the AI industry. This is a race-to-the-bottom mentality whereby content industries are thrown under the AI bus. Vigier, having been the subject of his own lawsuit, argues that instead of resorting to litigation, the Canadian media companies should have sought a licensing solution. But the fact that no licensing agreement was reached with OpenAI is undoubtedly the reason for the lawsuit in the first place. That is certainly the reason behind the NYT v OpenAI lawsuit in the US; licensing negotiations broke down. If someone has taken your content without authorization, and then offers you pennies on the dollar in comparison to what that content is actually worth, then the stage for a lawsuit is set.

In explaining CasewayAI’s position in the litigation brought by CanLII, Vigier says that Caseway approached CanLII with an offer to collaborate but was rebuffed. As a result, they developed their own extensive web-crawling technology that pulled the needed material from elsewhere. (Where exactly the material was downloaded from is the crux of the matter). Regardless, this makes it sound as if it was CanLII’s fault for refusing to share its content. Surely a rightsholder has the right to determine the terms on which their content is to be shared with others, if at all.

The fact that Caseway went to CanLII in the first place suggests that CanLII had developed the content that Caseway wanted. Caseway claims the material it accessed was on the public record, such as court documents and decisions. CanLII, on the other hand, claims that it had reviewed, indexed, analyzed, curated and otherwise enhanced the content in question, thus adding a wrapping of copyright protection to what otherwise would be public documents. Who is right, and whether the material was scraped from CanLII’s website without authorization, will be determined by the BC Supreme Court.

If the material taken by CasewayAI was not copyright protected, they are in the clear, at least with respect to copyright infringement. That is quite different, however, from arguing that no copying takes place during AI training or that if rightsholders use the courts to protect their rights, Canada will be a laggard when it comes to AI development. Robust AI development needs to go hand in hand with robust copyright protection for creators, with an appropriate sharing of the spoils of the new wealth generated from the creative work of authors, artists, musicians and other rightsholders. To say, as Vigier does in his concluding paragraph, that:

Canada has a choice to make. Will we embrace AI as the transformative force it is, or will we let fear and litigation stifle innovation? The lawsuits against Caseway and OpenAI message tech companies: you’re not welcome here. If this continues, Canada won’t just lose its AI startups; it will lose the future of job creation.

What sheer self-interested nonsense! This is fearmongering of the worst kind, based on an inaccurate and misinformed understanding of how AI is developed and trained, and it moreover impugns the legitimate right of a rightsholder to seek the protection of the law for their creativity and investment in content. Vigier might be correct when he says that licensing of content is a win/win for both parties. I agree with that. But licensing negotiations are about money and conditions of use and require willing parties on both sides. When licensing discussions break down, or when one party decides to do an end run on licensing because they have been rebuffed, then the way to gain clarity is through the courts, whose job it is to interpret what the legislation means.

Canada still needs to come to grips with the question of how copyrighted content will interface with AI development. As I noted earlier, both sides in the debate made their cases in the public consultation launched a year ago, but since then there has been no movement in Ottawa. The law could be strengthened to ensure adequate protection of rightsholder interests in an age of AI, thereby facilitating licensing solutions. In the meantime, misinformation and scare tactics need to be called out for what they are.

Adequate protection for rightsholders does not mean the end of AI innovation or investment in Canada. There is no need for panic. We can walk and chew gum at the same time.

© Hugh Stephens, 2024. All Rights Reserved.

Canadian Copyright Registration and AI-Created Works: It’s Time to Close the Loophole

Image: Shutterstock

In July, the Canadian Internet Policy and Public Interest Clinic (CIPPIC) at the University of Ottawa filed an application in the Federal Court to expunge or amend a Canadian copyright registration that claimed an AI program, the RAGHAV AI Painting App, as co-author of a registered work. While the other co-author, an Indian IP lawyer by the name of Ankit Sahni is named as the respondent, the real defendant ought to be the Canadian Intellectual Property Office (CIPO), the organ within the Department of Industry (ISED) responsible for managing copyright registration. It is CIPO’s “rubber stamp, content-blind, absence of judgement” automated system of registration that has led to this situation, putting Canada in a significantly different place from that of the United States or many other countries when it comes to granting copyright protection to works produced by AI algorithms with no or little human intervention.

Last month I wrote a couple of blog posts on the issue of whether content produced with or by generative AI could or should qualify for copyright protection. I looked at the ongoing uphill struggle that two “creators”, Stephen Thaler and Jason Allen, have experienced with the US Copyright Office (USCO) in their attempts to get the USCO to register their works. Thaler claims his submitted work (“A Recent Entrance to Paradise”) was created exclusively by his AI algorithm (the “Creativity Machine”) and, accordingly, it should be recognized as the “author”. However, as the human behind the machine, having invested in creating it, the benefits of the registration should fall to him. He argues that the algorithm carried out the work at his behest, much like a work for hire. Allen, by contrast, claims that although his award-winning work (“Théâtre D’Opéra Spatial”) was produced with AI assists, he was the creator through control and manipulation of the prompt process. In neither case has the USCO budged from its position that the works do not qualify for copyright protection on the basis they were not human-created. The same goes for the courts to which the USCO’s rejection has been appealed.

That is the current situation in the US; in Canada it is quite different. Works produced exclusively with AI have been accorded copyright registration, more than once. I have even done it myself! (See “Canadian Copyright Registration for my 100 Percent AI-Generated Work”).

Because Canadian copyright registration is automated and done through a website, an applicant must provide the author’s address, contact details and date of death, if deceased. To work around that, I clearly specified that the work was created entirely by two AI programs (DALL-E 2 and ChatGPT) with virtually no exercise of “skill and judgement” on my part, this supposedly being the threshold in Canada for creative content that can be afforded copyright protection.

This is how my Canadian copyright certificate No. 1201819, issued April 11, 2023 (I should have tried to register it on April Fools Day), reads in terms of describing the registered work:

“SUNSET SERENITY, BEING AN IMAGE AND POEM ABOUT SUNSET AT AN ONTARIO LAKE CREATED ENTIRELY BY AI PROGRAMS DALL-E2 AND CHATGPT (POEM) ON THE BASIS OF PROMPTS DEMONSTRATING MINIMAL SKILL AND JUDGEMENT ON THE PART OF THE HUMAN AUTHOR CLAIMING COPYRIGHT”.

While this little exercise in inanity was fun (and was done to expose the failings of the current system), I was not the first to register an AI-created work in Canada. That honour, as far as I can tell, belongs to Sahni, the named respondent in the CIPPIC case who, in December 2021, managed to register the artistic work Suryast, listing the AI-powered RAGHAV painting app as co-author. That was a neat way of getting around the requirement to provide an address, contact details etc. Sahni could provide his contact details yet still claim the AI algorithm was an author, even if a co-author. Clever. What Sahni’s motivation was I cannot say, but apparently he has been active in registering the work in as many jurisdictions as he can, maybe to boost the marketability of RAGHAV. CIPPIC claims he is seeking registration to force various countries to address the AI authorship issue. Canada must have been one of the easiest registrations he received. Now he is being called to account. The application brought by CIPPIC seeks a declaration either that there is no copyright in Sahni’s image, Suryast, or, alternatively, if there is copyright in Suryast, that the Respondent (Sahni) is its sole author. It also seeks an order to expunge the copyright certificate in question or to rectify it by deleting the painting app as a co-author.

The fundamental problem of course is not Sahni or his AI app (although, like me, he may have been mischievous) but rather the way in which copyright registration is offered and maintained in Canada. It was not always this way. Once upon a time, to register a work in Canada you were required not only to pay a registration fee (which is still the case today) but also to submit three copies of the work, one for the Copyright Branch (which was part of the Department of Agriculture), one for the Canadian Parliamentary Library and one for the British Museum. Because of these depository requirements, today we have a record of many early copyrighted works in Canada, such as the famous early 20th Century Inuit photographs of Canada’s first professional female photographer, Geraldine Moodie, about whom I wrote a few years ago (“Geraldine Moodie and her Pioneering Photographs: A Piece of Canada’s Copyright History”).

When the first international copyright convention, the Berne Convention of 1886, was established among a limited number of countries, there was a push by authors to abolish the registration requirement because it was burdensome to have to register in all Berne countries. Initially, registration in the home country was supposed to provide protection in all member states of the Convention, but this proved difficult to put into practice. Consequently, in 1908 at the Berlin revision of the Convention, the following provision (which is today part of Article 5(2)) was adopted: “The enjoyment and the exercise of these rights shall not be subject to any formality”. Canada was a member of Berne because Britain had acceded, but was nonetheless a reluctant conscript (even though then PM Sir John A. Macdonald had acquiesced to Canada’s inclusion). In 1921 Canada finally passed its own Copyright Act (coming into force in 1924, a century ago this year), and subsequently joined Berne in its own right in 1928. I suspect that registration as a requirement, along with depository and examination conditions, was dropped at that time. That is probably when the current (but non-automated) voluntary registration process was established.

Certainly such a system was in place in the early 1950s when broadcaster Gil Seabrook of Vernon, BC registered an “untitled and unpublished artistic work” entitled “Ogopogo”. The registration of that undocumented work became the source of the urban myth that the City of Vernon owned the intellectual property rights to the mythical lake monster Ogopogo (Seabrook had donated his copyright to the City in an attempt to upstage Vernon’s rival town to the south, Kelowna, that claimed it was the “home of Ogopogo”). As a result, in 2022 Vernon Council went to great lengths to “return” the rights to Ogopogo to the local First Nation as an act of “reconciliation”. Of course, they never had the rights to Ogopogo in the first place. If you want more information, you can read all about it here. (“Copyrighting the Ogopogo: The © Story Behind the News Story”).

Despite the abolition of a registration requirement by Berne Convention countries, Canada is not the only country that maintains one. In the US, which only joined Berne in 1989, both registration and renewal were required for a work to enjoy copyright protection. When the US joined Berne, it maintained the registration requirement for US citizens who wished to take legal action to enforce their copyright. This is allowed under Berne. As such, the US has maintained a robust registration system where a legal deposit of the work is required, registrations are examined and can be challenged or refused.

We know that is not the case in Canada, but Canada is not the only country to have a voluntary registration system. In a recent study by WIPO (World Intellectual Property Organization), some 95 countries were identified as having either a voluntary registration system, a recordation system (for transfer of copyrights) or a legal deposit requirement. What is notable, however, is that of all these countries, only three (Canada, Japan and Madagascar) do not require a deposit of the work seeking registration. Canada does review applications but only to ensure they meet all the formality requirements (name and address of the owner of the copyright; a declaration that the applicant is the author, owner of the copyright or an assignee; the category of the work; its title; name of the author and, if dead, the date of the author’s death, if known. For a published work, the date and place of first publication must be provided and, perhaps most important, payment of the prescribed fee). Nothing else. In fact, if Mr. Mickey Mouse, address Disneyland Way, filed a copyright application for a work and paid the required fee of $63, I am sure a Canadian copyright certificate would be issued. It used to come in the mail, printed on nice quality paper but, alas, in the interests of efficiency, it is now only available in PDF format on CIPO’s website. Print it yourself.
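The “formality-only” review described above can be caricatured in a few lines of code. This is a hypothetical sketch, not CIPO’s actual system; the field names and logic are invented for illustration. The point it makes is the one in the text: only the presence of the required fields and the fee is checked, never the content of the work or the plausibility of the applicant.

```python
# Fields drawn from the formality requirements listed in the text above.
REQUIRED_FIELDS = {"owner_name", "owner_address", "declaration",
                   "category", "title", "author_name"}
FEE_CAD = 63.00  # the fee mentioned in the text

def register(application):
    """Issue a certificate if formalities are met; the work is never examined."""
    missing = REQUIRED_FIELDS - application.keys()
    if missing or application.get("fee_paid", 0) < FEE_CAD:
        return None
    return {"status": "registered", "format": "PDF"}  # print it yourself

# Even an obviously fictitious applicant clears the formality bar:
application = {
    "owner_name": "Mickey Mouse",
    "owner_address": "Disneyland Way",
    "declaration": "applicant is the author",
    "category": "artistic",
    "title": "Untitled",
    "author_name": "Mickey Mouse",
    "fee_paid": 63.00,
}
certificate = register(application)
```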

That is the current situation, but why has CIPPIC gone to the Federal Court to dispute the wording of Sahni’s copyright certificate, No. 1188619? While the Registrar of Copyrights can accept requests for correction of a copyright certificate (either because of an error in filing or because the Office itself made a mistake), it cannot by itself amend or remove a registered work from the Register. Instead, the Registrar needs the Federal Court to effect such action. Section 57 of the Copyright Act states, with respect to Rectification of Register by the Court:

(4) The Federal Court may, on application of the Registrar of Copyrights or of any interested person, order the rectification of the Register of Copyrights by
(a) the making of any entry wrongly omitted to be made in the Register,
(b) the expunging of any entry wrongly made in or remaining on the Register, or
(c) the correction of any error or defect in the Register

However, while CIPPIC is seeking expungement of this particular copyright registration, it is the system it is really going after. This is clear from its memorial to the Court:

(23) “In automating its copyright registration process, CIPO is derogating from its obligations to administer copyright in a fair and balanced manner under the Copyright Act.”
(24) “The consequence of this system is that content that does not merit copyright can…easily obtain the benefits of registration.”
(25) “Copyright registrants obtain certain benefits under the Act – such as litigation presumptions – and users and defendants are correspondingly burdened. Once a “work” is registered, the Copyright Act…shifts certain presumptions such as subsistence and ownership….In this very case, as a result of CIPO’s oversight failures, the burden rests on CIPPIC to prove the image Suryast lacks originality and that an AI program cannot be an author.”

Moreover, CIPPIC notes that it brought this case to the attention of CIPO, but CIPO refused to correct the Copyright Register, instead encouraging CIPPIC to seek resolution in court. Assuming it is granted standing, CIPPIC may well prevail and have the Suryast registration amended or expunged. But will that really achieve its goals? If its goals are to get CIPO to stop “derogating from its obligations”, then simply cancelling or amending this one registration won’t do it. What is the solution?

One option would be to eliminate the voluntary registration system altogether, but is this the right course of action? The WIPO document referenced earlier points out some of the advantages of a voluntary registration system. It can ensure that information about authorship and copyright, including date of registration, becomes publicly available. This benefits not only authors and rightsholders, who can use the registration as a rebuttable presumption of copyright in court, as in Canada, but also provides information to the public to verify ownership claims and trace title. A voluntary system does not, however, provide a definitive list of what works are under copyright and which are not. Another factor is that an automated voluntary system, such as the one operated by CIPO, is not burdensome for registrants and presents no meaningful obstacle. The problem is that its barriers to registration are so low that it is easy to trick the system. Is a Canadian copyright certificate worth the paper it is printed on if there is no verification?

A second option is to improve the registration process to make it meaningful, but this will require resources. Current fees are low (although the US system, which is much more robust, has a similar fee structure). Nonetheless, to institute a USCO-type system would require substantial additional resources that are unlikely to be forthcoming in the present fiscal environment. One would have to ask whether the extra cost could be justified. It’s a conundrum. Meanwhile, the government has circulated a paper on the issue of Copyright and AI, and the Canadian cultural community has weighed in with its views. Prominent among these is the position that copyright protection should be accorded only to human-created works. (This is not currently specified in the Copyright Act).

CIPPIC’s court action puts the spotlight on the current copyright dilemma. The current system seems not fit for purpose, but an economically viable alternative is not immediately apparent. At the very least, Canada should amend the Copyright Act to prevent AI-created works from obtaining copyright registration.

© Hugh Stephens, 2024. All Rights Reserved.

The Economics of Copyright: Incentives and Rewards (It’s Important to Get them Right)

Image: Shutterstock

Two years ago, in April 2022, the US Copyright Office (USCO) appointed its first Chief Economist, Dr. Brent Lutes. Many national Intellectual Property Offices have such a position, e.g., the UK IPO, IP Australia, the EUIPO, and WIPO. (Notably, Canada’s Intellectual Property Office–CIPO–does not). All these positions have broad responsibility for assessing the economics of IP generally, covering patents, trademarks and industrial designs as well as copyright. In the US, the Patent and Trademark Office has its own Chief Economist. However, Lutes’ USCO position appears to be the only one related exclusively to assessing the economic impact of copyright. The position sits within the Office of Policy and International Affairs and is supported by a small team of economists, providing the Register of Copyrights, Shira Perlmutter, with policy-relevant research on economic issues related to copyright.

In an interview conducted last month, Lutes talked about the economic goals of copyright in terms of enhancing social welfare. He noted the goal of copyright is to contribute to the welfare of society by promoting access to creative works, now and in the future, through market-based behavioural incentives. The goal of the Office of Chief Economist is to gather more information to inform policy making, such as the geographic distribution of copyright activity or the demographic characteristics of creators. As but one example, is racial or ethnic diversity related to creativity? The economic issues surrounding AI and copyright, both pro and con, are another field of research the USCO will be exploring.

In addition to finding the right economic levers to stimulate production of creative works, economic studies of copyright also demonstrate the enormous impact copyright-based industries have on national economic welfare. While the impact can depend on what economic multipliers are used and how direct versus indirect benefits are calculated, there is no question that copyright industries in most economies are very significant as job creators and multipliers. For example, IP Australia in its most recent annual report estimates that cultural and creative activity contributes about 6% of Australian GDP annually, with design, fashion, publishing, broadcasting, electronic and digital media and film being the primary industries involved. In the US the figures are even more impressive. According to the International Intellectual Property Alliance, in 2021 (the last year for which statistics are apparently available), core copyright industries in the US, defined as those industries “whose primary purpose is to create, produce, distribute or exhibit copyright materials”, added $1.8 trillion to US GDP, accounting for 7.76% of the economy. Total copyright industries, a definition that includes industries partially dependent on copyright, such as fabric, jewellery or toys and games, account for another trillion USD, even when only a portion of their total value is included in the copyright calculation.  

The UK Intellectual Property Office published its IP survey in 2022, comparing the role of patents, trademarks, registered industrial designs and copyright. While copyright industries were on the low side for exports (£4.7 billion as opposed to patents at £120.6 billion), copyright’s “non-financial value-added output” (IP data is not available for the financial industries, hence the description “non-financial”) trounced that of patent industries by almost 2:1. As with the US IIPA study, the UK report accounted for the degree to which certain industries depend on copyright, categorizing them as core, interdependent, partial or non-dedicated support industries, and adjusting the amount of copyright contribution accordingly. Book publishing, for example, is considered a 100% copyright industry and its value is counted in full, whereas for an industry such as paper manufacturing, only 25% of the value was included in the calculation of copyright benefits. This methodology followed that of the World Intellectual Property Organization (WIPO), which conducts its own economic studies and also assists national authorities with theirs. Economists are careful people, not prone to exaggeration, and consistent methodology is important to ensure accurate measurement and reporting.
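For readers curious about the mechanics, the WIPO-style weighting described above amounts to simple attribution arithmetic. The sketch below uses hypothetical industry output figures; only the weighting factors for book publishing (100%) and paper manufacturing (25%) come from the report.

```python
# Illustrative WIPO-style attribution: each industry's output is multiplied
# by its copyright-dependence factor before being summed.
# NOTE: the £bn output figures below are invented for illustration only.
industries = [
    # (industry, hypothetical annual value added in £bn, dependence factor)
    ("book publishing", 4.0, 1.00),      # core industry: counted in full
    ("paper manufacturing", 8.0, 0.25),  # partial industry: 25% attributed
]

copyright_contribution = sum(value * factor for _, value, factor in industries)
print(f"Attributed copyright contribution: £{copyright_contribution:.1f}bn")
# With these illustrative inputs: 4.0 x 1.00 + 8.0 x 0.25 = £6.0bn
```

The point of the weighting is that a paper mill is not a “copyright industry” in the way a publisher is, so only a fraction of its value is credited to copyright, keeping the headline totals conservative and comparable across studies.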

WIPO worked with the Department of Canadian Heritage to produce a report in 2020 on “The Economic Impact of Canada’s Copyright-Based Industries”. As with other deep dives on the economic benefits of copyright, this study produced similarly notable statistics. For example, while many copyright opponents in Canada were deploring the extension of the copyright term of protection in Canada, arguing that the result would be an outflow of royalties to foreign rights-holders because Canada was a net importer of copyrighted materials, the Heritage report established that “Canada has exported more copyright-related services than it has imported, maintaining a trade balance surplus from 2009 ($2.5 billion) to 2019 ($5.6 billion)”. In actual fact, extending the copyright term in Canada brought with it the additional benefit of a reciprocal extended term in many foreign countries for Canadian works, clearly benefiting Canadian rights-holders. The Heritage study went on to document a range of other important outcomes such as employment (over 600,000), contribution to GDP ($95.6 billion) and percentage of GDP (4.9%). All figures are based on 2019 data. No update has been published since. It is just as well that Heritage Canada took the lead in preparing this report since the government department holding lead statutory responsibility for copyright in Canada, the mammoth Department of Industry, Science and Economic Development (ISED), unfortunately seems to treat copyright as but a tiny pimple on its elephantine rump.

While the studies cited above highlight the economic contribution that copyright industries make to national economies in terms of jobs and wealth generation, let us not forget the key point that Dr. Lutes underlined regarding the social welfare contribution of copyright through using market-based incentives to promote and encourage creativity and investment in creative outputs. It is hard, if not impossible, to put a dollar amount on the social welfare benefits of creative expression and cultural sovereignty, but they are immense if incalculable. Without copyright, not only would existing content-based industries be unable to thrive and expand, but the formula to encourage new, original content would be missing.

Notwithstanding the importance of a robust copyright framework for both economic and social welfare, creators and content-based copyright industries are facing major challenges today. Some are technological, like the emergence of generative AI; some behavioural, such as a wide tolerance, even acceptance, of piracy and free riding. The struggle against piracy is ongoing and protracted, a cat and mouse game. Free riding is what AI developers are doing on the backs of content creators through unauthorized training of AI models on copyrighted content, with resultant legal challenges. There is also the question of whether wholly AI-generated works should be accorded copyright protection. As the Copyright Alliance has observed, the Copyright Clause in the US Constitution is premised on the promotion of the “progress of science and useful arts” by protecting for a limited period of time the writings and discoveries of authors and inventors. Given that premise, it should be self-evident that creator incentivization is not applicable to machines, which neither need nor comprehend economic incentives to create.

Free riding is also what the education sector has been doing in Canada under the specious umbrella of “education fair dealing”, introduced through copyright amendments in 2012 that broadened the scope of fair dealing. Since then, the “education industry” at the public, secondary and post-secondary levels has been siphoning off economic value from writers and other creators, to the tune of over CAD$200 million to date. Its legalized renunciation of collective reprographic licensing is ostensibly to benefit students but is in fact a transfer of wealth from creators to the bottom line of educational institutions. If a key objective of copyright is to incentivize creation of new content, such as materials used by educational institutions to teach students, then the current interpretation of education fair dealing in Canada upends a key rationale for granting copyright protection in the first place. (As a footnote, I should add that not all arguments in favour of copyright are based solely on economic incentives. There is also the question of natural justice and equity, providing authors with a degree of control over works they have created.)

Since court challenges have unfortunately proven ineffective, the remedy for Canada’s education fair dealing fiasco is for the Government of Canada to amend the Copyright Act so that rightsholders are properly compensated when their works are used in Canada. Both Access Copyright, the copyright collective in English Canada, and its Québec counterpart, Copibec, recently called for legal clarification of the nature and extent of educational fair dealing.

Thorough documentation of the contribution that copyright makes to economic and social welfare helps substantiate the case for adequate legal frameworks, including combatting piracy and ending copyright free riding. Sound economic data are essential to sound policy making. The initiative of the US Copyright Office to appoint a Chief Economist helps to meet these goals and is to be commended. Should the Canadian Intellectual Property Office ever create such a position, its first task should be to evaluate the full economic and social costs of the current short-sighted interpretation of fair dealing in Canada’s education sector in terms of its negative long-term impact on creativity and cultural sovereignty in the country.

The Scottish writer Thomas Carlyle may have described economics as the “dismal science”, an oft-quoted remark, but rather than being dismal it is in fact just the opposite; it sheds light on the importance of copyright to maintaining a well-functioning, equitable and culturally rich modern society.

© Hugh Stephens, 2024. All Rights Reserved.

AI’s Copyright Challenges: Searching for an International Consensus

Image: Shutterstock

This has been a busy couple of weeks for national and international declarations on Artificial Intelligence (AI). First the G7 issued its International Code of Conduct for Advanced AI Systems on October 30.  The same day US President Biden signed the Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, followed by the Bletchley Declaration at the conclusion of the “AI Summit” hosted by UK Prime Minister Rishi Sunak in Bletchley Park, Buckinghamshire, a couple of days later. Meanwhile, the EU’s AI Act is being touted by its sponsor as a potential model for AI legislation in other parts of the world (although its enactment is currently bogged down in the trilogue process between the Commission, EU Council and European Parliament). Notable was the fact that the US Executive Order, a wide-ranging framework document covering many aspects of the AI issue, effectively “scooped” the Brits by a day or so, allowing Vice President Kamala Harris to highlight steps the US had just announced when speaking to the press at Bletchley.

The declarations all addressed many of the concerns surrounding AI, ranging from safety and security, fraud and cybersecurity to privacy, equity and civil rights to protecting consumers, supporting workers and promoting innovation. A key issue only lightly touched on in these declarations, however, was that of AI’s intersection with copyright. This was a missed opportunity to come to grips with a major concern regarding how AI will be able to co-exist with copyright law. (The EU’s draft AI Act includes a transparency requirement to “document and make publicly available a sufficiently detailed summary of the use of training data protected under copyright law“, Article 28(b) 4(c).)

AI faces two significant challenges when it comes to copyright protection. First, with respect to the inputs that AI developers use to populate their models to produce generative AI, there is the unresolved question as to whether the free use of copyrighted content violates copyright law by making unauthorized reproductions. There are currently a number of lawsuits underway in the US examining this fundamental question. Many creator groups, such as the News Media Alliance in the United States, argue that “the pervasive copying of expressive works to train and fuel generative artificial intelligence systems is copyright infringement and not a fair use”.

Second, with respect to outputs, the work generated by AI has two challenges in terms of obtaining the benefits of copyright protection. If its inputs are infringing, that clearly casts doubt on the legality of the derivative outputs. In addition, there is the problem posed by the current position of the US Copyright Office (and most other copyright authorities) that to be copyright-protected a work must be an original human creation. After the infamous Monkey Selfie case, the USCO issued an interpretive bulletin reiterating the need for human authorship and, to date, it has hewed to this line when examining applications for copyright registration from authors claiming works produced by AI.

The G7 Declaration was broad, covering a wide range of issues related to AI. It included a reference to the copyright issue under Point 11, “Implement appropriate data input measures and protections for personal data and intellectual property”, specifically stating that, “Organizations are encouraged to implement appropriate safeguards, to respect rights related to privacy and intellectual property, including copyright-protected content.” This is hardly prescriptive language, but it is a beginning. I understand that the creative community had to fight hard to get this wording included, but it is at least recognition of the issue.

With respect to the US Administration’s Executive Order, the issue of copyright was also acknowledged, but in a somewhat backhanded way. Section 5.2 (Promoting Innovation) addresses copyright as part of clarifying issues “related to AI and inventorship of patentable subject matter”. Paragraph (c)(iii) declares that the Under Secretary of Commerce for Intellectual Property and Director of the US Patent and Trademark Office shall:

“within 270 days of the date of this order or 180 days after the United States Copyright Office of the Library of Congress publishes its forthcoming AI study that will address copyright issues raised by AI, whichever comes later, consult with the Director of the United States Copyright Office and issue recommendations to the President on potential executive actions relating to copyright and AI. The recommendations shall address any copyright and related issues discussed in the United States Copyright Office’s study, including the scope of protection for works produced using AI and the treatment of copyrighted works in AI training.”

This is not exactly a ringing endorsement of the need for respecting the copyright of those who, willingly or not, provide the raw material for the voracious AI machines that are busy scooping up creators’ content, but it is nonetheless an acknowledgment that there’s an issue that needs addressing.

The US Copyright Office (USCO) launched its study on Artificial Intelligence and Copyright on August 30 of this year “to help assess whether legislative or regulatory steps in this area are warranted”. By the end of October, the USCO had already received more than 10,000 submissions. The comments range from statements by AI developers as to why they shouldn’t be required to pay for copyrighted content used as inputs in developing their models (while of course claiming they should enjoy the benefits of copyright protection for their AI generated outputs), to submissions by creator organizations that argue, among other things, that the ingestion of copyrighted material by AI systems is not categorically fair use and that AI companies should license works they ingest. Licensing their content to AI companies as an additional revenue stream is precisely what major media companies are currently engaged in.

If the US, currently and for the foreseeable future the leading country in development of AI, is thrashing around trying to address this question, one can imagine the process taking place elsewhere. Will the need to set standards inevitably lead to some form of international consensus for the regulation of AI, including the role of copyrighted content? I think it will be essential. Countries that are too lax in protecting their creative sectors will see their copyright-protected cultural industries suffer negative economic consequences; countries that are more protective of content worry that investment in AI innovation will flow to jurisdictions with lower copyright standards, creating a race to the bottom for creators.

The UK government has already felt the pinch of this dilemma. In a misguided attempt to gain a head start in the AI development race, about a year and a half ago the British government unveiled a proposal sponsored by the UK Intellectual Property Office (of all entities!) to create an unlimited text and data mining (TDM) exception to copyright, at the same time stripping rights-holders of their ability to license their content for TDM purposes, or to contract or opt out. In the words of the discussion paper accompanying the draft legislation, in order to reduce the time needed to obtain permission from rightsholders and to eliminate the need to pay licence fees:

“The Government has decided to introduce a new copyright and database right exception which allows TDM for any purpose… Rights holders will no longer be able to charge for UK licences for TDM and will not be able to contract or opt-out of the exception.”

This outrageous attempted expropriation of intellectual property rights aroused a storm of protest from the UK’s vibrant cultural sector, a backlash that found resonance in Parliament. As a result, the British government backed off and withdrew the proposed legislation. However, one wonders if the stake has truly been driven through the heart of this hi-tech gambit or whether, like Dracula, this misguided policy will rise again. (See “UK Parliamentary Committee Shoots Down Copyright Exemption for AI Developers–But is it Really Dead?”) Certainly, British publishers are not convinced the content grab is over. According to the Guardian, they have just issued a statement urging the UK government “to help end the unfettered, opaque development of artificial intelligence tools that use copyright-protected works with impunity.”

Canada has just launched a public consultation on AI and copyright (“Copyright in the Age of Generative Artificial Intelligence”), and others will be doing the same. In Australia, Google, responding to a review of copyright enforcement, urged the government to relax copyright laws to allow artificial intelligence to mine websites for information across the internet (even though this wasn’t the topic of the enquiry). Meanwhile, the Attorney-General’s Department has been conducting several roundtables to explore the issue, the most recent being at the end of August. In that roundtable, representatives of the Australian creative community called for greater transparency around how copyright material is being used by AI developers during both the input training and output processes.

And so, the search for the right formula goes on. It will not be easy to find the elusive international consensus, especially since at the moment (with the exception of China) this is an issue on the agenda only of the so-called Global North.

How the heavy-hitters will deal with the issue of AI, including its intellectual property dimensions, remains to be seen. What emerges could be something as relatively powerless as OECD guidelines, or regulation could go a lot further, including the establishment of some kind of international agency with the “authority” to regulate in the area of AI, as suggested by Elon Musk and others. However, as we have seen with every international organization created to date, whether it be the UN, the World Trade Organization, the International Atomic Energy Agency, or any of the myriad other supra-national structures created in recent years, the authority granted to such bodies is only as good as the commitment of their signatory states. It makes sense to harmonize and set broad international standards for the way in which AI is created and used, but it will be a long road to get there.

The challenge of how copyright can intersect with AI–to the mutual benefit of both–has still to be worked out. The courts are playing a role, as are technology, evolving business models, and legislation. Society needs to find the sweet spot where both human creation and technological advancement in the form of AI can co-exist for the benefit of society at large. Despite recent pronouncements, the search continues.

© Hugh Stephens, 2023. All Rights Reserved.

This post has been updated to include reference to the ongoing roundtable process underway in Australia under the aegis of the Attorney-General’s Department to explore, inter alia, questions of AI and copyright.