Copyright, AI and the Legal Profession (Who Blinked?)


Image: Shutterstock

I wonder what really happened? Maybe we’ll never know. On March 23 it was announced that Caseway AI and CanLII (the Canadian Legal Information Institute) had reached a settlement in the copyright infringement case brought by CanLII against Caseway in 2024. As the saying goes, “Somebody knows something”, but they aren’t saying. The settlement is confidential and both sides are very tight-lipped, although Caseway is willing to riff a bit on social media. The CanLII announcement that each party will move forward independently, and that both consider the matter fully and finally resolved with no further comment, is particularly buttoned-up, leading one (the “one” being me) to suspect it was maybe CanLII that blinked, not Caseway. But I could be wrong. There has been no announcement that Caseway will be licensing CanLII content, nor any hint that money has changed hands. Maybe Caseway agreed to stop what it was doing even though it denied doing it.

The facts of the case are as follows. According to its website, CanLII is “a non-profit organization founded in 2001 by the Federation of Law Societies of Canada on behalf of its 14 member law societies. Its mandate is to provide efficient and open online access to judicial decisions and legislative documents.”

Not only that but,

CanLII supports members of the legal profession in the performance of their duties while providing the public with permanent open access to laws and legal decisions from all Canadian jurisdictions.

Caseway AI says it is a company that is applying AI techniques to the legal profession “to make legal knowledge accessible, affordable, and usable for everyone.”

This being the case, you might think that CanLII would be delighted when an AI company like Caseway came along to use CanLII’s “free” resources to develop an AI-based legal platform, which would arguably improve access to legal information on the part of the public, plus simplify the research function for legal firms. You would be wrong. Part of the problem, no doubt, was that the AI company, Caseway, charges for its services while not being part of the profession (i.e., taking but not giving).

Caseway’s sales pitch also might not endear it to the legal profession:

We believe the justice system should not feel closed off to those without deep pockets or institutional power…By combining trusted legal sources with modern technology, Caseway levels the playing field—empowering solo lawyers, small firms, businesses, and individuals navigating legal challenges on their own…

Oh oh. The self-representation bogey. Maybe the real reason for CanLII’s suit was that Caseway AI and others like it were setting themselves up as a direct threat to the legal profession. Apart from the threat of more self-representation, AI is a two-edged sword for many lawyers. Yes, it simplifies a number of routine duties and research functions, but at the end of the day it could also result in a lot fewer lawyers. The threat is no different from the threat posed to accountants, radiologists, stock market analysts and soothsayers, but it needs to be taken seriously.

The nub of the CanLII case was that while it provides public, non-copyrightable judicial decisions, these public documents are compiled in a proprietary database. CanLII argued that it spends considerable time, effort and money to “review, analyze, curate, aggregate, catalogue, annotate, index and otherwise enhance the data” prior to publication and that this creative effort converts public information into copyright protected content. CanLII might be right, based on the US case of Thomson Reuters v Ross, where a US court found that Ross Intelligence, an AI research firm, had infringed on the copyrighted legal materials, indexing system and case headnotes (summaries of judicial cases) of Westlaw, a legal research platform owned by Thomson Reuters. Notably, Ross had tried to license the Westlaw content, but Thomson Reuters had refused, viewing Ross as a competitor to Westlaw. Ross then helped itself to the material. In both the CanLII and Reuters/Ross cases, the foundational content (judicial decisions) was in the public domain, but the issue revolved around the secondary, interpretive materials and processes. In presenting its defence, Caseway did not argue that it was entitled to use CanLII’s content under fair dealing or because it was in the public domain. Instead, it argued that it didn’t access CanLII’s content at all; it got its content from other public sources. CanLII had to prove the contrary.

When I asked Google’s AI mode, “How strong was the CanLII case against Caseway?”, I got a summary of various Canadian Lawyer Magazine articles discussing the pros and cons of the case, and an unsubstantiated assertion that “Caseway agreed to respect CanLII’s terms of service and cease any unauthorized automated data extraction.” Whether that is true or not I cannot say, but it is clear that both CanLII and Caseway will continue on their respective paths. Indeed, Caseway has just burnished its image a bit by cutting a deal with UBC (the University of British Columbia, in Vancouver) to research ways to improve the accuracy of AI legal research tools. This is an ongoing problem for legal researchers, and more than one lawyer has been sanctioned by the courts for presenting supposed legal precedents that were in fact non-existent, having been hallucinated by AI.

Apart from the AI hallucination problem, which is not limited to the legal profession (Deloitte Consulting being a prominent example of a major company caught with its hand in the AI error-ridden cookie jar, without disclosure to the client), there is also the question of whether an AI platform should be allowed to provide legal advice. It is not licensed to do so, and as a regulated profession, lawyers are jealous of their prerogatives. The profession is regulated for good reasons: to ensure competence and integrity and to protect clients and the public. There are strict rules against unlicensed practitioners providing legal advice, with severe penalties. In March of this year, ChatGPT’s parent company, OpenAI, was sued for engaging in the unauthorized practice of law, in this case by providing legal advice through a consumer‑facing chatbot. That case is pending. The seriousness of unlicensed persons or entities providing legal advice explains the many warnings posted on websites and blogs when legal issues are discussed: “The foregoing does not constitute legal advice”.

Back to Caseway AI. Do I think that if you have a legal problem, you can solve it with a $49.99 a month subscription to Caseway instead of engaging a lawyer? Well, if you are determined to self-represent, it might be better to try it out rather than heading to the library to borrow a copy of the Highways Act, or Criminal Code, or searching for legal precedents that might be relevant to your case. On the other hand, remember the old saying, often attributed to Abraham Lincoln, that “A man who is his own lawyer has a fool for a client”.

The above does not constitute legal advice. 😊

© Hugh Stephens, 2026. All Rights Reserved.

CanLII v CasewayAI: Defendant Trots Out AI Industry’s Misinformation and Scare Tactics (But Don’t Panic, Canada)

Image: Pixabay

Last month I highlighted the first AI/Copyright case in Canada to reach the courts, CanLII v CasewayAI. CanLII (the Canadian Legal Information Institute), a non-profit established in 2001 by the Federation of Law Societies of Canada, sued Caseway AI, a self-described AI-driven legal research service, for copyright infringement and for violating CanLII’s Terms of Use through a massive downloading of 3.5 million files which Caseway allegedly used to populate its AI-based services. Now the principal of CasewayAI, Alistair Vigier, through an article (Don’t Scare AI Companies Away, Canada – They’re Building the Future) published in Techcouver, has responded publicly by trotting out many of the tired and specious arguments put forward by the AI industry to justify the unauthorized “taking” of copyrighted content to use in or to train generative AI models. Let’s have a closer look at these arguments.

Vigier opens by referencing another AI/Copyright case in Canada where a consortium of Canadian media companies is suing OpenAI for copyright infringement. He claims this is all based on a misunderstanding of how AI training works, stating that “AI systems like OpenAI rely on publicly available data to learn and improve. This does not equate to stealing content.” Whether data is “publicly available” or not is irrelevant to determining whether copyright infringement (aka stealing content) has occurred. Books in libraries are publicly available, as is a book that you purchase in a bookstore, or content on the internet that is not behind a paywall. (It is worth noting that the Canadian media companies also claim that OpenAI circumvented their paywalls to access their content when copying it.) But in none of these cases is copying permitted unless it falls within a fair dealing exception, which is very precise in its definition. Labelling copied material as “publicly available” is a red herring.

Vigier’s next argument is to equate the ingestion of content by various AI development models with a human being reading a book. We know that humans enhance their knowledge through reading and are thus able, presumably, to better reason based on the content they have absorbed. Vigier says, “This is how AI works. The AI ‘reads’ as much as it can, gets really ‘smart,’ and then explains what it knows when you ask it a question. Like a human learns from reading the news, so does an AI.”

Really? A human does not make a copy, not even a temporary copy, of the content, although some elements of the content are no doubt retained in the human brain. But AI operates differently. It makes a copy of the content. This should be beyond dispute, although the AI industry continues to muddy the waters by claiming that when content is “ingested” it is converted to numeric data and is thus not actually copied. This is a fallacious argument. Just because the form changes does not mean there is no reproduction. When you make a digital copy of a book, there is still reproduction even though the digital form is different from the original hard copy version. When a work is converted to data, the content is still represented in the dataset.
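The point that a change of form is not an absence of reproduction can be illustrated with a few lines of code. The sketch below uses a deliberately simplified byte-level tokenizer of my own invention (real LLM tokenizers use subword vocabularies such as byte-pair encoding, and models do not store text this way verbatim), but the underlying principle it demonstrates is the same: the “numeric data” is a reversible encoding from which the original text can be recovered exactly.

```python
# Illustrative only: a toy byte-level "tokenizer" showing that converting
# text to numbers is a reproduction, not an escape from copying.

def tokenize(text: str) -> list[int]:
    """Convert text into a sequence of integer token IDs (here, UTF-8 byte values)."""
    return list(text.encode("utf-8"))

def detokenize(tokens: list[int]) -> str:
    """Recover the original text, exactly, from its numeric representation."""
    return bytes(tokens).decode("utf-8")

passage = "the sole right to produce or reproduce the work"
ids = tokenize(passage)            # a list of numbers, not letters
assert detokenize(ids) == passage  # yet the original text is fully recoverable
print(ids[:5])
```

If the numbers encode the work, the work has been reproduced; the representation has merely changed, just as it does when a printed book is scanned to a PDF.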

Vigier dubiously states, with regard to OpenAI, “OpenAI’s models do not reproduce articles verbatim; they process vast datasets to identify patterns, enabling insights and efficiency.” Apart from the fact that the New York Times in its separate lawsuit in the US has been able to demonstrate that by typing in the leads of articles, it can prompt OpenAI to reproduce verbatim the rest of the article (OpenAI claimed that the Times “tricked” the algorithm), copying is copying even if the result of the copying is somewhat different from the original. The Copyright Act is crystal clear on this point. Section 3(1) of the Act states that, “For the purposes of this Act, copyright, in relation to a work, means the sole right to produce or reproduce the work or any substantial part thereof in any material form whatever….” If copyright protected content is reproduced in its entirety without permission for a commercial purpose (e.g., for AI training), that is infringement, unless the use qualifies as fair dealing under Canadian law or fair use in the US.

The issue of whether ingestion of content to train an AI application results in copying (reproduction) has been carefully studied and documented. One of the most thorough examples is a recent SSRN (Social Science Research Network) paper entitled “The Heart of the Matter: Copyright, AI Training, and LLMs”, with noted scholar Daniel Gervais (a Canadian, by the way) of Vanderbilt University as lead author. The article goes into a detailed discussion of how copying of content occurs during AI scraping to build a Large Language Model (LLM), including the stages of tokenization and embedding, leading to reward modelling and reinforcement learning. The section of the article explaining how copying occurs (pp. 1-6) is dense, technical text, but the conclusion is clear: “LLMs make copies of the documents on which they are trained, and this copying takes various forms, and as a result, with appropriate prompting, applications that use the LLMs are able to reproduce original works.” A shorter (and earlier) version explaining how copying occurs in LLM training can be found in this article (“Heart of the Matter: Demystifying Copying in the Training of LLMs“), produced by the Copyright Clearance Center in the US. It is also worth noting that these explanations refer only to ingestion of text. AI models that train on images and music are even more likely to produce exact or close-to-exact reproductions of some of the works they have been built and trained on.

So much for the misinformation in Vigier’s article. Now to the scare tactics. He says that the recent Canadian media lawsuit against OpenAI sends a negative message to innovators that Canada may not be open to AI development.

If Canada wishes to remain relevant in this (AI) sector, it must balance protecting intellectual property and promoting technological progress.

The fact that there are currently more than 30 lawsuits in the US, including the seminal New York Times v OpenAI case, does not seem to have slowed down the AI companies in the US. In the UK, legislation has been introduced that would, according to British media reports, “ensure that operators of web crawlers (internet bots that copy content to train GAI, generative AI) and GAI firms themselves comply with existing UK copyright law. These amendments would provide creators with crucial transparency regarding how their content is copied and used, ensuring tech firms are held to account in cases of copyright infringement.” There is lots of AI innovation ongoing in Britain.

The Australian Senate Select Committee Report on Adopting AI has recommended, among other findings, that there be mandatory transparency requirements and compensation mechanisms for rightsholders. The EU is already way out in front on this issue. Its new AI Act stipulates that providers of AI generative models will be required to provide a detailed summary of content used for training in a way that allows rightsholders to exercise and enforce their rights under EU law. Even India now has its own version of the US and Canadian media cases against OpenAI. (OpenAI’s defence in part is based on the argument that no copying took place in India because no OpenAI servers are located there!)

If that is what the “competition” is doing, who does Vigier cite as being the jurisdictions most likely to attract innovators away from Canada? Why, it is those AI powerhouses of Switzerland, Dubai—and the Bahamas!

The argument that if legislators and the courts don’t give AI innovators a free pass on helping themselves to copyrighted content for AI training purposes, this will either slow down innovation or chase it elsewhere is a common fearmongering strategy of the AI industry. This is a race-to-the-bottom mentality whereby content industries are thrown under the AI bus. Vigier, having been the subject of his own lawsuit, argues that instead of resorting to litigation, the Canadian media companies should have sought a licensing solution. But the fact that no licensing agreement was reached with OpenAI is undoubtedly the reason for the lawsuit in the first place. That is certainly the reason behind the NYT v OpenAI lawsuit in the US; licensing negotiations broke down. If someone has taken your content without authorization, and then offers you pennies on the dollar in comparison to what that content is actually worth, then the stage for a lawsuit is set.

In explaining CasewayAI’s position in the litigation brought by CanLII, Vigier says that Caseway approached CanLII with an offer to collaborate but was rebuffed. As a result, it developed its own extensive web crawling technology that pulled the needed material from elsewhere. (Where exactly the material was downloaded from is the crux of the matter.) Regardless, this makes it sound as if it was CanLII’s fault for refusing to share its content. Surely a rightsholder has the right to determine the terms on which its content is to be shared with others, if at all.

The fact that Caseway went to CanLII in the first place suggests that CanLII had developed the content that Caseway wanted. Caseway claims the material it accessed was on the public record, such as court documents and decisions. CanLII, on the other hand, claims that it had reviewed, indexed, analyzed, curated and otherwise enhanced the content in question, thus adding a wrapping of copyright protection to what otherwise would be public documents. Who is right, and whether the material was scraped from CanLII’s website without authorization, will be determined by the BC Supreme Court.

If the material taken by CasewayAI was not copyright protected, they are in the clear, at least with respect to copyright infringement. That is quite different, however, from arguing that no copying takes place during AI training or that if rightsholders use the courts to protect their rights, Canada will be a laggard when it comes to AI development. Robust AI development needs to go hand in hand with robust copyright protection for creators, with an appropriate sharing of the spoils of the new wealth generated from the creative work of authors, artists, musicians and other rightsholders. To say, as Vigier does in his concluding paragraph, that:

Canada has a choice to make. Will we embrace AI as the transformative force it is, or will we let fear and litigation stifle innovation? The lawsuits against Caseway and OpenAI message tech companies: you’re not welcome here. If this continues, Canada won’t just lose its AI startups; it will lose the future of job creation.

What sheer self-interested nonsense! This is fearmongering of the worst kind, based on an inaccurate and misinformed understanding of how AI is developed and trained, and it moreover impugns the legitimate right of a rightsholder to seek the protection of the law for their creativity and investment in content. Vigier might be correct when he says that licensing of content is a win/win for both parties. I agree with that. But licensing negotiations are about money and conditions of use, and they require willing parties on both sides. When licensing discussions break down, or when one party decides to do an end run around licensing because it has been rebuffed, then the way to gain clarity is through the courts, whose job it is to interpret what the legislation means.

Canada still needs to come to grips with the question of how copyrighted content will interface with AI development. As I noted earlier, both sides in the debate made their cases in the public consultation launched a year ago, but since then there has been no movement in Ottawa. The law could be strengthened to ensure adequate protection of rightsholder interests in an age of AI, which would facilitate licensing solutions. In the meantime, misinformation and scare tactics need to be called out for what they are.

Adequate protection for rightsholders does not mean the end of AI innovation or investment in Canada. There is no need for panic. We can walk and chew gum at the same time.

© Hugh Stephens, 2024. All Rights Reserved.

AI-Scraping Copyright Litigation Comes to Canada (CANLII v Caseway AI)

Image: Shutterstock (with AI assist)

It was inevitable. After all the lawsuits in the US (and some in the UK) pitting various copyright holders against AI development companies, alleging that the AI platforms were infringing copyright by reproducing and ingesting copyrighted materials without authorization to train their algorithms to produce outputs based on the ingested content (outputs that in some cases compete directly with the original work), AI scraping litigation has finally come to Canada. As reported by the CBC, CanLII (the Canadian Legal Information Institute), a non-profit established in 2001 by the Federation of Law Societies of Canada “to provide efficient and open online access to judicial decisions and legislative documents”, is suing Caseway AI, a self-described AI-driven legal research service, for copyright infringement and for violating CanLII’s Terms of Use through a massive downloading of 3.5 million files.

In its civil claim brought before the Supreme Court of British Columbia, CanLII alleges that the defendants, doing business as Caseway AI, violated its Terms of Use, which prohibit bulk or systematic download of CanLII material, and that in doing so, the defendants also engaged in copyright infringement by reproducing, publishing and creating a derivative work based on the copied works for the defendants’ own commercial purposes. There is no question that Caseway is providing legal material for commercial gain. Caseway’s services start at $49.99 a month, or $499.99 a year, and offer an AI-driven service that “leverages advanced AI to find relevant case law in less than a minute… Designed with a user-friendly chatbot interface powered by proprietary technology, Caseway (is) a robust tool tailored specifically for the legal profession.” Caseway’s Terms of Service have all sorts of disclaimers, however.

In his defence, Caseway’s Canadian principal (and defendant) Alistair Vigier is reported to have said that “court documents are public record, not owned by any organization, including CanLII. Numerous other websites also make these decisions available.” It is true that court documents and decisions are public documents not subject to copyright protection. However, CanLII claims that its database contains more than just the courts’ decisions. It says in its claim that it spends significant time to “review, analyze, curate, aggregate, catalogue, annotate, index and otherwise enhance the data” prior to publication. It is this creative effort that turns public documents into copyright protected content (or so the argument goes). To use another copyright analogy, you cannot copyright a recipe (a “list of ingredients”), but we all know that cookbooks containing recipes are always copyrighted. This is because of the display and illustrations of the recipes, the layout, commentary and other editorial touches. Julia Child’s sole amandine recipe is not just any old recipe for fried sole. Is CanLII’s compilation of “judicial ingredients” protectable? We will have to wait to find out.

CanLII’s case is reminiscent of a similar case in the US, Thomson Reuters v Ross Intelligence. Thomson Reuters operates a subscription-based legal research service called Westlaw. One of Westlaw’s employees allegedly copied Westlaw content to enable Ross Intelligence to build a machine learning platform that competed with Westlaw. Part of Ross’ defence was that the judicial decisions themselves are public domain documents, so there could be no infringement. Westlaw maintained that its case headnotes, summaries that describe the cases, were copyrightable material. Ross also brought forward a fair use defence arguing transformation, i.e., that it had produced something new and different that did not compete directly with Westlaw’s product. Here is a good summary of the case. The court determined that Ross had copied the headnotes, but the copyrightability of Westlaw’s numbering system and headnotes was a question for a jury to decide. While Ross’ anti-trust case against Westlaw has been dismissed, the copyright case is still pending.

Another case that has been cited as a possible precedent is the famous 2004 CCH Canadian Ltd v Law Society of Upper Canada case, in which the Supreme Court of Canada ruled that copies of CCH materials made by the Law Society library for its members did not infringe CCH’s copyright because the library was exercising the fair dealing research exception on behalf of the individuals requesting the copies. I personally don’t see the relevance of this case (but I am not a lawyer) since the Great Library’s users were copying only relevant parts of certain documents, for a specified fair dealing purpose. In the CanLII case, Caseway has apparently inhaled the full collection of documents, is doing so for a commercial purpose, and the resultant product (although not identical to the original) competes with it. Moreover, since there is no text and data mining exception in Canadian law, the “transformation” defences available to US-based AI companies (i.e., transforming the original materials to produce something different) are not applicable in Canada. This will be an interesting one for the lawyers.

What the case demonstrates is a crying need for some legislative guidance on the question of AI scraping of copyrighted materials in Canada. It may be that CanLII’s collection cannot be protected by copyright, which would provide Caseway with a defence without settling the fundamental issue of whether what Caseway did would violate the Copyright Act, assuming the material it used was protectable by copyright. A consultation exercise was launched by the government of Canada (through the Ministry of Innovation, Science and Economic Development, ISED) last October, closing in January, with submissions posted in June. Since then, there has been silence on the part of the government. With Parliament at a standstill, and the current government hanging on to power by its fingernails, don’t expect clarity any time soon.

© Hugh Stephens, 2024. All Rights Reserved