No Surprise: Ontario Court Asserts Jurisdiction in Canadian Media Lawsuit Against OpenAI

A judge sitting at a bench in a courtroom, wearing a black robe with a red collar, Canadian flags in the background.

Image: Shutterstock

The Ontario Superior Court has ruled it has jurisdiction to hear the case against ChatGPT owner OpenAI brought by a consortium of Canadian media companies led by the Toronto Star. The media enterprises, which include the Globe and Mail, PostMedia, CBC/Radio-Canada, Canadian Press and Metroland Media Group, are suing the US company for copyright infringement, circumvention of technological protection measures (TPMs), breach of contract, and unjust enrichment as a result of OpenAI’s scraping of their websites to obtain content to train its AI algorithm. The allegations also cover OpenAI’s use of Retrieval Augmented Generation (RAG) to produce contemporary search results from paywall-protected content that augment ChatGPT’s AI-generated responses. When the suit was brought in November 2024, OpenAI challenged the Ontario court’s jurisdiction on the basis, among other grounds, that it had no physical presence in Canada. As pointed out by this legal blog, a court may presumptively assume jurisdiction over a dispute where one of five factors is present:

  • The defendant is domiciled or resident in the province.
  • The defendant carries on business in the province.
  • The tort was committed in the province.
  • A contract connected with the dispute was made in the province.
  • Property related to the asserted claims is located in the province.

The court found that OpenAI carries on business in Ontario notwithstanding its lack of a physical presence and was a party to contracts in Ontario as a result of tacitly accepting the terms of service governing access to the media companies’ websites when it scraped them.

OpenAI wanted the venue of the litigation changed to the United States to take advantage of developments in US law regarding unauthorized reproduction of copyright protected content for use as AI training inputs. To date, while many cases are still ongoing, US courts have tended to support a fair use argument by AI developers, allowing them to access copyrighted content without permission on the basis that the end use is “transformative”, resulting in a new product that does not compete with the original work. In Canada, the fair use doctrine does not apply, and exceptions to copyright protection are either explicitly laid out in the law (e.g. for law enforcement or archival preservation purposes) or are governed by the fair dealing provisions of the Copyright Act. These require that an unauthorized use fall into one of eight categories (research, private study, education, parody, satire, criticism, review and news reporting), each of which is in turn subject to various court-interpreted criteria such as the amount of the work copied, the purpose of the copying, and market impact. AI developers have been lobbying for the introduction of a text and data mining (TDM) exception into Canadian copyright law, but so far this has been successfully resisted by Canada’s creative community. All this to say that it is more difficult for AI companies to avoid liability for unauthorized use of copyright protected material in Canada than in the US, hence the importance of whether the Ontario court has jurisdiction.

Back in September, on the basis of previous Canadian court rulings where courts ranging from provincial courts to the Supreme Court of Canada asserted jurisdiction over large digital US companies operating virtually in Canada, such as Google (which challenged Canadian legal authority on the basis of a lack of physical presence), I predicted (guessed would be a more accurate term) that the Ontario court would be loath to surrender jurisdiction simply because the company was headquartered in the US. The earlier cases were for defamation rather than copyright infringement, and my “prediction” was based more on a hunch than legal analysis, but I am satisfied that I called it right. OpenAI has no compunction about selling services and collecting revenues in Canada and presumably (I hope) pays taxes here, although it is not subject to the Digital Services Tax (DST) that the Carney government threw overboard in a vain attempt to placate Donald Trump. Recall that Trump had threatened to terminate trade talks if Canada proceeded to implement the long-planned DST, so Canada blinked. Trade talks resumed until Trump found another excuse to end them, in this case the anti-tariff ads on US television, placed and paid for by the Ontario government, to which he took offence. But there is no doubt that OpenAI does business here; it just doesn’t want to be subject to Canadian law and Canadian courts. It can’t have it both ways.

While this is a victory for Canadian sovereignty, the fact that the Ontario Superior Court has confirmed its jurisdiction doesn’t mean that once the substantive proceedings begin copyright infringement will be found. Lawyer Barry Sookman, in an analytical blog post on this topic, has noted that in determining whether the alleged copyright infringements occurred in Canada, “the court relied heavily on the Supreme Court decision in SOCAN for the proposition that the territorial jurisdiction of the CCA (Canadian Copyright Act) extended to where Canada is the country of transmission or reception.” However, “SOCAN applied the real and substantial connection test to the communication to the public right” whereas the alleged copying involved the right of reproduction.

Sookman continues:

“…that test does not apply to the reproduction right. (The Federal Court has) held that the only relevant factor is the location in which copies of a work are fixed into some material form. The locations where source copies reside or acts of copying onto servers located outside of Canada, are not infringements” (according to the cases cited).

Inside baseball information but important when it comes to determining copyright infringement. On the other hand, it seems to me that the infringement involved not just, potentially, the reproduction right (the copying) but also the communication right, because OpenAI, through Microsoft, provided RAG content to users in Canada and elsewhere purloined from behind the paywalls of the media companies. So, we will have to see. Lots of fodder for IP lawyers.

In the meantime, deep-pocketed OpenAI will appeal the jurisdictional ruling, and will likely lose again. The appeal will buy time for it to negotiate licensing deals with the complainants. This is increasingly the model in the US as AI developers, including OpenAI, reach licensing agreements with content owners, particularly media organizations. To date, OpenAI has signed licensing deals with the Associated Press, the Atlantic, the Financial Times, News Corp, Vox Media, Business Insider, People, and Better Homes & Gardens, among others, while being sued (in addition to the Toronto Star et al.) by the New York Times and a collection of daily newspapers consisting of the New York Daily News, the Chicago Tribune, the Orlando Sentinel, the Sun Sentinel of Florida, the San Jose Mercury News, the Denver Post, the Orange County Register and the St. Paul Pioneer Press. Even META, that arch-opponent of paying for media content (which it claims adds no value to its users), has struck a media deal with news publishers, including USA Today, People, CNN, Fox News, The Daily Caller, the Washington Examiner and Le Monde. (One wonders if this will cause it to rethink its position of thumbing its nose at Canada’s Online News Act, where it “complied” with the legislation by blocking all Canadian news links.)

In another content area, OpenAI and Disney have just agreed on a three-year output deal, allowing OpenAI to use Disney characters (subject to certain limitations) in its AI creations. (Meanwhile, Disney is suing Google for using its characters in Google’s AI offering). OpenAI is currently facing 20 lawsuits, including the Toronto Star case, and needs to resolve these legal challenges before its expected public offering next year or in 2027. The spectre of impending lawsuits will inevitably lower the IPO price.

Most if not all of these lawsuits are going to end in settlements via voluntary licensing agreements, but that will only happen if OpenAI thinks the alternative (losing a major lawsuit) is a worse outcome. If it can wriggle out from the Toronto Star case by invoking some specious argument related to jurisdiction, it will. If it can’t, it will eventually open its chequebook and provide the Canadian media outlets some compensation for the valuable curated content it has hijacked. Canadian courts need to stay the course to help ensure that this happens.

© Hugh Stephens, 2025. All Rights Reserved.

Does OpenAI (ChatGPT) Have a Presence in Canada? Should it be Subject to Canadian Law?

Based on Common Sense, the Answer Should be “Yes”

A hand holding a smartphone displaying the 'Chat GPT' logo in front of a Canadian flag backdrop.

Image: Shutterstock

Late last year a consortium of major Canadian media companies (including the Toronto Star, Globe and Mail, CBC/Radio-Canada, Canadian Press, Metroland and PostMedia) sued OpenAI, the developer and operator of ChatGPT (and DALL-E), for copyright infringement, seeking injunctive relief and damages. OpenAI moved to dismiss the case on jurisdictional grounds. The Ontario Superior Court is now reviewing that question. As the Globe and Mail reports, OpenAI is trying to argue that the Ontario court has no jurisdiction because the company has no physical presence in Canada. It is headquartered in San Francisco and registered in Delaware.

As I commented in an earlier blog posting on this issue, the fact that the US fair use doctrine does not apply in Canada, combined with the closed nature of fair dealing exceptions and the lack of a Text and Data Mining exception in Canadian law, could prove troublesome for OpenAI. However, OpenAI would rather defend its case in California where it can resort to US “fair use” arguments, as it is doing in its defence against the copyright infringement and trademark dilution lawsuit brought against it by the New York Times. (The NYT case is being heard in the Southern District of New York). While the interpretation of whether fair use applies to unauthorized use of copyrighted materials for AI training is evolving in the US, and the outcome is far from certain, fair use and so-called “transformative use” have no applicability in Canada.

OpenAI claims that none of its corporate entities named in the suit conducts business in Ontario or has a physical presence there. It also claims that the alleged conduct (web-crawling and copying) overwhelmingly takes place outside Canada. The lawyers for the plaintiffs concede that OpenAI’s servers are outside Canada but instead focus on other aspects of OpenAI’s conduct and presence. They note that the websites of the media companies that were (and are) being crawled by OpenAI are hosted in Canada (similarly, the NYT suit is being heard in New York because the content that OpenAI copied is located there). Microsoft, which is a 49% owner of OpenAI, sells OpenAI’s products and services in Canada, and its models are “reproduced and hosted” in a Microsoft data centre in Toronto. The suit alleges that the copyrighted content was copied not just once for AI training but is continuously accessed and reproduced through what is known as “Retrieval Augmented Generation” (RAG) whereby (according to the complaint) OpenAI’s models are “provided continuous access to an additional data set (the “RAG Data”), which is continually updated in response to user prompts.”
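For readers unfamiliar with the mechanics, RAG works roughly as the complaint describes: at query time the system retrieves relevant passages from an external corpus (here, allegedly, the publishers’ articles) and prepends them to the model’s prompt, so responses can draw on current content rather than only on frozen training data. A minimal sketch, assuming nothing about OpenAI’s actual pipeline; the word-overlap scoring and sample headlines are illustrative stand-ins for the vector-embedding search real systems use:

```python
import re

def tokens(text):
    """Lower-case word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank documents by naive word overlap with the query and return the top k.
    (Real RAG systems rank with vector embeddings; overlap keeps this sketch
    dependency-free.)"""
    q = tokens(query)
    ranked = sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    """Prepend the retrieved passages to the user's question, so the model can
    draw on current external text instead of relying only on its training data."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical headlines standing in for a news archive.
articles = [
    "Ottawa announces new budget measures for 2025.",
    "Local hockey team wins championship final.",
    "Federal budget 2025 includes tax changes.",
]
prompt = build_prompt("What is in the 2025 federal budget?", articles)
```

The legal significance is that each retrieval step accesses and reproduces the source text anew, which is why the plaintiffs frame RAG as continuous copying rather than a one-time training ingestion.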

There is no doubt that OpenAI operates in Canada, offering products to Canadian residents such as ChatGPT subscriptions and accepting payment in Canadian dollars, although it may not be incorporated or have a bricks-and-mortar office. In fact, its student discount offers are pitched specifically as being only for students in the US and Canada. If physical presence in a country is a requirement for the exercise of judicial jurisdiction, it makes me wonder how Elsevier and the American Chemical Society were able to sue Sci-Hub in the US and win substantial damages (which were never paid) given that Sci-Hub had and has no presence in the US. Would Russia or Kazakhstan, which is where its servers are believed to be located, have been the appropriate jurisdictions?

This is no doubt a complex legal question, and we will have to wait to see how the Court rules. In addition to noting the various forms in which OpenAI operates in Canada, the plaintiffs have pointed out that were the Court to surrender jurisdiction, this would amount to giving up the ability to regulate a large part of the digital economy and constitute an affront to Canadian sovereignty, an argument dismissed by OpenAI’s legal team as hyperbolic and sentimental. However, although this is not based on any legal analysis (which I am not capable of providing), I have a hunch that the sovereignty argument will carry some weight.

In the past, Canadian courts have not shied away from asserting jurisdiction over cases involving Silicon Valley giants, which have been quick to seek transfer of court proceedings to California. I can think of several cases that fit into this category, notably the Equustek case, in which the Supreme Court of British Columbia’s decision requiring Google to de-index certain information from its global search results was upheld by the Supreme Court of Canada after Google had appealed the BC court’s ruling, claiming Canada was applying its law extraterritorially. Another was a defamation case in BC where the plaintiff, a resident of both California and British Columbia, sued Twitter in BC for repeatedly allowing defamatory tweets despite being requested to remove them. Twitter wanted the case moved to California where it could hide behind the notorious Section 230 of the 1996 Communications Decency Act. This legislation has been interpreted by US courts to absolve digital platforms of responsibility for user content they host and disseminate. The BC court refused precisely because, under US law, the plaintiff would have had no cause of action given the existence of Section 230. In another case, Google tried to invoke the jurisdictional argument, as well as Section 230, in a Quebec defamation case. Google argued the Quebec court had no jurisdiction because its server was located in the US. That argument didn’t fly, nor did Google’s argument that it was protected by Section 230 because of the CUSMA/USMCA trade agreement.

None of these cases is an exact match for the OpenAI case, of course, but I somehow doubt that the Ontario Superior Court is going to let this one go. There have only been a couple of other AI/copyright cases in Canada along similar lines: CanLII v Caseway AI, where both parties were Canadian entities, and several class action suits brought by authors in British Columbia, including a suit against Nvidia by local author J.B. MacKinnon. As far as I am aware, no decision has been reached in any of these suits. New legislation to address unauthorized use of copyrighted content for AI training does not seem to be on the immediate horizon in Parliament, so it is left to Canadian courts to establish some guidelines regarding Canadian law in this area. Toronto Star et al. v OpenAI would fit this bill perfectly.

© Hugh Stephens, 2025. All Rights Reserved.

Should We Throw Copyright Under the Bus to Compete with China on AI?

An illustration depicting a stick figure running away from a bus labeled 'AI,' while another figure labeled 'C' appears to have been hit or is lying on the ground.

Image: Shutterstock (author modified)

If this sounds about as responsible as “we should legalize theft of patents at home because patent infringement is rife in China”, then you may well ask where such a nonsensical and counterproductive idea came from. From OpenAI, the company behind ChatGPT, for one, the same company being sued by the New York Times for copyright infringement for copying and using NYT content without permission to train its AI algorithms.

Sam Altman, CEO of OpenAI, is one of the “tech bros” now cozying up to Donald Trump. He is a vocal advocate of allowing the AI industry unfettered access to copyrighted content as part of the AI training process. Last year, in a submission to the UK Parliament, OpenAI claimed that it would be “impossible” to train AI without resort to content protected by copyright. Now, it maintains that allowing AI companies to scoop up copyrighted content without authorization or payment is not only “fair use”, a legally unproven proposition that is currently very much a live issue before the courts in the US and elsewhere, but is essential for “national security”. To cite a few choice tidbits from OpenAI’s submission to the Office of Science and Technology Policy (OSTP), filed in response to the Office’s request for submissions on the Trump Administration’s AI Action Plan:

“Applying the fair use doctrine to AI is not only a matter of American competitiveness—it’s a matter of national security… If the PRC’s developers have unfettered access to data and American companies are left without fair use access, the race for AI is effectively over… access to more data from the widest possible range of sources will ensure more access to more powerful innovations that deliver even more knowledge.”

And, one could add, more profit for AI companies.

In other words, if the US government doesn’t give AI companies free and unfettered access to whatever content it desires, regardless of whether it is protected by copyright (think curated news content, musical compositions and artistic works, not to mention the published works of countless authors), then China will win the AI race, threatening the national security of the US. Or so Altman’s argument goes.

The AI industry is already a practitioner of the art of helping itself to OPC (other people’s content) without permission, then claiming fair use when it is caught doing it. That is what has led to the multiplicity of lawsuits now before the courts, brought by various authors and content owners. Raising the bogeyman of China and wrapping itself in the flag by invoking “national security” is a new wrinkle in the tech industry’s attempts to undermine established copyright law and to wriggle out from under its legal obligations.

“National security” is a convenient catchphrase and pretext in common use today to try to justify and legalize the unjustifiable and the illegal. Donald Trump invoked national security when he used the International Economic Emergency Powers Act (IEEPA) to override USMCA/CUSMA obligations made to Canada and Mexico, treaty obligations that he himself signed in his first term in office. The immediate excuse was the flow of fentanyl across the northern and southern borders of the US. Never mind that the amount of fentanyl seized by US border agents at the Canadian border came to a grand total of less than 43 lbs. for all of 2024, or just 0.2% of the total. (The equivalent for Mexico was 21,148 lbs). National security, and in particular playing the China card, is a political winner these days in Washington.

OpenAI’s position is all the more outrageous because it went into fits when the Chinese startup, DeepSeek, launched its new and much cheaper product, allegedly having used OpenAI’s capabilities to improve its own model. OpenAI cried foul and IP infringement, a case of blatant hypocrisy if there ever was one.

OpenAI and other generative AI companies that have built their training model on permissionless copying are clearly nervous about the possible outcomes of the numerous court challenges to their practices currently underway. Most of these cases are in the US although similar lawsuits have been launched in the UK, Canada, India and Germany. While it is impossible to predict the outcome of specific cases, in a recent decision (Westlaw v Ross), a US court rejected fair use as a defence in the context of AI training data. It did not accept that copying the content was a transformative use, but rather one that created a product that competed in the market with the original source material. Given the legal uncertainties, it looks like the tech industry is trying to hedge its bets by lobbying to have all AI training uses declared to be “fair use” based on national security considerations.

It gets worse than that. Another of the tech bros, Mark Zuckerberg, gave the green light to training META’s AI model on pirated material. This was not accidental. Employees reported removing © marks from books downloaded as training materials.

In Canada, in a similar search for a rationale to explain away copyright infringement, a company that was helping itself to copyright-protected curated legal case data to build an AI-based legal reference service claimed that forcing it to license the content would stifle innovation and drive AI businesses out of the country. See CanLII v CasewayAI: Defendant Trots Out AI Industry’s Misinformation and Scare Tactics (But Don’t Panic, Canada). The AI developers’ strategy seems to be that if you don’t want to license and pay for IP protected content (or perhaps the owner of the content prefers not to license it, as is their right), just take it and claim some overriding purpose, like protecting domestic innovation or national security.

But what about the argument that if China doesn’t respect intellectual property (IP), we need to adopt the same approach in order to compete? While Chinese courts in recent years have taken a much more robust position with respect to protecting the rights of IP owners, including patents, trademarks and copyright, I am not going to argue that suddenly China has become a “rule of law” country. Rather, it is a “rule by law” state, the law being whatever the leadership of the Chinese Communist Party (CCP) decides it will be at any given moment. This is a fact. However, to suggest that the West, in particular the US, should adopt China’s legal modus operandi so as not to lose the so-called “AI race” not only undermines all the values and principles on which our society is based, including the principles of private property, fairness and transparency, but also dismisses three centuries of legal developments in the protection of IP, especially copyright. The evolution of copyright law has resulted in the creation of industries that contribute far more to the economic and cultural wellbeing of our society than any of the questionable outputs of the AI industry.

Yes, AI is here to stay. It can be put to beneficial or nefarious uses and has an undoubted strategic component. It can also be used to undermine and weaken human creativity. Is that the goal we are seeking?

It is worth noting that the tech bros have an easy and legal way out. In most instances, they can acquire access to the content they need legitimately. A market for licensing training data for AI development already exists and is developing rapidly, as I wrote about earlier (Using Copyrighted Content to Train AI: Can Licensing Bridge the Gap?). But just taking it and claiming “fair use” is easier and cheaper. And morally, and probably legally, wrong.

We have seen a lot of rogue policy making in Washington of late, from the illegal deportation of US residents, to the gutting of US government agencies, to the declaration of a tariff war against the world. It is time to take a more considered approach. Rash decisions in response to tech lobbying could lead to untold consequences and collateral damage to content industries that would be impossible to roll back and remedy. Thus, I was relieved to note that Michael Kratsios, Director of the US Office of Science and Technology Policy, the same OSTP to which OpenAI submitted its comments regarding AI training and national security, stated in a recent speech on American innovation that:

 “…promoting America’s technological leadership goes hand in hand with a threefold strategy for protecting that position from foreign rivals. First, we must safeguard U.S. intellectual property and take seriously American research security…”

That is a welcome recognition of the importance of IP as part of the process of innovation.

In this respect, the existing framework of copyright law has survived and adapted for over 300 years. It has evolved with each new technological development, but the fundamental principle of giving an “author” of an original work the right to control how that work is used as well as the ability to earn a return from its use for a statutory period, with only limited exceptions, has remained unchanged. To undermine this principle in a flawed attempt to grasp the Holy Grail of AI leadership is self-defeating. Instead of sipping from AI’s Holy Grail we will be drinking from the poisoned chalice of IP theft.

Throwing copyright and the rule of law under the bus on the pretext that this is what’s needed to compete with China is not only self-serving, it is a sure path to ultimately losing the secret sauce of creativity and innovation. A country that steals IP rather than creating and respecting it will always lose the race.

© Hugh Stephens, 2025. All Rights Reserved.

The Height of Hypocrisy! OpenAI Accuses DeepSeek of Stealing its Content


Image: Shutterstock (edited)

Am I the only one, or did anyone else have just a touch of schadenfreude on reading the story in the New York Times that OpenAI is claiming the Chinese start-up DeepSeek may have “improperly harvested” its data? What irony! DeepSeek caught everyone’s attention earlier this week when it announced a new AI application that appears to outperform, or at least match, OpenAI’s ChatGPT. Not only that, it is also open source and completely free to download and use. More important, its alleged development costs were but a fraction of the development cost of US models, reported to be in the hundreds of millions, whereas DeepSeek claims that it produced its results with an investment of as little as $6 million. (This figure clearly does not include the value of earlier R&D, but the question is whether DeepSeek covered those costs.)

We saw the shock this caused on the NASDAQ, especially with respect to chip designer Nvidia, with over $600 billion wiped off its valuation in one day. As often happens, there was a rebound the following day as saner heads digested the news and found a silver lining in the fact that AI development costs could be greatly reduced yet spending would continue. Of course, the spectre of “unfair” Chinese competition was raised, while others wondered how DeepSeek did it in the face of US high-tech embargos on the sale of advanced Nvidia chips to China. “They must have cheated” was the mantra.

It appears that part of DeepSeek’s success is based on what is called “distillation” in the AI industry. As explained in this tech article, distillation is a technique that “focuses on creating efficient models by transferring knowledge from large, complex models to smaller, deployable ones”. The earlier models do the heavy lifting with respect to research, and as they produce results, those results are incorporated into newer training models that take advantage of the earlier work. To my untrained mind, this sounds like building on knowledge created by others, as happens all the time, or, to look at it negatively, like free riding on the investment of others. The question is, what knowledge is protectable and proprietary? This dichotomy is at the heart of the debate over copyright. You can’t copyright an idea, but the specific expression of an idea is protectable. Likewise, the functionality of software code cannot be copyrighted although a specific software program is considered a “literary work” and is protected.

There is also the issue of open source. Release of code as open source enables further advances, pushing the boundaries of knowledge. This is a common feature of the digital revolution and one reason for rapid advancements in Silicon Valley. However, not all content is fully open source. In the case of OpenAI, it would seem the company considers its content proprietary to the extent that it can control the use to which it is put. The accusation is that DeepSeek took and distilled OpenAI’s results to create a competing application without permission. In effect, DeepSeek used ChatGPT to improve its own model.

OpenAI’s position that it can dictate the uses to which ChatGPT can be put is, in my view, contradictory, hypocritical and in the end morally if not legally indefensible. OpenAI has no problem enabling and encouraging people to use ChatGPT to “improve on” or create works in any field, from AI written novels to AI created art or music, resulting in works that directly compete with authors, artists and musicians. Remember that OpenAI has used their original copyrighted works without permission to build the AI machine that now threatens their livelihood and ability to create. Yet when that same AI application, ChatGPT, is used to improve on or create a new and better AI platform, this is declared to be infringement.

While distillation is common across the AI field, OpenAI claims its terms of service prohibit any use of data generated by its systems to build technologies that compete in the same market. This caveat would be similar to that which is applied to copyrighted content made publicly available on websites, with a disclaimer that it is copyright protected and potential users should contact the rightsholder. Did that stop OpenAI from helping itself without permission to this protected content to train its AI algorithm? Absolutely not. In fact, while it justified its activities by saying that all it was doing was taking “publicly available” content, not even paywalls and terms of service were allowed to get in its way. This was clearly demonstrated in the case brought against it by the New York Times (When Giants Wrestle, the Earth Moves (NYT v OpenAI/Microsoft)).

It seems that from OpenAI’s perspective, use of other people’s content without permission is okay, but when it’s their content, not so much. OpenAI is partially owned by Microsoft which is itself engaged in rolling out its own AI application, Copilot, trained in part through the unwitting contribution of hundreds of millions of users of Microsoft software, like MS-Word, as I wrote about last month (Writers! Do You Know your Drafts on MS Word are being Scooped by Microsoft to Build its AI Algorithm? But You Can Stop This From Happening (Read On)).

Given all that has transpired, and the struggle that authors and rightsholders are facing to protect and get paid for the use of their works in AI training, it is hard to have much, if any, sympathy for OpenAI. I certainly don’t. Poetic justice.

© Hugh Stephens, 2025. All Rights Reserved.

Another AI Scraping Copyright Case in Canada: News Media Companies Sue OpenAI

Image: Shutterstock (AI assisted)

First, I heard it on the radio. The word “copyright” caught my attention because that’s a word seldom heard on the morning news. Then the news stories started to appear, first on Canadian Press, which was “largely” accurate, then on the CBC, Globe and Mail, even the New York Times. A consortium of Canadian media, including the Toronto Star, Postmedia, the Globe and Mail and the CBC/Radio-Canada, is suing OpenAI in Ontario Superior Court for copyright infringement and for violating their Terms of Use. The publishers are seeking CAD 20,000 per infringement plus an injunction to prevent further infringement. The case largely parallels a similar one in the US brought by the New York Times against OpenAI and its largest investor Microsoft, which I wrote about earlier this year (When Giants Wrestle, the Earth Moves (NYT v OpenAI/Microsoft)).

Despite what the press articles state, this is not the first case in Canada where copyright infringement has been alleged as a result of data being scraped to use in AI applications, as I noted last week. However, it is the first case where news organizations have gone after an AI development company. It also has nothing to do with the Online News Act as stated in the Canadian Press report. In fact, it is the absence of legislation in Canada regarding copyright and AI that is partly responsible for this being fought out in the courts.

OpenAI in its statement invoked “fair use” and “related international copyright principles” to justify its behaviour. The fact that the US fair use doctrine does not apply in Canada, combined with the closed nature of fair dealing exceptions and the lack of a text and data mining exception in Canadian law, could prove troublesome for OpenAI. It also has the effrontery to state that it offers “opt out” options for news publishers. When you are taking someone’s proprietary content without permission or payment, it is an insult to tell them they can always opt out. To steal, and then to tell your victim to request that you not steal again, is hardly the way ethical companies operate.
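It is worth spelling out what this “opt out” amounts to in practice. OpenAI has documented a user agent name (“GPTBot”) for its web crawler, and a publisher who wants its content excluded must post a directive in the site’s robots.txt file. As a rough sketch (the robots.txt file and site below are invented for illustration, though GPTBot is OpenAI’s documented crawler name), Python’s standard library shows how such a directive is interpreted:

```python
from urllib import robotparser

# Hypothetical robots.txt a news publisher might serve to "opt out" of AI
# crawling. "GPTBot" is the user agent OpenAI has documented for its crawler;
# the site and rules here are an invented example.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The AI crawler is barred from the whole site, while other crawlers
# (search indexers, for example) remain welcome.
print(parser.can_fetch("GPTBot", "https://example.com/news/story"))        # False
print(parser.can_fetch("NewsIndexBot", "https://example.com/news/story"))  # True
```

Note that the entire burden falls on the publisher to learn the crawler’s name and post the directive, compliance is voluntary on the crawler’s part, and nothing in this mechanism addresses content that has already been ingested.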

One question to be decided is whether the scraped content falls under copyright, as it is a well-established principle that the “news of the day” is not subject to copyright protection (see Do News Publishers “Own” the News?). News media may not have a monopoly over reporting on what is happening in, say, Gaza, but they certainly have the rights to their expression of what is happening through their coverage. OpenAI has also apparently said that its web crawlers are just “reading” publicly available material, as a human being would do. However, reading and copying are two different things, although proving reproduction may be difficult given the unwillingness of OpenAI to disclose its training methods, an issue that has come up in the New York Times case. “Publicly available” is irrelevant: being publicly available on the internet, in a library, or anywhere else does not justify copyright infringement.

In their suit, the plaintiffs are also alleging circumvention of a TPM (technological protection measure, sometimes referred to as a digital lock, which puts content behind a paywall). This is a separate violation of the Copyright Act. In addition, they are alleging violation of their Terms of Use, which are linked to their websites. When a user accesses material on the publishers’ websites, they must agree to the Terms of Use which, among other things, state that the content to be accessed is for the “personal, non-commercial use of individual users only, and may not be reproduced or used other than as permitted under the Terms of Use”, unless consent is given.

The publishers state that OpenAI was well aware of the need to pay for their content and to obtain permission to use it. That is essentially the position also taken by the New York Times. OpenAI has reached licensing agreements with some publishers including the Associated Press, Axel Springer (Business Insider, Politico), the Financial Times, the publishers of People, Better Homes and Gardens and other titles, News Corp (Wall Street Journal and many others), The Atlantic, and others. But not the New York Times obviously (negotiations broke down, leading to the current lawsuit) and not with any of the Canadian media bringing suit. A licensing agreement acceptable to both parties will be the likely outcome of this case. As the US-based Copyright Alliance has pointed out, generative AI licensing isn’t just possible, it’s essential.

There is a vacuum when it comes to legislation in Canada, and elsewhere, regarding the intersection of copyright and AI development. Various models are being experimented with, from the “throw copyright under the bus” model in Singapore, to a more nuanced model in Japan, to uncertainty elsewhere. Australia has just produced a Senate report in response to its public consultation on the issue. Among its recommendations, the Select Committee Report on Adopting Artificial Intelligence called for changes that would ensure copyright holders are compensated for use of their material, while tech firms would be forced to reveal what copyrighted works they used to train their AI models. Canada initiated a public consultation on the topic last year, and the Australian Committee’s recommendations with respect to copyrighted content are essentially what the Canadian copyright community asked for. However, since receiving input in January of this year and publishing the submissions received in June, the Canadian government has released no further information. A conclusion similar to the recommendations in Australia would be welcome.

Canadian creators and rightsholders are waiting for some action. Meanwhile the only alternative is to toss the issue to the courts to adjudicate.

© Hugh Stephens, 2024. All Rights Reserved.

Artificial Intelligence and Copyright: The Canadian Cultural Community Speaks Out

Image: http://www.shutterstock.com

The extended period set by the Canadian Government (through Innovation, Science and Economic Development Canada, ISED) for response to its consultation paper on Artificial Intelligence (AI) and Copyright closed on January 15. We will start to see a flurry of submissions released by participants while ISED digests and assesses the input it has received. One of the first is the submission from the Coalition for the Diversity of Cultural Expression (CDCE), which represents over 360,000 creators and nearly 3,000 cultural businesses in both French and English-speaking parts of Canada. CDCE’s membership includes organizations representing authors, film producers, actors, musicians, publishers, songwriters, screenwriters, artists, directors, poets, music publishers—just about every profession you can think of that depends on creativity, and protection for creative output. The CDCE submission highlights three key recommendations, summarized as follows:

  • No weakening of copyright protection for works currently protected (i.e. no exception for text and data mining to use copyrighted works without authorization to train AI systems)
  • Copyright must continue to protect only works created by humans (AI generated works should not qualify)
  • AI developers should be required to be transparent and disclose what works have been ingested as part of the training process (transparency and disclosure).

While none of these recommendations are surprising, and from my perspective are eminently reasonable, I am sure we will also see a number of submissions arguing that, “in the interests of innovation”, access to copyrighted works is not only essential but should be freely available without permission or payment. OpenAI, the motive force behind ChatGPT—and the defendant in the most recent high-profile copyright infringement case involving AI (When Giants Wrestle, the Earth Moves (NYT v OpenAI/Microsoft))—has already staked out part of this position. In its brief to the UK House of Lords Select Committee looking into Large Language Models (LLMs), a key technology that drives AI development, the company says:

“Because copyright today covers virtually every sort of human expression–including blog posts, photographs, forum posts, scraps of software code, and government documents–it would be impossible to train today’s leading AI models without using copyrighted materials (emphasis added). Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”

OpenAI claims that it respects content creators and owners and looks forward to continuing to work with them, citing among other things, the licensing agreement for content it has signed with the Associated Press. But failure to reach a licensing deal with the New York Times is really the crux of the lawsuit that the media giant has brought against OpenAI and its key investor Microsoft. If reports are true that OpenAI’s licensing deals top out at $5 million annually, it is not surprising that licensing negotiations between the Times and OpenAI broke down over such lowball offerings.

As for the CDCE submission to ISED, it recommends that the government refrain from creating any new exceptions for text and data mining (TDM) since this would interfere with the ability of users and rightsholders to set the boundaries of the emerging market in licensing. No copyright exemption for AI is what the British government has just confirmed, after playing footsie with the concept for over a year. Apart from the examples of the licensing deals that OpenAI has with the Associated Press and German multimedia giant Axel Springer, the CDCE paper notes a range of other recent examples of content owners offering access to their product through licensing arrangements, including Getty Images, Universal Music Group and educational and scientific publishers like Elsevier. The paper also urges the government to avoid interfering in the market when it comes to setting appropriate compensation, leaving it to market players or, where the players can’t reach agreement, to the quasi-judicial Copyright Board.

In my view, licensing is going to be the solution that will eventually level the playing field, but to get there it will require that major content players lock out the AI web crawlers while pursuing legal redress, as the NYT is doing. This will help to open the licensing path to smaller players and individual creators who don’t have the resources to employ either technical or legal remedies. (The issue of what has already been ingested without authorization still needs to be settled.) As for the tech industry’s suggestion that creators can opt out of content ingestion if they wish, CDCE rightly points out that this is standing the world on its head and would be contrary to longstanding copyright practice. Not only is it impractical in a world where what goes into an AI model is a black box (thus the imperative for transparency), but it is like saying a homeowner has to request not to be burgled, or else can expect to become a target.

On the question of whether AI generated works should be granted copyright protection, CDCE points out the double-standard of proposing an exception to copyright for TDM for inputs while claiming copyright protection for AI generated outputs. The need for human creativity is a line that has been firmly held by the US Copyright Office, pushing back on various attempts to register AI-generated (as opposed to AI-assisted) works. Canada has not been quite so clear cut in its position, owing to the way in which copyright is registered (almost by default, without examination) in Canada, as I pointed out in this blog post (A Tale of Two Copyrights). While AI generated works have received copyright protection in Canada (Canadian Copyright Registration for my 100 Percent AI-Generated Work), this is more by oversight than design, given the way the Canadian copyright registration system works.

Thirdly, we turn to transparency, a sine qua non if licensing solutions are to be implemented. If authors don’t know whether their works are being used to train AI algorithms, or can’t easily prove it, licensing will fall flat. CDCE calls for publication of all content ingested into training models, disclosure of any outputs that contain AI-generated content, and design of AI models to prevent generation of illegal or infringing content. This is similar to requirements already under consideration in the EU.

CDCE also makes the important point that it is not just copyright legislation that defends individual and collective rights against the incursions of AI and big AI platforms. While the Copyright Act offers some protection to creators, privacy legislation is important for all citizens. As the UK Information Commissioner has pointed out in a recent report, the legal basis for web-scraping is dependent on (a) not breaching any laws, such as intellectual property or contract laws and (b) conformity with UK privacy laws (the GDPR, or General Data Protection Regulation), where the privacy rights of the individual may override the interests of AI developers, even if data scraping meets other legitimate interest tests.

Finally, there is the question of the moral rights of creators that can be threatened by misapplication of AI, whether it is infringement of a performer’s personality or publicity right, distortion of their performance or creative output, misuse of their works for commercial or political reasons or any of the other reasons why copyright gives the creator the right to authorize use of their work.

Quite apart from the question of AI, there are of course other outstanding copyright questions that need to be resolved urgently, including the longstanding issue of the ill-conceived education “fair dealing” exception that has undermined if not permanently damaged the educational publishing industry in Canada. This exception needs to be narrowed to allow users continued unlicensed access to copyrighted materials under fair dealing guidelines for study, research and educational purposes but to limit institutional use to situations only where a work is not commercially available under a license from a rightsholder or collective society. While this issue requires looking back and fixing something that is already broken, policy making with respect to AI and copyright needs to anticipate the future and “do no harm”, while requiring AI developers to open up their black boxes and respect existing rights. This should be achieved by maintaining and protecting the rights of creators in ways that will facilitate market-based licensing solutions for use of copyrighted content by AI developers, while ensuring that creative output remains the domain of human beings, and not machines.

© Hugh Stephens, 2024.

When Giants Wrestle, the Earth Moves (NYT v OpenAI/Microsoft)

Image: www.shutterstock.com

There is no better way to start out the New Year, 2024, than with a commentary on Artificial Intelligence (AI) and copyright. It was the big emerging issue in 2023 and is going to be even bigger in 2024. The unlicensed and unauthorized reproduction of copyright-protected material to train AI “machines”, in the process often producing content that directly competes in the market with the original material, is the Achilles heel of AI development. To date, no one knows if it is legal to do so, in the US or elsewhere, as the issue is still before the courts. The cases brought to date by artists, writers and image content purveyors like Getty Images have not always been the strongest or best thought out. In one instance, the plaintiffs had not even registered the copyright on some of the works for which they were claiming infringement, a fatal flaw in the US, where registration is a sine qua non for bringing an infringement case. That may have been the most egregious example of a rookie error, but in general the artists’ and writers’ cases have not gone too well so far, although the process continues. Some cases are on stronger grounds than others. Here is a good summary. The Getty Images case will be an interesting one to watch. And now the New York Times has weighed in with a billion-dollar suit against OpenAI and Microsoft. The big guys are now at the table and the sleeves are rolled up. The giants are wrestling.

What is at issue could be nothing less than the survival of the news media and the ability of individual creators to protect and monetize their work. It could also open a pathway to legitimacy for the burgeoning AI industry. The ultimate solution is surely not to put a halt to AI development, nor to put content creators out of business. It is to find a modus vivendi between the needs of AI developers to ingest content in order to train algorithms that will “create” (sort of) content–assembled from vast swathes of input–and the rights of content creators. While training sets are generally very large, some of the input can be very creator-specific and the output very creator-competitive. This is where the New York Times comes in.

The Times, like any enterprise, needs to be paid for the content it creates in order to stay in business and create yet more content. If its expensively acquired “product”, whether news, lifestyle, cooking, book reviews or any of the other content that Times’ readers crave and are willing to pay for, can be obtained for free through an AI algorithm (“What is the most popular brunch recipe in the NYT using eggs, bacon and spinach?”, or “What does Thomas Friedman think of…?”), this creates a huge disincentive to go to the source and undermines journalism’s business model, already under severe stress and threat.

The Times is one of the few journals that has managed to thrive, relatively speaking, in the new digital age at a time when many of its competitors are dying on the vine. According to Press Gazette, the New York Times is the leading paywalled news publisher, with 9.4 million subscribers. (Wall Street Journal and Washington Post are numbers two and three respectively). You need to pay to read the Times, and why not? But paying for access does not give you the right to copy the content, especially for commercial purposes. (The Times offers various licensing agreements for reproduction of its content, with cost dependent on use). Technically, all it takes is one subscription from OpenAI and the content of the Times is laid bare to the reproduction machines, the “large language models”, or LLMs, used by the AI developers. The Times has now thrown down the gauntlet. Its legal complaint, 69 pages long, makes compelling reading. If there ever was a “smoking gun” putting the spotlight directly on the holus-bolus copying and ingestion of copyright protected proprietary content in order to produce an unfair directly-competing commercial product that harms the original source, this is it. It’s a far cry from earlier copyright infringement cases brought by some artists and writers.

While you can read the complaint yourself if you are interested (recommended reading), let me tease out a few of the highlights. After setting out the well-proven case for the excellence of its journalism, the Times’ complaint notes that while the defendants engaged in widespread copying from many sources, they gave Times’ content particular emphasis when building their LLMs, thus revealing a preference that recognized the value of that content. The result was a free ride on the journalism produced at great expense by the Times, using Times’ content to build “substitutive products” without permission or payment.

Not only does ChatGPT at times regurgitate the Times’ content verbatim, or closely summarize it while mimicking its style; at other times it wrongly attributes false information to the Times. This is referred to in AI circles as “hallucination”, something the complaint labels misinformation that undermines the credibility of the Times’ reporting and reputation. Hallucination is a particularly dangerous element of AI-produced content. Rather than admitting it doesn’t know the answer, the AI algorithm simply makes it up, complete with false references and attributions, all of which make it very difficult for the average reader to separate fact from fiction. This misinformation is the basis of the Times’ complaint for trademark dilution, which accompanies various other copyright-related complaints of infringement. Concrete examples of such misinformation are provided in the complaint.

So too is ample evidence of users exploiting ChatGPT to pierce the Times’ paywall, by asking for the completion of stories that have been blocked for non-subscribers. There are concrete examples of carefully researched restaurant and product reviews that have been replicated virtually verbatim. Not only is the Times’ subscription model undermined, but the value it derives from reader-linked product referrals from its own platform bleeds to Bing when the product is accessed through Microsoft Search enabled by ChatGPT. Examples are given of full news articles based on extensive Times’ investigative reporting being reproduced by ChatGPT, with only the slightest variations. These are not composite news reports of what is happening in Gaza, for example, but a word-for-word lifting of a Times’ analysis of what Hamas knew about Israeli military intelligence. The Times’ complaint makes for chilling reading. AI’s hand has been caught firmly in the cookie jar.

What does the Times want out of all of this? The complaint does not specify a dollar amount, while noting the billions in increased valuation that has accrued to OpenAI and Microsoft as a result of ChatGPT. However, it asks for statutory and compensatory damages, “restitution, disgorgement, and any other relief that may be permitted by law or equity” as well as destruction of all LLM models incorporating New York Times’ content, plus, of course, costs. If the Times gets its way, this will be a huge setback for AI development as well as for OpenAI and Microsoft, but of course it may not come to that. The complaint notes that the Times had tried to reach a licensing deal with the defendants. OpenAI cried foul, expressing “disappointment”, and noting that they had been having “productive” and “constructive” discussions with the Times over licensing content. However, to me this is a bit like stealing the cookies, getting caught red-handed and offering to negotiate to pay for them, then crying foul when your offer is rebuffed. The Times has just massively upped the ante, making the potential licensing fees much more valuable.

The irony is that the use of NYT material by OpenAI or indeed other platforms like Google or Facebook potentially brings some advantage and drives some business to the Times, while obviously also providing commercial benefits to the AI program, search engines or social media platforms. The real question will be how that proprietary content is used, and how much is paid to use it. A similar issue is being played out in another context, most recently in Canada with Bill C-18 where news media content providers wanted the big platforms (Google and Meta/Facebook) that derive benefit from using or indexing that content to pay for accessing it. The result in Canada was both a standoff and a compromise. Facebook blocked Canadian news content rather than pay for it, while Google agreed to create a fund for access by the news media in return for being exempted from the Canadian legislation.

The NYT-OpenAI/Microsoft lawsuit is a different iteration of the same principle. Businesses that gain commercial advantage from using proprietary content of others should contribute to the creation of that content, either through licensing or some other means such as a media fund. The most logical outcome of the Times’ lawsuit is almost certainly going to be a licensing agreement. Given the seemingly unstoppable wave of AI development, meaningful licensing agreements would seem to be the best way to ensure fairness and balance of interests going forward.  

A Goliath like the New York Times is in a much better position to make this happen than a disparate group of writers and artists. Indeed, there are logistical challenges in being able to license the works of tens of thousands of content creators. In an earlier blog post, I postulated that perhaps copyright collectives might find a role for themselves in this area in future. In my view, ultimately the only logical solution to the conundrum of respecting rights-holders while facilitating the development of AI is to find common ground through fair and balanced licensing solutions. The wrestling giants of the NYT and Microsoft may help show the way.

© Hugh Stephens 2024. All Rights Reserved.