
Earlier this month, thirteen publishers sued Anna’s Archive (a website that provides access to pirate libraries) for copying and distributing millions of infringing files, both books and research journal articles. Anna’s Archive claims to host over 63 million books and 95 million articles on its site. Like a number of other providers of pirated content, the Archive presents itself as a non-profit seeking altruistically to provide open-source knowledge to all and sundry. In practice, it is more than happy to provide unauthorized access to OPC (Other People’s Content). Not only that, it disclaims responsibility for any copyright infringement that occurs on the grounds that it only links to its sources, even though those sources include the world’s best-known pirate libraries. On its website it states that it mirrors or collaborates with Sci-Hub, Library Genesis (LibGen) and Z-Lib, all notorious pirate sites, while it scrapes Hathi, Internet Archive and DuXiu. Some of these sites, to be sure, offer legitimate content, or a mix of copyright-infringing and non-infringing material. Others, such as Sci-Hub (which I wrote about here), offer almost exclusively infringing (i.e. pirated) content. This action by the publishers marks the latest move in the endless struggle to preserve authors’ and publishers’ rights against copyright opponents who use the smokescreen of “free, public knowledge” to undermine legal protections for authors.
The struggle has been going on for years and takes different forms. Each time action is taken, new means to justify piracy and evade sanctions emerge. Anna’s Archive itself was founded only in 2022 out of Pirate Library Mirror (PiLiMi), a digitized library of pirated books. Notably, PiLiMi was one of the main sources for the pirate library compiled by Anthropic to train its AI platform, a direct result of which was Anthropic’s record $1.5 billion class action payout to a group of authors in September of last year (see When the End Does Not Justify the Means, Anthropic’s $1.5 Billion Lesson). While pirate libraries are cheap and easy sources of content for AI training, the outcome of the Anthropic case highlights the legal risks involved (Meta beware). The suit against Anna’s Archive will heighten those risks for American AI companies that access pirated content, although it is unlikely to deter Chinese and Russian developers.
To bring a copyright infringement case, plaintiffs must prove that specific works have been infringed. In the case of Anna’s Archive, the publishers (who license copyright from the authors with whom they have contracted) have listed 130 titles that have been infringed. Under US law, they can seek statutory damages of up to $150,000 per work for wilful infringement, for a total claim of $19.5 million. But it is not the money they are after; they know full well from past experience that the operators of Anna’s Archive will not appear in court, nor will they pay if the court finds against them. Neither the identity nor the location of the people behind the Archive is known. Rather, the intent of the plaintiffs is to obtain injunctions requiring third parties operating in the US, such as hosting and service providers, data centres, and domain registrars, to terminate support to Anna’s Archive. Unfortunately, unlike many countries, the US has no legal provision for site blocking that would enable rightsholders to obtain court orders requiring ISPs to block offshore pirate websites like Anna’s (which keeps changing its domain registry). But depriving it of hosting and domain services in the US will still disrupt its business. And make no mistake, Anna’s Archive is a business, not a “philanthropic” enterprise engaged in a purist pursuit of knowledge.
While the Archive solicits donations, it also offers premium (high-speed) downloads for $200,000 (in cryptocurrency). It needs to solicit crypto because its access to reputable payment platforms has been blocked. Guess where these downloads are going? To the training of AI large language models (LLMs). Not only does Anna’s Archive, through its unauthorized distribution of copyright-protected works, actively undermine the copyright protection that allows authors to earn a living, but it further targets the creative sector by facilitating the unauthorized use of such works for AI training. This can result in AI-generated works that directly compete with the works on which they were illicitly trained.
As I noted, Anna’s Archive is an aggregator of pirated content from various sources, one of which is Sci-Hub. Sci-Hub’s specialty is the theft of academic journals. Like the operators of Anna’s Archive, those behind Sci-Hub proclaim altruistic motives (“breaking academic paywalls since 2011”), yet what Sci-Hub is really doing is trying to destroy the academic publishing model. This model exists for a good reason. Academic publishing is not only an integral part of academic advancement and evaluation, it is also essential for the dissemination of credible, peer-reviewed research. What is the purpose of research if the results of documented investigation are not published on platforms that are credible? If reputable journals do not exist, how can reliable knowledge and trusted research be made available? There are already enough dubious “pay to play” journals in the marketplace to cause confusion, which is why established journals play such a key role. But producing quality requires resources: for peer reviewing, editing, secure distribution, and so on.
By and large these tasks are undertaken by a few large-scale, very competent academic publishers who, not surprisingly, charge for their services. Subscription fees, largely paid by university and other libraries, are the prime source of revenue. Someone has to pay for the work that goes into producing these products. If the subscription model is gutted by the “free” distribution of academic journals, the whole publishing model is undermined to no one’s benefit. Attacking the business model of academic publishers not only removes the incentives for authors to produce and compete for space in prestigious journals, but it also discourages publishers from undertaking the risks of distribution. This clearly does not advance the cause of disseminating knowledge. Indeed, it produces just the opposite effect, a contradiction that seems not to have occurred to the operators of pirate sites who are ostensibly motivated by a desire to freely share the fruit of all human knowledge. Academic publishers have naturally gone after Sci-Hub and, as I wrote last year, it has finally been blocked in India, one of the main jurisdictions where it was widely used. India is not the prime culprit, however. The number one and two countries for use are, respectively, China and the United States.
Another form of piracy that masqueraded as altruistic knowledge-sharing (note that it’s always someone else’s content that is freely shared) was the Internet Archive’s “Controlled Digital Lending” (CDL) model. This was a contrived attempt to justify an end run around licensed digital copies of copyrighted works. Under the unproven theory of CDL, it was fair use to lend a digital version of a hard copy work held in a library, substituting the digital version for the original, so long as the original was kept in inventory while the digital copy was on loan. Only one digital copy could be loaned against each copy of the original work. There were, however, several holes in the Internet Archive’s (IA) CDL arguments.
First, it was “an inconvenient truth” that the IA had to make a full unauthorized digital copy of the original work, itself an infringement but also one that damaged the market for the authorized digital version. Second, its argument that a hard copy existed to back up every digital copy on loan was shown to be a fiction. The IA tried to claim the hard copies “on file” were held by established libraries with which it was affiliated (it obtained access to the holdings of these libraries by offering free digitization), but in court, where it was sued by member publishers of the Association of American Publishers, it was clearly shown that no records were kept. The supposed one-to-one “own to loan” ratio was never intended to be respected. A final nail in the Internet Archive’s coffin was the fact that it generated revenue from the CDL model it maintained.
As in the case of the current suit against Anna’s Archive, only a limited number of works (127 in total) were at issue in the Internet Archive CDL case. These were works to which the publishers held rights, for which they had published e-book versions, and which were proven to have been infringed. Despite the small number of works involved, the court’s decision in favour of the publishers, which was upheld on appeal, knocked the wind out of the CDL model. It hasn’t completely gone away (there are still some in the US who argue it may be valid in certain contexts) and it has never been litigated in Canada. However, after the US decision, I concluded that its legal status in Canada was unclear but risky. Many libraries are not happy with the existing e-book licensing model, but CDL is not the answer.
These schemes to freely disseminate “knowledge” (i.e. OPC, often created with great effort and at considerable expense by a rightsholder) are not as altruistic as they purport to be. All use someone else’s work to generate revenue. Sometimes this revenue is in the form of “donations”, but how much of the donated funds goes to maintain the illicit operations and how much to other purposes is by no means clear. In any event, to steal in the name of charity, or to claim to be redistributing property (such as published works) as a self-appointed Robin Hood, is a thin excuse. All of these schemes have the effect of undermining the lofty goals they claim to support.
If the publishers win their sought-after injunctions against Anna’s Archive, it will be another attacking move in the digital chess game of pirate v. publisher (with the AI industry looking on as an interested observer). The Internet Archive case, Sci-Hub and now Anna’s Archive are all examples of that game. Canadian publisher Kenneth Whyte of Sutherland House traces it back even further to the 2004 Google Books case.
Frustrating as it may be for authors and publishers, taking on these pirate redistributors of other people’s copyrighted content seriatim is the only way to preserve the integrity of the ecosystem that produces the knowledge we all use and benefit from, knowledge that is encouraged, incentivized and protected by copyright. The self-righteous Robin Hoods, whether the Internet Archive, Sci-Hub or, most recently, Anna’s Archive, not only threaten to destroy the very system they profess to be promoting, they are also not averse to filling the purses they have hidden beneath their outlaw’s cloaks.
© Hugh Stephens, 2026. All Rights Reserved.









