
We read daily about new lawsuits brought by rightsholders against AI developers, strategy papers floated by governments seeking to solve the riddle of reconciling copyright and AI, and declarations issued by authors proclaiming the end of human creativity. The creative community seems to have coalesced around the principles of transparency, permission, and remuneration, but the tools to effect those key elements remain elusive. The AI community would generally prefer not to pay or ask permission but is gradually accepting the need to license content. Yet there is still a technical gap in terms of knowing what content has been used, when, and how. Without that knowledge, the principles of permission and remuneration are left treading water. The blog post below by Jim Bryant, Co-Founder and CEO, Citations LLC, offers potential solutions to this challenge, and I offer it to you as a possible pathway forward. I have no financial interest in Citations LLC, nor did they pay me to post this information. It is presented as a contribution to the search for a world where copyright and AI can co-exist for mutual benefit. (Hugh Stephens)
A problem or an opportunity?
Imagine a student in Montreal asks an AI assistant a question about traditional Chinese medicine, in French. The AI answers fluently — in French — drawing on the Encyclopedia of China, a monumental work with over 125 million characters that has never been translated into any language. Now imagine the same student switches to English and asks a follow-up question. The AI answers again, equally fluently, in English. The student is satisfied. The publisher gets nothing. No notification, no attribution, no compensation. They don’t even know it happened.
This scenario is entirely plausible with current AI technology. And while it represents a genuine copyright problem — real-time AI translation of a protected work, without license, in a jurisdiction whose law was not written to contemplate it — it also represents something else: an extraordinary, unrealized opportunity.
For the first time in the history of publishing, the technology exists to know, at the moment it happens, that someone in Montreal, Mumbai, or Mexico City is asking a question that your content just answered. The question is whether publishers will help build the systems to capture that signal — or whether they will leave it entirely to the AI companies, who are already building without them.
Publishers have always been flying blind.
Think about what publishers have never been able to know — and what AI companies, for the first time, can. An AI system that has trained on your works without permission is, in effect, drawing on your content every time it answers a relevant question. Which of your titles is it using right now, and where? Which readers are getting answers derived from your content without ever being directed back to the original? Which backlist titles are generating AI responses in markets where you have no distribution and no visibility? Which gaps in your catalogue are readers repeatedly trying to fill — and how would you know, if the only signal is buried inside a system you have no access to? The argument for independent monitoring infrastructure is not only about compensation. It is about visibility. Publishers are currently funding AI responses with their content and receiving nothing in return — not money, not data, not even the knowledge that it is happening.
For centuries, publishers sent their works into the world and largely lost sight of them. Sales data arrived months or years later, filtered through agents, booksellers, and distributors, and it described what sold, not what readers wanted but couldn’t find. The feedback loop from reader demand to editorial decision has always been slow, indirect, and incomplete.
A properly instrumented knowledge access infrastructure changes all of that. Real-time query data across AI systems is, in effect, a continuous signal of what readers want — more granular, more current, and more honest than any market research tool the industry has ever had. That data is a byproduct of the same system that creates the copyright exposure publishers are currently fighting in court.
The moment of demand is the moment to act.
Here is the specific opportunity that AI creates, and that no prior technology has made possible: when an AI system surfaces content in response to a query, it creates a demonstrated moment of demand. A reader who just received an AI-generated answer drawn from a specific book is, at that moment, maximally interested in that book. That is the moment to offer them the chance to borrow it from a library, purchase it from a retailer, or access an authorized digital edition.
Rather than substituting for the book, the AI interaction becomes the discovery mechanism that leads to it. Publishers have spent decades trying to close the distance between the moment a reader becomes interested in a title and the moment they act on that interest. AI closes that distance to zero — but only if the infrastructure exists to capture it. Without that infrastructure, the moment passes, the reader moves on, and the publisher never knew the opportunity existed.
Libraries are being bypassed, and publishers are losing their best customers.
Libraries are among the largest single customers some publishers have. A major academic or reference publisher may depend on library subscriptions for a substantial share of its revenue. AI is disrupting that relationship in ways that have received too little attention. When a patron who would previously have borrowed a book (or prompted their library to acquire it) instead receives an AI-generated answer derived from that same book, the library never makes the purchase, the publisher never sees the revenue, and neither institution knows the transaction occurred. The AI company captures the value; the library loses a use case; the publisher loses a sale. The institution most structurally committed to legal, compensated access to knowledge is being systematically bypassed by systems that obtained that knowledge without payment.
The same logic applies to real-time trend identification. Aggregate query patterns across an AI knowledge system are a leading indicator of what readers want — not what they bought last quarter, but what they are looking for right now. Which subjects are rising? Which titles are being asked about in markets where they have no distribution? Which authors are generating interest that isn’t yet reflected in sales? This intelligence, continuously available, would transform publishing from a reactive industry into a responsive one.
The translation question is the hardest — and the most important.
The Encyclopedia of China example is worth dwelling on, because it illustrates both the opportunity and the complexity in their sharpest form. That encyclopedia has never been translated — into French, English, or any other language. The economics of translation have made it prohibitive: 125 million characters, uncertain commercial return, no obvious path to a global audience. As a result, it has been accessible only to readers of Chinese. That constraint has nothing to do with the quality or the importance of the content.
AI removes that constraint entirely. In this hypothetical, a reader anywhere in the world could ask the encyclopedia a question in their own language and receive an answer. This is, genuinely, one of the most remarkable things that AI makes possible: the dissolution of language as a barrier to knowledge, overnight, at no marginal cost.
But it raises a set of copyright questions that existing law is not equipped to answer. A real-time AI translation is, in the most precise legal sense, the creation of a derivative work — at the point of query, in a foreign jurisdiction, without a license, without attribution, and without compensation to the original publisher. It is not covered by any existing text-and-data-mining exception, because it is not mining — it is real-time derivation. It is not covered by fair use or fair dealing analysis that was designed for static reproduction, not dynamic on-the-fly translation.
And yet the underlying interest of the publisher is not to prevent this from happening — it is to be compensated when it does, and to have some say in how their content is represented. A framework that would allow the publisher of the Encyclopedia of China to authorize AI-mediated translation under defined conditions, receive a per-query payment, and have the source attributed, would serve everyone’s interests. The absence of such a framework means the publisher gets nothing, the AI company gets everything, and the reader gets an answer of uncertain provenance.
The ten copyright challenges — briefly.
It is worth cataloguing the specific challenges, because they are often discussed in isolation when they actually share a common cause. The publishing industry currently faces at least ten major copyright issues arising from AI:
AI training — whether training on copyrighted works requires permission and compensation, currently being litigated in multiple jurisdictions.
Transparency — AI developers do not disclose what content their models were trained on, making it impossible for rights holders to assess exposure or negotiate terms.
Reproduction — models can and do reproduce passages that closely approximate protected expression, as documented in peer-reviewed computer science research.
Market impact — AI summaries and Q&A responses can substitute for the original work, displacing revenues that would otherwise flow to the publisher.
Derivative works — the degree of transformation required to render AI output non-infringing remains genuinely unsettled, particularly for outputs that blend multiple protected sources.
Attribution — AI outputs routinely fail to identify the works they draw on, undermining both the moral rights of authors and the practical basis for any royalty mechanism.
Compensation — no industry standard governs AI licensing fees; per-query, per-token, and blanket models are all being proposed, with no settled framework.
Retrieval — retrieval-augmented generation systems access copyrighted content at inference time, raising rights questions distinct from and additional to those arising from training.
International law — training data crosses borders; copyright law does not; EU, US, UK, Japanese, and Canadian frameworks diverge in ways that create genuine compliance complexity.
Auditability — without verifiable records of what was accessed, when, and in what context, no licensing agreement is enforceable and no royalty calculation is credible.
These are not ten separate legal problems. They are ten symptoms of one missing piece of infrastructure: a neutral, independent system for monitoring how AI systems access and use copyrighted content, reporting on that usage in real time, and enabling settlement between AI platforms and rights holders on the basis of verified data rather than estimates.
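To make the auditability point concrete: a record is only "verifiable" if it cannot be quietly altered after the fact. One common way to get that property is a hash-chained log, sketched below in Python. This is an illustrative assumption, not a description of how Citations or any AI platform actually records access; the field names (work_id, when, context, query_language) and the sample values are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the hash of the entry before it."""
    payload = json.dumps(event, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an access event, chaining it to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "hash": entry_hash(prev_hash, event)})


def verify_log(log: list) -> bool:
    """Recompute every hash. Editing or reordering entries is detected,
    as is deleting any entry other than a trailing one."""
    prev_hash = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical access events: what was accessed, when, and in what context.
log = []
append_event(log, {"work_id": "encyclopedia-of-china", "when": "2026-01-15T14:03:00Z",
                   "context": "retrieval", "query_language": "fr"})
append_event(log, {"work_id": "encyclopedia-of-china", "when": "2026-01-15T14:05:00Z",
                   "context": "retrieval", "query_language": "en"})

print(verify_log(log))  # True

log[0]["event"]["query_language"] = "en"  # tamper with an earlier record
print(verify_log(log))  # False
```

A chain like this only shows that the records have not been changed since they were written; it says nothing about whether they were complete or honest in the first place, which is why the question of who operates the log matters.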
Why the infrastructure must be independent.
This point deserves emphasis, because there is a tempting shortcut that would not actually work. Publishers cannot rely on AI developers to build and operate the systems that monitor AI’s use of their content. The conflict of interest is structural: the party whose compliance is being measured cannot be the party doing the measuring.
What is required is a neutral layer — operated independently of both AI developers and publishers — that records access events, aggregates usage data, reports to rights holders, and enables automated settlement. Think of it as the knowledge economy’s equivalent of a financial clearinghouse: not owned by any single participant, trusted by all of them, and essential to the functioning of the market.
This is not a novel concept — it is exactly the model that makes collective rights management organizations function in the music industry and payment card networks function in financial services. Every industry that has needed to account for consumption at scale and distribute revenues to multiple rights holders has eventually built a neutral clearinghouse.
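The functions of the neutral layer described above (record access events, aggregate usage, report to rights holders, settle) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of any real clearinghouse: the platform and publisher names are made up, and the flat rate of 2 cents per query is hypothetical, chosen only because per-query pricing is one of the models mentioned earlier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    platform: str       # AI platform that accessed the work (hypothetical name)
    rights_holder: str  # publisher to be reported to and paid
    work_id: str        # identifier of the work that was accessed


def aggregate(events):
    """Count access events per (platform, rights holder, work)."""
    counts = defaultdict(int)
    for e in events:
        counts[(e.platform, e.rights_holder, e.work_id)] += 1
    return dict(counts)


def settle(counts, rate_cents):
    """Total owed by each platform to each rights holder, in integer cents."""
    owed = defaultdict(int)
    for (platform, holder, _work), n in counts.items():
        owed[(platform, holder)] += n * rate_cents
    return dict(owed)


# Record: a handful of hypothetical access events.
events = [
    AccessEvent("platform-a", "publisher-x", "work-1"),
    AccessEvent("platform-a", "publisher-x", "work-1"),
    AccessEvent("platform-a", "publisher-x", "work-2"),
    AccessEvent("platform-b", "publisher-x", "work-1"),
    AccessEvent("platform-a", "publisher-y", "work-3"),
]

usage = aggregate(events)           # aggregate: the usage report for rights holders
owed = settle(usage, rate_cents=2)  # settle: hypothetical flat per-query rate

for (platform, holder), cents in owed.items():
    print(f"{platform} owes {holder}: {cents} cents")
```

Amounts are kept in integer cents to avoid rounding drift. The per-work counts in usage are the same data that would feed the demand signals discussed earlier, so reporting and settlement draw on a single set of records.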
The window is open — but not indefinitely.
Canada’s AI strategy, recently released, makes almost no mention of copyright or the rights of content creators — a significant omission that Hugh has written about on this blog. The European Parliament’s work on AI and copyright has moved further, but still focuses primarily on training rather than on the access and retrieval layer where the most tractable opportunities lie.
The practices governing how AI systems access knowledge are being established right now, largely by default. The companies building AI systems are not waiting for a legal or regulatory framework; they are building, and the norms are hardening around what they build. Publishers who are not at the table when that infrastructure is designed will find themselves subject to whatever framework others have built for them.
Copyright law exists to balance access and incentive — to ensure that knowledge can circulate while the conditions that make knowledge production sustainable are preserved. AI does not change that objective. It changes the technical conditions under which the balance has to be achieved. The good news is that those technical conditions, for the first time, make real-time monitoring, attribution, and settlement not just possible but straightforward.
The question is not whether AI will access books. It will. The question is whether publishers will be watching when it does — and whether they will have built the systems to act on what they see.
That system already exists. It logs the moment, attributes the source, and settles the account — not as a future framework, but as infrastructure operating today. It’s called Citations, and it’s already watching.
* * *
About the author
Jim Bryant is the co-founder and CEO of Citations LLC, which has built the independent infrastructure for rights-aware AI access to authoritative content — enabling real-time monitoring, attribution, and settlement between AI platforms and publishers. See how it works at: citationslogic.ai. Jim previously founded ProCD, one of the first CD-ROM reference publishing companies; managed Information Please, which became one of the most visited reference destinations of the early internet; and founded Trajectory, which developed and deployed natural language processing algorithms to read and extract structured metadata from over one million books in English and Chinese.
(c) Citations LLC, 2026












