The Height of Hypocrisy! OpenAI Accuses DeepSeek of Stealing its Content


Image: Shutterstock (edited)

Am I the only one, or did anyone else have just a touch of schadenfreude when they read the story in the New York Times that OpenAI is claiming the Chinese start-up DeepSeek may have “improperly harvested” its data? What irony! DeepSeek caught everyone’s attention earlier this week when it announced a new AI application that appears to outperform or at least match OpenAI’s ChatGPT. Not only that, it is also open source and completely free to download and use. More important, its alleged development costs were but a fraction of those of US models: the latter are reported to be in the hundreds of millions of dollars, whereas DeepSeek claims that it produced its results with an investment of as little as $6 million. (This clearly does not include the value of earlier R&D, but the question is whether or not DeepSeek covered these costs.)

We saw the shock this caused on the NASDAQ, especially with respect to chip-designer Nvidia’s share price, with over $600 billion wiped off its market value in one day. As often happens, there was a rebound the following day as saner heads digested the news and found a silver lining in the fact that AI development costs could be greatly reduced yet spending would continue. Of course, the spectre of “unfair” Chinese competition was raised, while others wondered how DeepSeek did it in the face of US high-tech embargoes on the sale of advanced Nvidia chips to China. “They must have cheated” was the mantra.

It appears that part of DeepSeek’s success is based on what is called “distillation” in the AI industry. As explained in this tech article, distillation is a technique that “focuses on creating efficient models by transferring knowledge from large, complex models to smaller, deployable ones”. The earlier models do the heavy lifting with respect to research and, as they produce results, those results are incorporated into the training of newer models that take advantage of the earlier work. To my untrained mind, this sounds like building on knowledge created by others, as happens all the time, or, to look at it negatively, like free riding on the investment of others. The question is, what knowledge is protectable and proprietary? This dichotomy is at the heart of the debate over copyright. You can’t copyright an idea, but the specific expression of an idea is protectable. Likewise, the functionality of software code cannot be copyrighted, although a specific software program is considered a “literary work” and is protected.

There is also the issue of open source. Release of code as open source enables further advancements, pushing the boundaries of knowledge. This is a common feature of the digital revolution and one reason for rapid advancements in Silicon Valley. However, not all content is fully open source. In the case of OpenAI it would seem it considers its content to be proprietary to the extent that it can control the use to which it is put. The accusation is that DeepSeek took and distilled OpenAI’s results to create a competing application without permission. In effect, DeepSeek used ChatGPT to improve its own model.

OpenAI’s position that it can dictate the uses to which ChatGPT can be put is, in my view, contradictory, hypocritical and in the end morally, if not legally, indefensible. OpenAI has no problem enabling and encouraging people to use ChatGPT to “improve on” or create works in any field, from AI-written novels to AI-created art or music, resulting in works that directly compete with authors, artists and musicians. Remember that OpenAI has used the original copyrighted works of those same creators without permission to build the AI machine that now threatens their livelihood and ability to create. Yet when that same AI application, ChatGPT, is used to improve on or create a new and better AI platform, this is declared to be infringement.

While distillation is common across the AI field, OpenAI claims its terms of service prohibit any use of data generated by its systems to build technologies that compete in the same market. This caveat would be similar to that which is applied to copyrighted content made publicly available on websites, with a disclaimer that it is copyright protected and potential users should contact the rightsholder. Did that stop OpenAI from helping itself without permission to this protected content to train its AI algorithm? Absolutely not. In fact, while it justified its activities by saying that all it was doing was taking “publicly available” content, not even paywalls and terms of service were allowed to get in its way. This was clearly demonstrated in the case brought against it by the New York Times. (See “When Giants Wrestle, the Earth Moves (NYT v OpenAI/Microsoft)”.)

It seems that from OpenAI’s perspective, use of other people’s content without permission is okay, but when it’s their content, not so much. OpenAI is partially owned by Microsoft, which is itself engaged in rolling out its own AI application, Copilot, trained in part through the unwitting contribution of hundreds of millions of users of Microsoft software, like MS-Word, as I wrote about last month. (See “Writers! Do You Know your Drafts on MS Word are being Scooped by Microsoft to Build its AI Algorithm? But You Can Stop This From Happening (Read On)”.)

Given all that has transpired, and the struggle that authors and rightsholders are facing to protect and get paid for the use of their works in AI training, it is hard to have much, if any, sympathy for OpenAI. I certainly don’t. Poetic justice.

© Hugh Stephens, 2025. All Rights Reserved.

Author: hughstephensblog

I am a former Canadian foreign service officer and a retired executive with Time Warner. In both capacities I worked for many years in Asia. I have been writing this copyright blog since 2016, and recently published a book "In Defence of Copyright" to raise awareness of the importance of good copyright protection in Canada and globally. It is written from and for the layman's perspective (not a legal text or scholarly work), illustrated with some of the unusual copyright stories drawn from the blog. Available on Amazon and local book stores.

One thought on “The Height of Hypocrisy! OpenAI Accuses DeepSeek of Stealing its Content”

  1. Again, another timely and fascinating article that is disturbing as it concerns AI and its many uses.
