Although I post my blog content on WordPress, I usually draft it in MS Word. I am used to it, and it is easy to use. Little did I know that, according to the blog and forum site nixCraft, Microsoft recently (in its September privacy update) switched on a feature that allows it to ingest everything you write in Word to help develop its AI assistant, Copilot. The setting is turned on by default in the Privacy settings and must be unchecked manually. Did Microsoft tell you this? Well, kinda, sorta. Microsoft says, “we don’t use your customer data to train Copilot or its AI features unless you provide consent to do so”. Did you provide consent? You no doubt did, unknowingly, when Microsoft updated its Terms of Use, which it does on a regular basis. If you continued to use Office 365, you granted consent.
In the last few days, a pop-up has appeared when I am working in Word through Office 365.
“Thank you for using Office! We’ve made some updates to the privacy settings to give you more control”.
If you believe that, I have a bridge to sell you.
If you go to Privacy Settings, there is a summary blurb on how the Terms of Use were updated on September 30. If you look hard enough you will find this reference:
“We added a section on AI services to set out certain restrictions, use of Your Content and requirements associated with the use of the AI services.”
You really should read the full Terms but, as Microsoft notes, this will take an hour of your time (ESTIMATED READING TIME: 55 Minutes; 14,268 words).
Having waded through it, this, I believe, is the relevant wording:
b. To the extent necessary to provide the Services to you and others, to protect you and the Services, and to improve Microsoft products and services, you grant to Microsoft a worldwide and royalty-free intellectual property license to use Your Content, for example, to make copies of, retain, transmit, reformat, display, and distribute via communication tools Your Content on the Services.
There is nothing here about opting out. You have to go to Privacy settings and do some digging to get to that. By masking these changes to make them appear that your privacy has been strengthened (whereas in fact it is just a content grab), Microsoft has stood things on its head by putting the onus on you, the user, to exercise your privacy rights. If you don’t want your creative work used to help train its AI algorithm, which in the end might compete directly or indirectly with your work, you need to opt out (unless you want to stop using MS Word altogether). Microsoft, however, is not suggesting this as a preferred option, or even letting it be widely known that it exists as an option. In fact, when you go into Settings to opt out you are presented with this little gem; “The Trust Center contains security and privacy settings. These settings help keep your computer safe. We recommend that you do not change these settings”.
But that is exactly what you must do if you want to keep your creative content out of the hands of Microsoft’s AI developers. Here is how to do it, based on instructions from nixCraft.
On a Windows computer, when in a Word file, go to File in the top left-hand corner. There is a drop-down menu. You want to go to Options. On my computer the Options choice does not show up unless you hit the arrow at the bottom of the page for “More”. When you get to Options, go to Trust Center (left side menu), then Trust Center Settings. Next up is Privacy Options, which leads you to Privacy Settings. There is a drop-down menu, including Connected Experiences. There is a heading labelled “Experiences that analyze your content”. This box is checked for you. You want to uncheck it. To save the setting you will have to log out of Word and then log back in. (Update: I have just discovered a quicker way to do this. Go File > Options > General (top of list) > Privacy Settings > Connected Experiences > Experiences that analyze your content > Uncheck.)
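For readers comfortable with the Windows registry, the same setting can also be enforced as a policy rather than clicked through the menus. The sketch below is based on Microsoft’s published privacy-policy documentation for Office; the key path assumes Office version 16.0 (current Office 365 builds), and you should verify the path and value name against your own installation before importing it. Setting the documented `UserContentDisabled` value to 2 disables the “experiences that analyze your content”:

```
Windows Registry Editor Version 5.00

; Disables Office "connected experiences that analyze your content".
; Assumes Office 16.0; value 2 = disabled, 1 = enabled.
; Save as a .reg file and double-click to import, then restart Word.
[HKEY_CURRENT_USER\Software\Policies\Microsoft\Office\16.0\Common\Privacy]
"UserContentDisabled"=dword:00000002
```

Because this is written under the Policies branch, it takes precedence over whatever the in-app checkbox shows, which can be useful if an update ever re-enables the setting.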
Eliminating this option will come at a price, according to the “Learn More” links provided by MS, but it is your choice. For my part, I will forgo the bells and whistles for privacy.
The opt-out process is not simple and not intuitive, but worth doing, even if only as a matter of principle. Office 365, unlike Google Search or Bing, is not free. We pay to use it through an annual subscription. Even the tired old argument that you are providing your data as a sort of payment for “free” use of a platform’s service does not apply in this case. Microsoft needs more data to feed its AI machine and yours will do just fine, thank you very much. Don’t let them get away with it.
There is no better way to start the New Year, 2024, than with a commentary on Artificial Intelligence (AI) and copyright. It was the big emerging issue in 2023 and is going to be even bigger in 2024. The unlicensed and unauthorized reproduction of copyright-protected material to train AI “machines”, in the process often producing content that directly competes in the market with the original material, is the Achilles heel of AI development. To date, no one knows if it is legal to do so, in the US or elsewhere, as the issue is still before the courts. The cases brought to date by artists, writers and image content purveyors like Getty Images have not always been the strongest or best thought out. In one instance, the plaintiffs had not even registered the copyright on some of the works for which they were claiming infringement, a fatal flaw in the US, where registration is a sine qua non for bringing an infringement case. That may have been the most egregious example of a rookie error, but in general the artists’ and writers’ cases have not gone too well so far, although the process continues. Some cases are on stronger grounds than others. Here is a good summary. The Getty Images case will be an interesting one to watch. And now the New York Times has weighed in with a billion-dollar suit against OpenAI and Microsoft. The big guys are now at the table and the sleeves are rolled up. The giants are wrestling.
What is at issue could be nothing less than the survival of the news media and the ability of individual creators to protect and monetize their work. It could also open a pathway to legitimacy for the burgeoning AI industry. The ultimate solution is surely not to put a halt to AI development, nor to put content creators out of business. It is to find a modus vivendi between the needs of AI developers to ingest content in order to train algorithms that will “create” (sort of) content–assembled from vast swathes of input–and the rights of content creators. While training sets are generally very large, some of the input can be very creator-specific and the output very creator-competitive. This is where the New York Times comes in.
The Times, like any enterprise, needs to be paid for the content it creates in order to stay in business and create yet more content. If its expensively acquired “product”, whether news, lifestyle, cooking, book reviews or any of the other content that Times’ readers crave and are willing to pay for, can be obtained for free through an AI algorithm (“What is the most popular brunch recipe in the NYT using eggs, bacon and spinach”, or “What does Thomas Friedman think of…”), this creates a huge disincentive to go to the source and undermines journalism’s business model, already under severe stress and threat.
The Times is one of the few journals that has managed to thrive, relatively speaking, in the new digital age at a time when many of its competitors are dying on the vine. According to Press Gazette, the New York Times is the leading paywalled news publisher, with 9.4 million subscribers. (Wall Street Journal and Washington Post are numbers two and three respectively). You need to pay to read the Times, and why not? But paying for access does not give you the right to copy the content, especially for commercial purposes. (The Times offers various licensing agreements for reproduction of its content, with cost dependent on use). Technically, all it takes is one subscription from OpenAI and the content of the Times is laid bare to the reproduction machines, the “large language models”, or LLMs, used by the AI developers. The Times has now thrown down the gauntlet. Its legal complaint, 69 pages long, makes compelling reading. If there ever was a “smoking gun” putting the spotlight directly on the holus-bolus copying and ingestion of copyright protected proprietary content in order to produce an unfair directly-competing commercial product that harms the original source, this is it. It’s a far cry from earlier copyright infringement cases brought by some artists and writers.
While you can read the complaint yourself if you are interested (recommended reading), let me tease out a few of the highlights. After setting out the well-proven case for the excellence of its journalism, the Times’ complaint notes that while the defendants engaged in widespread copying from many sources, they gave Times’ content particular emphasis when building their LLMs, thus revealing a preference that recognized the value of that content. The result was a free ride on the journalism produced at great expense by the Times, using Times’ content to build “substitutive products” without permission or payment.
Not only does ChatGPT at times regurgitate the Times’ content verbatim or closely summarize it while mimicking its style; at other times it wrongly attributes false information to the Times. This is referred to in AI circles as “hallucination”, something the complaint labels misinformation that undermines the credibility of the Times’ reporting and reputation. Hallucination is a particularly dangerous element of AI-produced content. Rather than admitting it doesn’t know the answer, the AI algorithm simply makes it up, complete with false references and attributions, all of which make it very difficult for the average reader to separate fact from fiction. This misinformation is the basis of the Times’ complaint for trademark dilution that accompanies various other copyright-related complaints of infringement. Concrete examples of such misinformation are provided in the complaint.
So too is ample evidence of users exploiting ChatGPT to pierce the Times’ paywall, by asking for the completion of stories that have been blocked for non-subscribers. There are concrete examples of carefully researched restaurant and product reviews that have been replicated virtually verbatim. Not only is the Times’ subscription model undermined, but the value it derives from reader-linked product referrals from its own platform bleeds to Bing when the product is accessed through Microsoft Search enabled by ChatGPT. Examples are given of full news articles based on extensive Times’ investigative reporting being reproduced by ChatGPT, with only the slightest variations. These are not composite news reports of what is happening in Gaza, for example, but a word-for-word lifting of a Times’ analysis of what Hamas knew about Israeli military intelligence. The Times’ complaint makes for chilling reading. AI’s hand has been caught firmly in the cookie jar.
What does the Times want out of all of this? The complaint does not specify a dollar amount, while noting the billions in increased valuation that has accrued to OpenAI and Microsoft as a result of ChatGPT. However, it asks for statutory and compensatory damages, “restitution, disgorgement, and any other relief that may be permitted by law or equity” as well as destruction of all LLM models incorporating New York Times’ content, plus, of course, costs. If the Times gets its way, this will be a huge setback for AI development as well as for OpenAI and Microsoft, but of course it may not come to that. The complaint notes that the Times had tried to reach a licensing deal with the defendants. OpenAI cried foul, expressing “disappointment”, and noting that they had been having “productive” and “constructive” discussions with the Times over licensing content. However, to me this is a bit like stealing the cookies, getting caught red-handed and offering to negotiate to pay for them, then crying foul when your offer is rebuffed. The Times has just massively upped the ante, making the potential licensing fees much more valuable.
The irony is that the use of NYT material by OpenAI or indeed other platforms like Google or Facebook potentially brings some advantage and drives some business to the Times, while obviously also providing commercial benefits to the AI program, search engines or social media platforms. The real question will be how that proprietary content is used, and how much is paid to use it. A similar issue is being played out in another context, most recently in Canada with Bill C-18 where news media content providers wanted the big platforms (Google and Meta/Facebook) that derive benefit from using or indexing that content to pay for accessing it. The result in Canada was both a standoff and a compromise. Facebook blocked Canadian news content rather than pay for it, while Google agreed to create a fund for access by the news media in return for being exempted from the Canadian legislation.
The NYT-OpenAI/Microsoft lawsuit is a different iteration of the same principle. Businesses that gain commercial advantage from using proprietary content of others should contribute to the creation of that content, either through licensing or some other means such as a media fund. The most logical outcome of the Times’ lawsuit is almost certainly going to be a licensing agreement. Given the seemingly unstoppable wave of AI development, meaningful licensing agreements would seem to be the best way to ensure fairness and balance of interests going forward.
A Goliath like the New York Times is in a much better position to make this happen than a disparate group of writers and artists. Indeed, there are logistical challenges in being able to license the works of tens of thousands of content creators. In an earlier blog post, I postulated that perhaps copyright collectives might find a role for themselves in this area in future. In my view, ultimately the only logical solution to the conundrum of respecting rights-holders while facilitating the development of AI is to find common ground through fair and balanced licensing solutions. The wrestling giants of the NYT and Microsoft may help show the way.