The Great Scoop: Feeding Google’s AI Machine


Oh Google! You’ve done it again! You have taken a good idea, one that could help creativity, and once again blotted your copybook by antagonizing the creative community you profess to serve. Yet again you have turned a blind eye to the rights of writers and creators to serve your own ends, all in the name of “progress”. First it was the digitizing of all those books in public libraries, without the approval of the authors of those works. Then, once the works were digitized, you indexed them and put up large excerpts on the web for free use without so much as a “by your leave” to the copyright holders. While this could be seen as somewhat akin to a digital version of flipping through a book in a library, looking at excerpts, there is a big difference because a) the library has purchased the book and b) the library’s users are all pre-qualified in some way, as residents or students for example. I know that this “transformative use” has been ruled to be within the four corners of fair use by the US courts, but that doesn’t mean that it is morally right.

The argument in support of posting digital excerpts supposes that this does not unreasonably compromise the rights of the authors since the availability of the excerpts might prompt readers to acquire the work. Well, maybe. In my own case, I have seen it work both ways. In doing an Internet search, I came across the key words I was looking for in a Google book excerpt. But lo and behold, the excerpt contained exactly enough information to satisfy my needs, and I went no further. In another case, the excerpt whetted my appetite and prompted me to do a deeper dive, and I eventually purchased access to the book. I am not sure what percentage of the reading public is now buying more, and what portion is buying less as a result of Google’s online excerpts, and I guess there is no sure way to find out. The only thing that is certain is that book sales are dropping.

Now Google is using OPC (other people’s content) in new ways, this time in the realm of Artificial Intelligence (AI). The Guardian has reported that to help develop their AI product, researchers at Google have scooped up thousands of pages of novels found on the Internet, allegedly by “unpublished authors”, and fed them into their system so as to teach their bots the logical and customary use and sequence of language. The works were drawn from a collection of 11,000 books known as the “Books Corpus”, assembled by a group of academic researchers who accessed most of them from Smashwords. How the authors of the works used to train the bots could be described as “unpublished” is hard to understand, since the raison d’être of Smashwords is to publish independent writing. The fact that the free works were available on Smashwords is prima facie evidence that they were published, not to mention the fact that some of the works included specific copyright declarations. The creators of the “Books Corpus” bear some of the responsibility for disrespecting authors’ rights, but Google has compounded it. And they are not apologetic. According to the Guardian, a Google spokesperson responded with “It doesn’t harm the authors and is done for a very different purpose from the authors’, so it’s fair use under US law.”

I am not against AI, which is an inevitable next step to improve the automated algorithmic processes we have already. However, in pushing into new areas Google has once again appropriated the content of others to serve its own ends. The company seems to have a real blind spot when it comes to the rights of creators. It is just possible that they don’t even realize how disrespectful they are toward the creative community, quite apart from questions of possible infringement. In Google’s eyes, if it is accessible, it is fair game. There is no question that Google is creative and innovative. It has revolutionized search, mapping, data mining etc. It is unfortunate that while they stand on the shoulders of those who have preceded them, they seem unable to respect (and share the wealth with) those on whose work they are building. Where is the mutual respect that those creative software developers in Google should have for creators in other fields?

The conundrum is that I personally like Google products. In my humble opinion, Google search has its competitors beaten hands down. This was confirmed during a recent visit to China where I was forced to use the search engines of the competition because the Great Firewall blocks Google search, Gmail and YouTube. I found the competing search engines to be a pale shadow of Google’s product. Google has even brought us a new word that has found its place in the Oxford English Dictionary: “to google”. This was brought home to me when my daughter said, “Dad, I googled you, and you’re famous!” Not exactly, but as with many who blog or write, an Internet search will turn up a number of references to Hugh Stephens, including some to a nominal doppelganger in Australia. Incidentally that same daughter, and her older sibling, have created, developed and successfully marketed a board game based on algorithmically predicted Google search results, a game they call “Query”.

After that shameless plug, let’s get back to the main point: as creative and innovative as they are, the folks at Google just don’t get it. Perhaps respect for privacy (as in Google Street View) or copyright (as in this latest episode of appropriating someone else’s content for their own ends) is just not in their DNA. When I think of words to describe Google, terms like “innovative”, “high quality” and “user-friendly” come to mind. But so do “over-reach”, “disrespect” and “entitlement”. Does Google care? Probably not.

These negatives are the inevitable by-products of Google’s hubris and its well-established modus operandi of never asking permission but instead, if absolutely necessary, seeking forgiveness after the fact. The latest scoop of content to feed its AI machine is a further example, if one were needed. It is a shame that such a creative company has such disregard (some would say contempt) for the creativity of others.

© Hugh Stephens 2016. All Rights Reserved.


Author: hughstephensblog

I am a former Canadian foreign service officer and a retired executive with Time Warner. In both capacities I worked for many years in Asia. I have been writing this copyright blog since 2016, and recently published a book "In Defence of Copyright" to raise awareness of the importance of good copyright protection in Canada and globally. It is written from and for the layman's perspective (not a legal text or scholarly work), illustrated with some of the unusual copyright stories drawn from the blog. Available on Amazon and local book stores.
