Delegating Research to AI is a Risky Proposition: The “Hallucination” Phenomenon (User Beware)

A graphic showing a cartoon robot head on a computer monitor with the text 'The World is Flat Because I Say So' in a playful font.

Image: Shutterstock (author modified)

It seems that every day new applications and new threats emerge from the AI world. This applies in particular to creators, who see growing AI challenges to their livelihoods: graphic art and album covers spat out by AI generators; voice actors replaced by AI clones; authors struggling to make their works known in a sea of AI-generated slop; now AI artists are even making the Billboard charts. At the same time, AI has many other functions and produces a host of products that have little to do with artistic creation. In particular, it can be used as a crutch to assist and enable research in a wide range of fields. Today it is routinely used by everyone from school kids to law firms to health care researchers. And that is where the risks of mainlining AI are most evident, because of the propensity of AI platforms to fabricate plausible-sounding misinformation.

In a blog post earlier this year (AI’s Habit of Information Fabrication (“Hallucination”): Where’s the Human Factor?) I discussed some examples of law firms caught submitting non-existent case precedents in court as a result of sloppy legal research using AI. Judges have very limited tolerance for this practice, which wastes valuable court time, and they are increasingly imposing significant penalties (that is, if the fabricated information is actually spotted). The problem is not going away. This website, maintained by Paris-based legal scholar Damien Charlotin, has compiled a database of more than 550 legal cases in 25 countries where generative AI has produced hallucinated content. These are typically fake citations, but they also include other types of AI-generated arguments. The US wins the lottery at 373 cases, but Canada is second with 39. Even Papua New Guinea has one case.

As you can well imagine, AI-hallucinated results in health care could be fatal. As Dr. Peter Bonis, Chief Medical Officer at Wolters Kluwer Health, points out, hallucination in the health care field has led to consequences such as recommending surgery when it was not needed, advising that a specific drug could safely be stopped abruptly when this was known to be dangerous, advising against vaccination on the basis of known allergies even though vaccination was in fact safe, proposing the wrong starting treatments for patients with rheumatoid arthritis, and so on. You get the picture. You don’t want your family doc using AI search for the remedy for whatever ails you. The fact that the models present incorrect information with such confidence, and that potentially dangerous incorrect information is embedded in a lot of correct information, makes proper use of AI outputs particularly challenging.

How is it that AI platforms so consistently produce unreliable results? This MIT Sloan article identifies three elements:

  • Training data sources (the uneven quality of inputs, including pirated, biased and otherwise unreliable content)
  • Limitations of generative models (generative AI models are designed to predict the next word or sequence based on observed patterns and to generate plausible content, not to verify its accuracy; see the sketch below the list)
  • Inherent challenges in AI design (the technology isn’t designed to differentiate between what’s true and what’s not)
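
To make the second bullet concrete, here is a deliberately tiny, hypothetical sketch of the statistical idea behind next-word prediction. It bears no resemblance to a production LLM (which uses a neural network trained on vast corpora), but it illustrates the core point: the program emits whatever continuation was most common in its training text, and nothing in it checks whether the result is true.

```python
# Hypothetical toy, not any real model: a bigram "language model" that always
# emits the word most often seen to follow the previous one in its training
# text. It optimizes for plausibility given past patterns; nothing checks truth.
from collections import Counter, defaultdict

training_text = (
    "the court ruled in favour of the plaintiff . "
    "the court ruled in favour of the defendant . "
    "the court cited smith v jones . "
    "the court cited the earlier ruling . "
)

# Count which word follows which in the training data.
follows = defaultdict(Counter)
tokens = training_text.split()
for current, nxt in zip(tokens, tokens[1:]):
    follows[current][nxt] += 1

def generate(start: str, length: int = 8) -> str:
    """Greedily append the statistically most likely next word."""
    words = [start]
    for _ in range(length):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        # The "most plausible" continuation wins, whether or not it is accurate.
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Prints a fluent, legal-sounding string assembled purely from observed patterns.
print(generate("the"))
```

Real models are vastly more sophisticated, but the objective is the same: produce the most plausible continuation. That is precisely why a confident-sounding fabrication can emerge wherever no correct answer was learned.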

This is all pretty concerning if people are going to surrender personal judgement to AI and use it to cut corners without verification. One way to address part of the problem is to ensure that the training data used is reliable and of high quality. That is where licensing of accurate, curated data and content as training inputs becomes important, and that is why a licensing market is developing as AI companies seek out better-quality data to distinguish their products from those of their competitors. This can be very helpful where the AI platform is limited to discrete areas of knowledge, such as the medical field, where usage can be limited to professionals who are prepared to pay for a bespoke AI product and who are qualified to interpret the results properly. AI for the general public is another matter, and this is where most of the problems arise. Unfortunately, while improving the quality of training data helps reduce hallucinations, it does not completely eliminate them. As the New York Times has reported,

“Because the internet is filled with untruthful information, the technology learns to repeat the same untruths. And sometimes the chatbots make things up. They produce new text, combining billions of patterns in unexpected ways. This means even if they learned solely from text that is accurate, they may still generate something that is not.”

User beware. Nonetheless, better inputs lead to better outputs. As AI developers work to take their products to the next level by refining their training processes and making outputs more predictable and trustworthy, they will need access to curated, proprietary content and closer collaboration with content owners. Dr. Bonis noted that in specialized areas like health care, AI companies will get better-quality feedstock while the creators of the content will receive funding that allows them to continue their research. A virtuous circle.

Users bear a big responsibility to ensure AI is employed effectively. The mindless, uncritical use of AI to reach conclusions in areas where the user has little knowledge can be dangerous. By all means use AI as a tool to sort and categorize, but don’t rely on it to produce the answers on which substantive decisions will be based. Any sensible user of AI has a pretty good idea of the answer to the question before it is even asked. It is also a good idea to refine the question so as to narrow the range of possibilities.

Some proprietary AI models offer RAG (Retrieval Augmented Generation), whereby the AI retrieves relevant information from trusted sources to supplement its preliminary analysis. This can increase reliability. However, RAG, where the AI goes after specific inputs to bolster its results, can also expose AI developers to claims of copyright infringement, as is currently the case with the Canadian AI company Cohere, which is being sued by a number of newspaper publishers, including the Toronto Star. As Canadian lawyer Barry Sookman has pointed out in a recent blog, use of RAG can create risk for the AI platform. In the case of Cohere, when its RAG feature was switched on, it reproduced large amounts of almost verbatim text pulled directly from the litigating news sources. But when the RAG function was switched off, it produced fabricated information (hallucinations) yet still attributed this false information to a named, reliable news source, leading to claims of trademark dilution: the value of the brand was diminished by the attribution of false information to it. This trademark dilution issue is also part of the New York Times case against OpenAI.
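
For readers curious about the mechanics, the following is a minimal, hypothetical sketch of the RAG pattern described above: fetch the most relevant passages from a trusted store, then ask the model to answer only from those passages. The document store, the crude keyword scoring and the generate() stub are invented here for illustration; a real deployment would use a vector index, licensed source material and a call to an actual language model.

```python
# Hypothetical sketch of Retrieval Augmented Generation (RAG). All names and
# data are invented for illustration; real systems use a vector index,
# licensed source material, and an actual LLM behind generate().

# A tiny "trusted source" store; in practice these would be licensed articles.
documents = {
    "doc-1": "The education report was released in August with 110 recommendations.",
    "doc-2": "A judge fined two lawyers for citing non-existent court opinions.",
    "doc-3": "The consultants' study examined the provincial health care system.",
}

def score(query: str, text: str) -> int:
    """Crude relevance score: how many query words appear in the passage."""
    query_words = set(query.lower().split())
    return sum(1 for word in text.lower().split() if word in query_words)

def retrieve(query: str, k: int = 2) -> list:
    """Return the k passages from the trusted store most relevant to the query."""
    ranked = sorted(documents.values(), key=lambda text: score(query, text), reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    # Stand-in for the model call. With retrieval switched off, the model would
    # answer from its training patterns alone, which is where fabrication creeps in.
    return "[model output grounded in the retrieved sources]"

def answer(query: str) -> str:
    """Ground the answer in retrieved text rather than free-form generation."""
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer using ONLY the sources below.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)

print(answer("What did the judge do about the fake citations?"))
```

As the Cohere dispute illustrates, grounding outputs in retrieved text reduces fabrication but shifts the legal exposure: the more faithfully the system reproduces its trusted sources, the closer it comes to copying them.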

At the end of the day, it is a case of user beware, as a recent episode in Newfoundland demonstrates. The Government of Newfoundland commissioned an in-depth study on the future of education in the province. The 410-page report, containing over 110 recommendations and authored by two university professors, was released with great fanfare at the end of August. No doubt a great deal of careful research had gone into producing the study over its 18-month production period. But then cracks started appearing in the edifice. It was chock-full of made-up citations. The more people checked, the more they found. The Department of Education and Early Childhood Development tried to whitewash the issue by saying it was aware of a “small number of potential errors in citations” in the report. But even one fabricated citation is one too many! If you search for the report online now, you get the classic “404 Not Found” message. A lot of work has potentially gone down the drain, and the credibility of two academics has possibly been destroyed by careless use of AI. This is a cautionary tale that I have no doubt will be repeated.

In fact, it was repeated just a few days later. It seems Newfoundland is particularly prone to victimization by hallucinating AI platforms. After the education report debacle, new reports have surfaced that a $1.5 million study on the health care system, conducted by none other than Deloitte, also contains fabricated information, including made-up references. The opposition party is demanding that the government insist on a refund.

In our rush to embrace AI, many seem to have forgotten the value of human creativity and judgement. Coming back to the creative industries and AI, some of those whose livelihoods may be threatened by this new phenomenon are bravely trying to find a silver lining. Some voice actors are generating an additional revenue stream by licensing their voice clips for AI training, and many graphic artists use AI as an assist. Are they putting themselves out of work in the long run, or are they simply adapting? The jury is still out, but the generally low quality of AI-produced art, music and literature, as well as the ongoing problem of hallucination, suggests that there will always be a need for real human input. Anyone planning on substituting AI for “real work” had better think again.

© Hugh Stephens, 2025. All Rights Reserved.

AI’s Habit of Information Fabrication (“Hallucination”): Where’s the Human Factor?

An illustration of a cartoonish robot face on a computer screen with the text 'THE WORLD IS FLAT' above it.

Image: Shutterstock (with AI assist)

It is well known that when AI applications can’t respond to a query, instead of admitting they don’t know the answer, they often resort to “making stuff up”, a phenomenon commonly called “hallucination” but which should more accurately be called what it is: total fabrication. This was one of the legal issues raised by the New York Times in its lawsuit against OpenAI, with the Times complaining, among other things, that false information attributed to the paper by OpenAI’s bot undermined the credibility of Times journalism and diminished its value, leading to trademark dilution. According to a recent article in the Times, the incidence of hallucination is growing, not shrinking, as AI models develop. One would have thought that as the models ingest more material, including huge swathes of copyrighted and curated material such as content from reputable publications like the Times (without permission in most instances), their accuracy would improve. That doesn’t seem to be the case. Given AI’s hit-and-miss record on accuracy, it should be evident that AI output cannot be trusted or, at the very least, can only be trusted if verified. Not only is AI built on the backs of human creativity (with a potentially disastrous impact on creators unless the proper balance is struck between AI training and development and the rights of creators to authorize and benefit from the use of their work), but human oversight and judgement are required to make it a useful and reliable tool. AI on auto-pilot can be downright dangerous.

The most recent outrageous example of AI going astray is the publication by the Chicago Sun-Times and Philadelphia Inquirer, both reputable papers (or at least they used to be), of a summer reading list in which only five of fifteen books listed were real. The authors were real but most of the book titles and plots were just made up. Pure bullshit produced by AI. The publishers did a lot of backing and filling, pointing to a freelancer who had produced the insert on behalf of King Features, a unit of Hearst. Believe it or not, it was actually licensed content! That freelancer, reported to be one Marco Buscaglia, a Chicago “writer”, admitted that he had used AI to create the piece and had not checked it. “It’s 100% on me”, he is reported to have said. No kidding. Pathetic. Readers used to have an expectation that when a paper or magazine published a feature recommending something, like a summer reading list, the recommendation represented the intellectual output of someone who had done some research, exercised some judgement, and had presumably even read or at least heard about the books on the list. How could anyone recommend non-existent works? The readers trusted the newspaper, the paper trusted the licensor, the licensor trusted the freelancer, the so-called author. Nobody checked. Where was the human element? The list wasn’t worth the paper it was printed on.

The same irresponsible dependence on unverified information produced by AI is a growing problem in the legal field. Prominent lawyer and blogger Barry Sookman has just published a cautionary tale about the consequences of relying on hallucinated AI legal references. Counsel for an applicant in a divorce proceeding in Ontario cited several legal references from the CanLII database (for more information on CanLII see “AI-Scraping Copyright Litigation Comes to Canada (CANLII v Caseway AI)”) that the presiding judge could not locate, because they did not exist. He suspected the factum had been prepared using generative AI and threatened to cite the lawyer in question for contempt of court, noting that putting forward fake cases in court filings is an abuse of process and a waste of the court’s time. The lawyer in question has now confirmed that AI was used by her law clerk and that the citations went unchecked, and she has apologized, thus avoiding a contempt citation. Again, nobody checked (until the judge went to the references cited).

This is not even the first case in Canada where legal precedents fabricated by AI were presented to a court. Last year, in a child custody case in the BC Supreme Court, the lawyer for the applicant was reprimanded by the presiding judge for presenting false cases as precedents. The fabricated information was discovered by the defence lawyers when they went to check the applicant’s lawyer’s arguments. As a result, the applicant’s lawyer was ordered to personally compensate the defence lawyers for the time they took to track down the truth. The perils of using AI to argue legal cases first came to prominence in the US in 2023, when a New York federal judge fined two lawyers $5,000 each for submitting legal briefs written by ChatGPT that included citations of non-existent court opinions and fake quotes.

Another area fraught with consequences for using unacknowledged, AI-generated references is academia. The issue extends well beyond undergraduate essays being researched and written by AI to include graduate students, PhD candidates and professors taking shortcuts. This university library website, in its guide to students on the use of AI-generated content, notes that LLMs (the Large Language Models underlying generative AI) can hallucinate as much as 27% of the time and that factual errors are found in 46% of their output. The solution is pretty simple: when writing a research paper, don’t cite sources that you didn’t consult.

This brings up the problem of “you don’t know what you don’t know”. If your critical faculties are so weak that you cannot detect a fabricated response, you are in trouble. Of course, some hallucinations are easier to spot than others. Some of the checking is simply to verify that a fact stated in an AI response is accurate or that a cited reference actually exists (and then the reference should be read to determine its relevance). In other cases it may be more subtle, with the judgement and creativity of the human mind being brought into play to detect a hallucination. That requires experience, knowledge and context, all of which may be lacking when a junior clerk or student intern is assigned the task of compiling information. This is all the more reason why it is important for those using AI to check sources and to exercise quality control. Part of the process is to ensure transparency: if AI is used as an assist, that should be disclosed.
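
As one small illustration of what checking that “a cited reference actually exists” can look like, here is a hypothetical helper that asks the public Crossref lookup service whether a DOI is known. It assumes the references in question carry DOIs at all; case citations like those in the CanLII episode above would have to be checked against a legal database, and read, by a human. Automated checks of this kind catch only the crudest fabrications; the judgement about relevance and accuracy still has to be human.

```python
# Hypothetical helper: flag DOIs that no registry knows about. Assumes the
# cited works carry DOIs and that the public Crossref lookup endpoint
# (https://api.crossref.org/works/<doi>) is reachable. A 404 is a strong hint
# the citation was fabricated; a 200 only proves the work exists, not that it
# says what the AI claims, so the source still has to be read.
import requests

def doi_exists(doi: str) -> bool:
    """Return True if Crossref has a record for the DOI."""
    response = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return response.status_code == 200

# Placeholder reference list standing in for an AI-drafted bibliography.
suspect_dois = ["10.1234/plausible-but-invented", "10.5678/another-citation"]
for doi in suspect_dois:
    verdict = "found" if doi_exists(doi) else "NOT FOUND - verify by hand"
    print(doi, "->", verdict)
```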

At the end of the day, AI depends on human creativity and on accurate information produced by humans. Without these inputs, it is nothing. This brings us to the fundamental issue of whether and how copyright-protected content should be used in AI training to produce AI-generated outputs.

The US Copyright Office has just released a highly anticipated study on the use of copyrighted content in generative AI training. Here is a good summary produced by Roanie Levy for the Copyright Clearance Center. The USCO report is clear in stating that the training process for AI implicates the right of reproduction. That is not in doubt. It then examines fair use arguments under the four factors applied in the US. Notably, with respect to the purpose and character of the use, the USCO notes that using copyrighted content for AI training may not be transformative if the resulting model is used to generate expressive content or potentially reproduce copyrighted expression. It also notes that the copying involved in AI training can threaten significant harm to the market for, or value of, copyrighted works, especially where a model can produce substantially similar outputs that directly substitute for works used in the training data. The report is not binding on the courts, but it is a considered and well-researched opinion from a key player.

It is interesting to note that the report was released quickly in a pre-publication version on May 9, just a day before the Register of Copyrights (the head of the Office), Shira Perlmutter, was dismissed by the Trump Administration, and a day after the Librarian of Congress, Carla Hayden (to whom Perlmutter reported), was fired. Washington is rife with speculation on the causes for, and the legality of, the dismissals. We will no doubt hear more on this. With respect to fair use in general, the study concludes that “making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets…goes beyond established fair use boundaries”. The anti-copyright Electronic Frontier Foundation (EFF), of course, disagrees. (Which probably further validates the USCO’s conclusions.)

The USCO study is about infringement, not hallucination or fabrication, yet both stem from the indiscriminate application and use of AI in which the human factor is largely ignored and devalued. Human creativity and judgement are needed to set guardrails on both. Transparency about what content has been used to train an AI model, along with the licensing of reputable and reliable content for training purposes, is an important factor in helping AI get its outputs right. Not taking an AI output as gospel, but instead applying a degree of diligence, common sense, fact verification and experienced judgement, is another: AI should be deployed as an aid that makes human creativity and human-directed output more efficient, not as a substitute for thinking or original research. Generative AI must be the servant, not the master. Human creativity and judgement are needed to ensure it stays that way.

© Hugh Stephens, 2025. All Rights Reserved.