Understanding and Avoiding Hallucinated References: An AI Writing Experiment

Ronald Cole (University of Cincinnati)
Lauren Maher (Texas Tech University)
Rich Rice (Texas Tech University)

This assignment invites students to critically examine AI hallucinations—false or fabricated content generated by AI—with a focus on academic references. Through guided experimentation, students use generative AI tools to compose texts and reference lists, then evaluate the factual validity of the citations. The activity makes visible how easily AI can produce convincing but incorrect information, especially in scholarly contexts, highlighting the need to verify output for accuracy. Rather than discouraging AI use outright, the assignment draws students into a guided process of inquiry that positions them as co-researchers in an evolving technological landscape. By engaging in prompt engineering and verification techniques, students gain practical strategies for detecting and minimizing hallucinations, thereby enhancing their digital and research literacies.


Learning Goals

  • Use AI experimentation to explore questions of writing, authorship, and technology.
  • Identify and verify hallucinated references in AI-generated texts.
  • Apply prompt engineering strategies to improve citation accuracy.
  • Reflect on ethical, rhetorical, and epistemological implications of AI in research and writing.
  • Develop critical digital literacy practices for responsible AI integration.

Original Assignment Context: Research writing courses

Materials Needed

  • A computer or similar device with access to the internet and a web browser.
  • Access to generative AI tools such as OpenAI’s ChatGPT. While ChatGPT currently has a free version (https://chat.openai.com), other tools/chatbots such as Google’s Gemini or Microsoft’s Copilot may require the creation of an account (https://gemini.google.com and https://copilot.microsoft.com).
  • This multipage worksheet, along with a word processing program (such as Microsoft Word or Google Docs) that can edit and save the file.
  • Helpful Knowledge:
    • CTRL-C (PC) or CMD-C (MacOS) to copy
    • CTRL-V (PC) or CMD-V (MacOS) to paste
    • CTRL-F (PC) or CMD-F (MacOS) to find

Time Frame: ~1 week 

Overview: This assignment is a scaffolded, inquiry-based writing activity designed to help students recognize and address one of the most common issues in AI-assisted research and writing: hallucinated references. A hallucination can be defined as an AI-generated response that is false—particularly in the form of fabricated citations or sources. As generative AI tools become more integrated into academic and professional writing, the ability to verify claims, evaluate sources, and craft accurate prompts is increasingly critical. Students are asked to generate AI-based writing with citations, then to verify the legitimacy of those references using scholarly databases and web searches. Through this process they learn how to detect and flag hallucinated content as well as how to improve prompts using strategies like retrieval-augmented generation (RAG) and prompt specification. After completing the assignment, students report greater awareness of limitations of LLMs, more confidence in verifying sources, and improved prompting skills. They also demonstrate a more nuanced understanding of the ethics and mechanics of AI-assisted writing. The assignment contributes to broader goals of fostering responsible, rhetorically aware AI use in academic and professional settings.


Assignment

Hallucinations are an obstacle to the usability of AI systems (Kalai & Vempala, 2024). While different definitions of the term exist, such as “AI-generated content deviating from factual correctness” (Maleki, Padmanabhan, & Dutta, 2024) and “mistakes in the generated text that are semantically or syntactically plausible but are in fact incorrect or nonsensical” (Smith, 2023), this assignment uses a far simpler definition of a hallucination: an AI response that is false.

Why Identifying Hallucinations Matters

As the use of generative AI becomes increasingly common in writing, research, and professional communication, it is critical to develop the ability to identify and manage hallucinated content. Hallucinated references can mislead readers, undermine the credibility of your work, and even contribute to the spread of misinformation. In academic and professional contexts, unverified or inaccurate citations may result in serious ethical and reputational consequences. Learning how to verify sources, evaluate claims, and prompt AI tools responsibly is an essential skill for researchers and writers who plan to integrate AI into their workflow. These practices not only ensure the accuracy of your work but also promote a deeper understanding of how language models operate.

How Hallucinations Happen (and Why Some LLMs Are Better than Others)

AI hallucinations occur when large language models (LLMs) generate outputs that sound plausible but are false. This can happen for a number of reasons, two of which are faulty training data and incorrect prediction. Hallucinations related to faulty training data occur when the AI model was “taught” from material that was in some way wrong, so responses drawing on that material are also wrong. An AI model trained on a paper stating “Abraham Lincoln was the first President of the United States” (he was actually the 16th) might generate responses containing that false information. Hallucinations related to incorrect prediction occur because LLMs use the prompt, together with patterns from the vast amounts of data they have been trained on, to predict the next word in a sequence; they do not verify truth. When asked for citations or facts, the AI may “fill in the blanks” with patterns it has seen before rather than retrieving the information from verified sources.
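To make the prediction point concrete, here is a minimal, purely illustrative sketch. It is not any real model: the probability table is hand-made and the numbers are hypothetical, but it shows how generation selects a statistically likely next word rather than a verified fact.

```python
# Illustrative only: a hand-made table of next-word probabilities stands in for
# a trained model. Generation picks a likely continuation, not a checked fact.
import random

# Hypothetical statistics "learned" from training text about U.S. presidents.
next_word_probs = {
    ("Lincoln", "was", "the"): {"16th": 0.55, "first": 0.30, "greatest": 0.15},
}

def generate_next(context):
    """Sample the next word in proportion to its (learned) probability."""
    probs = next_word_probs[context]
    words, weights = zip(*probs.items())
    return random.choices(words, weights=weights)[0]

print("Lincoln was the", generate_next(("Lincoln", "was", "the")))
# Roughly 3 times in 10 this prints the false "first," because the statistics
# of the training data, not factual verification, drive the output.
```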

Some newer AI tools can incorporate information from external sources, such as real-time web results or expanded databases. This ability, called retrieval-augmented generation (RAG), has been shown to help decrease AI hallucinations (Lewis et al., 2020; Shuster et al., 2021). For example, tools like Research Rabbit or Elicit integrate live citation data or academic graph exploration, reducing the chance of fabricated references. By contrast, AI models that operate solely on pre-trained knowledge without live Internet retrieval (like some versions of ChatGPT or Gemini) are more prone to hallucination in citation tasks.
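As a rough illustration of the retrieval step, the sketch below pairs a tiny hand-made corpus with a simple word-overlap retriever and then builds a prompt around the retrieved passage. Everything here (the corpus, the scoring, the prompt wording) is a hypothetical stand-in; production RAG systems use vector search over large indexes and send the assembled prompt to an LLM.

```python
# A minimal sketch of retrieval-augmented generation's retrieval step.
# The corpus and scoring are placeholders; the LLM call itself is omitted.

corpus = {
    "hud_report": "HUD's Annual Homeless Assessment Report counts sheltered "
                  "and unsheltered people experiencing homelessness nationwide.",
    "grant_tips": "A statement of need explains why a funded project matters.",
}

def retrieve(query, k=1):
    """Rank documents by word overlap with the query (a stand-in for vector search)."""
    q_words = set(query.lower().split())
    scored = sorted(corpus.items(),
                    key=lambda kv: len(q_words & set(kv[1].lower().split())),
                    reverse=True)
    return [text for _, text in scored[:k]]

def build_prompt(question):
    """Prepend retrieved passages so the model draws on supplied material, not guesses."""
    passages = "\n".join(retrieve(question))
    return f"Answer using only these sources:\n{passages}\n\nQuestion: {question}"

print(build_prompt("How many people are experiencing homelessness?"))
# The assembled prompt would then go to the LLM; grounding generation in
# retrieved text is what helps reduce fabricated citations.
```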

One area of AI-assisted writing where hallucinations are especially common is references and other cited works (Aljamaan et al., 2024; Athaluri et al., 2023). When asked to generate references for a paper, AI tools often create entries that appear plausible but do not actually exist or were not used to develop the text. These references may include details such as authors, titles, publication dates, and even page numbers, but further investigation will show that they are not real or were not actually used. The following activity explores hallucinations to determine how they may be both detected and decreased in your own research writing.

Part 1. Exploring Basic Text and Reference Generation

Begin by prompting the AI tool to generate a statement of need related to the issue of homelessness; copy and paste the prompt below into your AI tool. The statement of need (sometimes called a problem statement) is the section of a grant proposal that explains the significance of the issue the funded project will address.

Note: A more advanced prompt that includes numerous specific details about the project and the applicant would generally be recommended (flipped interaction prompts, where the AI asks users to supply needed information, work well for this).

Prompt: Write a statement of need for an organization seeking a grant for $50,000 to renovate a church and turn it into a 200-bed homeless shelter. Include citations that help show the need. (Note: Do not use live Internet search. Only use references based on the AI’s internal training data, i.e., what it has learned from general sources during its initial development—not new or current information pulled from live sources.) Include them in a list of references. Include author names and dates for the references.

The AI tool should generate writing on this subject as well as usable references. If it does not, either prompt the AI tool to rewrite the section with citations and an appropriate reference section, or reset the AI chat by closing the web browser and starting a new conversation with the previous prompt. Next, attempt to verify two of the generated references. This involves confirming the existence of the cited source and substantiating the content.

Confirm the Existence of the Cited Source. At this point, we want to locate the source to confirm that it is real. They all look real, but are they? A confirmed source will have the same author, title, year, and other elements as the one in the reference section. There are multiple methods for confirming validity, for instance:

  • Internet search engine: Copy and paste the entirety of a single reference into an Internet search engine and look through the results.
  • Specialized search engine: Use Google Scholar or a university library database to search for the reference (a small scripted version of this kind of lookup appears after the example below).
  • Publisher archive: Visit the journal’s or publisher’s website and search for the reference.

Example: A reference of “Hudson, S. (2023). Annual Homeless Assessment Report to Congress. U.S. Department of Housing and Urban Development” exists by title and year, but the listed author is incorrect. This makes the reference hallucinated.
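For readers comfortable with a little scripting, this existence check can also be automated. The sketch below queries Crossref’s public REST API (api.crossref.org) with a citation string and prints the closest matches. The placeholder string and the field handling are assumptions about typical Crossref records, not a guarantee about every entry.

```python
# A hedged sketch of automating the "specialized search engine" step with
# Crossref's public REST API. Paste one of your AI-generated references into
# the string below.
import requests

def crossref_lookup(reference, rows=3):
    """Return the closest-matching records Crossref holds for a citation string."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": reference, "rows": rows},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["message"]["items"]

reference = "PASTE ONE OF YOUR AI-GENERATED REFERENCES HERE"
for item in crossref_lookup(reference):
    title = item.get("title", ["(no title)"])[0]
    authors = ", ".join(a.get("family", "?") for a in item.get("author", []))
    print(f"{title} | {authors} | {item.get('issued', {}).get('date-parts')} | {item.get('DOI')}")
# Compare the returned titles, authors, and years against the AI's reference;
# a close title paired with a different author or year is exactly the kind of
# mismatch that marks a reference as hallucinated.
```

Note that Crossref only indexes DOI-registered works, so government reports or web sources (like the HUD example above) may not appear there; absence from Crossref alone is not proof of hallucination.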

Substantiate the Content. Just because a reference exists does not guarantee that its contents actually support the claim being made. Content of the source should be checked to make sure the cited information actually came from the reference.

  • Check the text: Review for relevant points.
  • Check the subject: If the subject matter doesn’t match, it may be hallucinated.
  • Ask the AI to verify: Prompt your AI tool to confirm the citation/page. Less reliable, but a possible strategy.

Example: A citation to Karsh & Fox (2009) on public health and COVID-19 is a hallucination. The book (one on grant writing) predates the pandemic and does not discuss disease exposure.

Note: A Digital Object Identifier (DOI) is an alphanumeric string assigned to a publication to help identify and locate it. Appended to the doi.org domain, the identifier often forms a working link to the digital reference. The nonprofit DOI Foundation also allows users to look up a reference’s title and authors from its DOI, while other sites such as Crossref allow the DOI to be located from the title.
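The DOI check itself is easy to script. The sketch below relies only on the standard behavior of the https://doi.org/ resolver: a registered DOI redirects to the publisher’s page, while an unknown one does not. The second DOI is a deliberately made-up example.

```python
# A small sketch of the DOI lookup described above. It asks the doi.org
# resolver whether a DOI is registered, without following the redirect to the
# publisher's site.
import requests

def doi_resolves(doi):
    """Return True if doi.org recognizes the DOI, False if it does not."""
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=30)
    return resp.status_code in (301, 302, 303, 307, 308)

print(doi_resolves("10.2196/54345"))         # a DOI from this piece's own reference list
print(doi_resolves("10.99999/made-up-doi"))  # a fabricated DOI
```

A DOI that resolves confirms only that some work exists at that identifier, not that its title and authors match what the AI listed, so pair this check with the title and author comparison above.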

Determine the Status of Your References. Paste two of the references you gathered below where you see XXX and then put a note next to them in parentheses as to whether they are (Valid), (Suspicious), or (Hallucinated). Write a paragraph for each explaining your observations and the results.

Part 2. Prompt Engineering to Decrease and Detect Hallucinations

Two strategies are emphasized here: RAG and Specification.

  • Retrieval-Augmented Generation (RAG): Prompts that instruct the AI tool to consult external sources help reduce hallucinations (if the tool has that feature). Example phrases: “Use scholarly sources,” “Consult Wikipedia,” “Use PubMed articles.”
  • Specification: Adding precise requirements (specifications) to prompts helps to create verifiable output. Examples: asking for APA formatting, requiring DOI numbers, or requesting live links.

Copy/paste the prompt into your AI tool.

Prompt: You are an experienced researcher. Create a list of ten important articles on modern weight loss drugs using scholarly sources such as those found in PubMed. The information for each article should be listed in APA format and include a link or DOI number if possible.

Reflection: How did the AI tool respond? Did it pause to indicate a web search? Were the references formatted and linked correctly? Write a paragraph or two explaining your observations.

Verification: Next, check at least two of your references using DOI and other verification methods. Record your findings.

Part 3. Generating Your Own Reference Prompt

Now it’s your turn. Create a detailed prompt for generating references on a research topic of your choosing. Use what you have learned about specificity and retrieval cues. Paste the prompt below, run it in your AI tool, and discuss the results.

Reflection: What surprised you most in this process? How might your thinking about citation practices shift because of this activity? Do you feel more or less confident in your ability to evaluate digital sources, and why? Write a paragraph or two explaining your observations. 

References

Aljamaan, F., Temsah, M., Altamimi, I., Al-Eyadhy, A., Jamal, A., Alhasan, K., Mesallam, T. A., Farahat, M., & Malki, K. H. (2024). Reference hallucination score for medical artificial intelligence chatbots: Development and usability study. JMIR Medical Informatics, 12, e54345. https://doi.org/10.2196/54345

Athaluri, S. A., Manthena, S. V., Kesapragada, V. K. M., Yarlagadda, V., Dave, T., & Duddumpudi, R. T. S. (2023). Exploring the boundaries of reality: Investigating the phenomenon of artificial intelligence hallucination in scientific writing through ChatGPT references. Cureus, 15(4). https://doi.org/10.7759/cureus.37432

Kalai, A. T., & Vempala, S. S. (2024). Calibrated language models must hallucinate. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing (pp. 160–171). https://doi.org/10.48550/arXiv.2311.14648

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W. T., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459–9474. https://dl.acm.org/doi/abs/10.5555/3495724.3496517

Maleki, N., Padmanabhan, B., & Dutta, K. (2024). AI hallucinations: A misnomer worth clarifying. In 2024 IEEE Conference on Artificial Intelligence (CAI) (pp. 133–138). IEEE. https://doi.org/10.1109/CAI59869.2024.00033

Shuster, K., Poff, S., Chen, M., Kiela, D., & Weston, J. (2021). Retrieval augmentation reduces hallucination in conversation. arXiv. https://doi.org/10.48550/arXiv.2104.07567

Smith, C. S. (2023). Hallucinations could blunt ChatGPT’s success. IEEE Spectrum. https://spectrum.ieee.org/ai-hallucination


Acknowledgements

We thank Texas Tech University’s Teaching, Learning, and Professional Development Center (https://www.depts.ttu.edu/tlpdc) for support with AI-enhanced teaching techniques training.

This experiment works well in conjunction with Anna Mills’s “Fact-Checking Auto-Generated AI Hype” from the January 2024 release of TextGenEd: Continuing Experiments.

A copy of the earlier version of this lesson, in worksheet form, may be downloaded for use.