Mini Audits: Analyzing GenAI Outputs to Track Systemic Priorities & Proclivities

Kirkwood Adams & Maria Baker
Columbia University

This assignment asks students to conduct their own Mini Audit of a text or image generation model and evaluate the model’s potential as a collaborator. Adapting methods from researchers like Danaë Metaxa and Joy Buolamwini, the audit positions genAI systems in and of themselves as valuable objects for inquiry and scrutiny. Through a sequence of productive steps, students generate small datasets and subject them to critical analysis. By experiencing outputs in aggregate rather than in isolation, students can reconsider the effectiveness and value of individual outputs.


Learning Goals

  • Develop the ability to track patterns across a dataset of outputs & make observations of individual outputs to produce a critical reading of the content these systems create. 
  • Develop an understanding of genAI outputs as part of a larger content economy by looking at outputs in aggregate.
  • Practice evaluating a particular model’s potential as a tool or collaborator in specific contexts.
  • Practice more deliberate prompting strategies and prompting language that work to overcome a model’s defaults (if the writer chooses to collaborate).

Original Assignment Context: “AI in Context” is an interdisciplinary course now offered by Columbia University for students to proactively study AI from a variety of disciplinary viewpoints. The inaugural class was taught in person with an enrollment of 70+ students and featured a series of modules led by faculty from across disciplines, from Computer Science to Philosophy and Music. We led a module dedicated to thinking through genAI and writing. The Mini Audit was the module’s final assignment (worth 15% of students’ course grade).

Materials Needed: Students need access to an image or text generation model of their choosing and their individual copy of the Mini Audit form.

Time Frame: Students had one week to complete the assignment.

Overview: Inspired by the methods and purposes of ethically motivated computer science researchers like Danaë Metaxa and Joy Buolamwini, we have developed various 'auditing' methods for the classroom, which treat genAI systems in and of themselves as valuable objects for inquiry and scrutiny. Auditing relies on a central move that is highly adaptable to different teaching contexts: querying generative AI models systematically and studying the outputs to probe and characterize the machine intelligence that constructed those outputs.

Machine intelligence often 'thinks' in terms of templates. This proclivity can manifest in genAI outputs that are more homogenous than the myriad possibilities any input could yield. While this template thinking can be useful or appropriate, it also flattens the spectrum of possibilities and erases experiences. Literate users of generative AI are urged to monitor these systems for their limitations. But how?

One way to get a glimpse of a system's defaults is to assign students to conduct their own Mini Audit of a system's outputs. By assembling a dataset of outputs, students can observe the variety (or lack thereof) across the data. By experiencing outputs in aggregate rather than in isolation, students can consequently reconsider the effectiveness and value of individual outputs.

In the Fall of 2024, we created and used this assignment as the concluding exercise for the AI and Writing module of an interdisciplinary lecture class called “AI in Context”. Our students' projects yielded evidence of insightful reflection across a range of responses: that genAI propagates biases, unwittingly violates copyright, spreads misinformation, and erases or replaces students’ own purposes or subjectivities.


Assignment

Below we present the assignment’s instructions as we provided them to students. Given the interdisciplinary approach and cohort of “AI in Context,” the directions prompt students to leverage their own interests/expertise in a fairly open-ended fashion. The instructions begin with a general overview of the mini-audit as a task and then continue with a granular sequence of steps guiding students to fulfill the task. 

While the sequence of steps is fundamental, the scope is adaptable. The target of the audit could be made more directive, narrowed by the particular disciplinary concerns of any given class (either determined in advance by instructors or nominated collaboratively by class cohorts). A class could decide to query a single model or restrict prompts to a subject matter the students have built expertise in. For example, in a themed writing class with a focus on environmental science, students could collectively investigate images produced by ChatGPT in response to the input “climate disaster.”

For any instructor hoping to adapt and replicate this assignment, offering students readings that clarify the project, purpose, and outcomes of auditing algorithmic systems will be highly illustrative. Across the various classes and contexts in which we’ve experimented with mini-auditing, we’ve consistently shared: “Humans Are Biased. Generative AI Is Even Worse.” by Nicoletti and Bass, and “An Image of Society: Gender and Racial Representation and Impact in Image Search Results for Occupations” by Metaxa et al. These texts also demonstrate how making detailed observations of a single output alongside outputs-in-aggregate is fundamental to the auditing method.

THE MINI AUDIT

Task: conduct and reflect on a Mini Audit of a generative AI system. Use the step-by-step guide below to document the process.

  • Create a small dataset of outputs from a generative AI system.
  • Ask an AI system like ChatGPT or DALL-E to perform a task: either generating an image or writing text.
  • Repeatedly and systematically prompt with the same input in order to gather a collection of responses.
  • Observe each individual output and collection of outputs as a whole.
  • Draw inferences about how machine intelligence “thinks” in response to this task.
  • Write a short 200-word response that reflects on the results of your Mini Audit.

Step-by-step guide to complete:

  1. Choose a model to audit (image or text generator). [ChatGPT-4o, DALL-E, Claude; paid, free, etc.]
  2. Consider your own expertise (disciplinary, lived experience). Since the goal is to make systematic observations of the outputs, brainstorm about possibilities for “tasks” you feel you have the capacity to judge, given your existing knowledge.
  3. Based on the above, choose a target for the audit. The target could be a subject, concept, topic, or object of analysis (e.g., a film, a book, a scenario), and devise a simple prompt.

Examples—

For image:  Show me “a birthday party,” “a college student studying,” “an urban landscape,” “a doctor at work,” “breakfast,” “a wedding,” “a protester,” “human colonization of Mars,” etc.

For text: “Who is better, Lebron James or Kobe Bryant?” “Tell me 10 facts about Taylor Swift.” “Write a short review of HBO’s new prestige drama, Penguin.” “Is Thundercat the Beethoven of the 21st Century?” “What must I see when visiting Iceland?” (This can work if you’ve actually been to Iceland.)

  4. How would you answer the prompt? Before you query the system, note what you would say/draw/photograph.
  5. Before you run the prompt, also take a moment to jot down what you expect to see in the model’s response.
  6. Clear the model’s memory, start a new chat/session, and run the prompt. Once.
  7. Make observations of the FIRST output here. Describe what you notice. Make observations about form as well as content, about what is foregrounded, and about smaller details. How does the output align with your expectations?
  8. Now, create a larger dataset by rerunning the prompt several times (ca. 10-15). *COLLECT ALL OUTPUTS IN THE APPENDIX AT THE END OF YOUR AUDIT DOC*
  9. Looking at the outputs in aggregate, what do you notice? Make observations here.
  10. Devise a prompt that might close the gap between what you observe in aggregate and your initial response. How would you have to ask to receive something close to what you initially imagined?
  11. Run the new prompt. (And add the output you receive to the appendix.)
  12. Observe the output.
  13. Final note: how would you assess the system’s capacity to collaborate and respond in the context you established? What do your observations mean for your collaborative process? (ca. 200 w)

Acknowledge the limits of the experiment. The output can change tomorrow!

APPENDIX: Collect the outputs of your mini-audit below. Copy and paste text, take screenshots—whatever efficient way you can create this catalogue works fine!
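For instructors or students comfortable with a little code, the aggregate-observation step can also be scaffolded computationally. The sketch below is a minimal, hypothetical example, not part of the original assignment: it assumes the collected text outputs have been pasted into a Python list, and it tallies the words that recur across most responses to the same prompt — a quick signal of the template thinking a Mini Audit looks for. The function name and sample outputs are invented for illustration.

```python
from collections import Counter
import re

def recurring_words(outputs, min_share=0.5):
    """Return words appearing in at least `min_share` of the outputs.

    Words repeated across most responses to a single prompt suggest
    the model is defaulting to one framing of the subject.
    """
    doc_counts = Counter()
    for text in outputs:
        # Count each word once per output, so totals reflect how many
        # responses contain the word, not raw frequency.
        words = set(re.findall(r"[a-z']+", text.lower()))
        doc_counts.update(words)
    threshold = min_share * len(outputs)
    return sorted(w for w, n in doc_counts.items() if n >= threshold)

# Three invented responses to the prompt "describe a doctor at work":
outputs = [
    "A doctor in a white coat examines a patient in a bright clinic.",
    "The doctor, wearing a white coat, reviews a chart beside the patient.",
    "In a clinic, a doctor in a white coat listens to a patient's heart.",
]

print(recurring_words(outputs))
# → ['a', 'clinic', 'coat', 'doctor', 'in', 'patient', 'white']
```

Here the recurrence of “white coat” and “clinic” across every invented response would be exactly the kind of homogeneity worth noting in step 9; the computational tally supplements, rather than replaces, students’ own close observations.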


Acknowledgements

“AI in Context” was conceived and developed by Adam Cannon, Vishal Misra, and Lydia Chilton. We’re grateful to be part of this larger interdisciplinary teaching experiment.

A copy of an audit worksheet is also available for download here. (https://bit.ly/TextGenEdMiniAudit)