About the Corpus
The Michigan Corpus of Upper-Level Student Papers (MICUSP) is a discipline-rich collection of advanced student academic writing curated between 2004 and 2009 at the University of Michigan. The corpus contains 829 papers totaling approximately 2.6 million words, written by students at four academic levels, from senior undergraduate through third-year graduate.
MICUSP encompasses a broad range of academic genres—such as argumentative essays, research papers, reports, proposals, critiques, creative writing, and response papers—and represents an extensive set of disciplines spanning the humanities, social sciences, health sciences, and engineering. Developed by a curation team that includes Ute Römer, Matthew Brook O’Donnell, Annelie Ädel, Rita Simpson-Vlach, and John Swales, the corpus is licensed for non-commercial academic use and is publicly accessible. Texts are available in PDF, XML, and HTML formats.
Accessing the Corpus
The corpus is accessible in multiple formats:
Analyzing the Corpus
MICUSP is well suited to both large-scale computational studies of advanced student writing across disciplines and smaller-scale exploratory analyses.
- Basic NLP with Voyant Tools (no coding required): Useful for exploring word frequency, key terms, collocations, lexical diversity, patterns of repetition, and comparisons between papers to identify shifts in emphasis or theme. https://voyant-tools.org
- Multi-Dimensional Analysis in R: The mda.biber R package includes a built-in MICUSP dataset (micusp_biber) with 67 linguistic features tagged using pseudobibeR. It supports multi-dimensional analysis (factor analysis and visualization of feature distributions), which is useful for exploring stylistic and rhetorical variation across genres and disciplines. https://github.com/browndw/mda.biber
- Part-of-Speech Tagging and Named Entity Recognition with Python: A tutorial that demonstrates how to analyze MICUSP texts using part-of-speech tagging and named entity recognition, enabling studies of grammatical patterns, disciplinary language use, and references to people, institutions, and places. https://programminghistorian.org/en/lessons/corpus-analysis-with-spacy
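For readers who prefer scripting to a point-and-click tool, the kinds of measures listed above (word frequency, lexical diversity, collocations) can be approximated with the Python standard library alone. The following is a minimal sketch, not the method used by any of the tools above: the function names are hypothetical, the tokenizer is a deliberate simplification, and the two sample sentences are invented placeholders rather than MICUSP text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters (with internal apostrophes).

    This is a simplification: real corpus work usually relies on a proper
    tokenizer such as the one provided by an NLP library.
    """
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())


def word_frequencies(text, top_n=5):
    """Return the top_n most frequent tokens with their counts."""
    return Counter(tokenize(text)).most_common(top_n)


def type_token_ratio(text):
    """Distinct tokens divided by total tokens, a rough lexical-diversity measure."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def bigram_counts(text):
    """Count adjacent token pairs as a crude stand-in for collocation analysis."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


# Invented placeholder sentences, NOT actual MICUSP text.
paper_a = "The results suggest that the model may explain the data."
paper_b = "We argue that this reading of the novel is incomplete."

for name, text in [("paper_a", paper_a), ("paper_b", paper_b)]:
    print(name, word_frequencies(text, top_n=3), round(type_token_ratio(text), 2))
```

Type-token ratio is sensitive to text length, so comparisons between papers are more meaningful when the texts are of similar size or the measure is computed over fixed-size samples.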
Selected Research
- Aull, L. L. (2019). Linguistic markers of stance and genre in upper-level student writing. Written Communication, 36(2), 267–295. https://doi.org/10.1177/0741088318819472
- Aull, L. L., Bandarage, D., & Miller, M. R. (2017). Generality in student and expert epistemic stance: A corpus analysis of first-year, upper-level, and published academic writing. Journal of English for Academic Purposes, 26, 29–41. https://doi.org/10.1016/j.jeap.2017.01.005
- Aull, L. L., & Lancaster, Z. (2014). Linguistic markers of stance in early and advanced academic writing: A corpus-based comparison. Written Communication, 31(2), 151–183.
- Barbara, S. W. Y., Afzaal, M., & Aldayel, H. S. (2024). A corpus-based comparison of linguistic markers of stance and genre in the academic writing of novice and advanced engineering learners. Humanities and Social Sciences Communications, 11, Article 10. https://doi.org/10.1057/s41599-024-02757-4
- Becker, K., & Feng, H.-H. (2020). Stance in unpublished student writing: An exploratory study of modal verbs in MICUSP’s Physical Science papers. In U. Römer, V. Cortes, & E. Friginal (Eds.), Advances in corpus-based research on academic writing: Effects of discipline, register, and writer expertise (pp. 255–278). John Benjamins. https://doi.org/10.1075/scl.95.11bec
- Michigan Corpus of Upper-level Student Papers. (2009). MICUSP [Corpus]. English Language Institute, University of Michigan. varieng.helsinki.fi
- Römer, U., & Wulff, S. (2010). Applying corpus methods to written academic texts: Explorations of MICUSP. Journal of Writing Research, 2(2), 99–127. https://doi.org/10.17239/jowr-2010.02.02.2
- Wang, X. (2021). Hedging in academic writing: Cross-disciplinary comparisons in the Michigan Corpus of Upper-Level Student Papers (MICUSP). In JALTCALL 2021 Conference Proceedings (pp. 1–16). JALTCALL SIG. https://doi.org/10.37546/JALTSIG.CALL.PCP2021-09
- Wulff, S., Römer, U., & Swales, J. M. (2012). Attended/unattended this in academic student writing: Quantitative and qualitative perspectives. Corpus Linguistics and Linguistic Theory, 8(1), 129–157. https://doi.org/10.1515/cllt-2012-0006