The University of South Carolina First-Year English (FYE) Corpus is a large collection of first-year student writing compiled between 2014 and 2017 by Duncan Buell and Chris Holcomb. The corpus consists of 17,246 papers by first-year undergraduates: 8,575 matched pairs of first and final drafts (17,150 papers) plus 96 final-only drafts, totaling approximately 22 million words.
All texts are argumentative essays written for First-Year English courses. None of the contributions come from Honors College students, which makes the corpus especially representative of mainstream first-year writing (FYW) instruction. The corpus covers a single discipline, English, and is licensed for non-commercial academic research. Files are provided as plain ASCII text.
The corpus is available as a collection of ZIP files organized by year and semester.
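Since the essays are distributed as ZIP archives of plain ASCII text, one archive can be read directly without unpacking it to disk. The following is a minimal sketch, not an official loader. The function name `read_texts_from_zip`, the `.txt` extension, and the member names in the demo are all hypothetical; the real archive and file names should be checked against the downloaded files.

```python
import io
import zipfile


def read_texts_from_zip(zip_source):
    """Return {member_name: text} for every .txt member of a ZIP archive.

    zip_source may be a path or a file-like object.
    Assumption: essay files end in ".txt"; adjust if the corpus differs.
    """
    texts = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Skip directory entries and anything that is not a text file.
            if name.endswith("/") or not name.lower().endswith(".txt"):
                continue
            raw = archive.read(name)
            # The files are described as plain ASCII; errors="replace"
            # keeps the loader from failing on an unexpected stray byte.
            texts[name] = raw.decode("ascii", errors="replace")
    return texts


# Demo on an in-memory archive with made-up member names,
# standing in for one downloaded semester file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("example_first.txt", "This is a first draft.")
    demo.writestr("example_final.txt", "This is a final draft.")
buffer.seek(0)

demo_texts = read_texts_from_zip(buffer)
for member_name, text in sorted(demo_texts.items()):
    print(member_name, len(text.split()), "words")
```

With a real download, the path to one semester's ZIP file would be passed in place of `buffer`.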
This corpus is especially well-suited to large-scale computational analysis of student writing, with particular strengths in the study of drafting, revision, and writing development. A range of accessible analytical tools can be used with it, depending on the researcher's level of technical comfort.
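As one illustration of revision analysis on a matched pair, the sketch below compares a first and a final draft word by word using Python's standard `difflib` module. This is only one possible approach, not the method used by the corpus authors. The function name `revision_summary` and the two sample sentences are invented for the example.

```python
import difflib


def revision_summary(first_draft, final_draft):
    """Summarize word-level changes between two drafts of one paper."""
    first_words = first_draft.split()
    final_words = final_draft.split()
    # autojunk=False disables difflib's heuristic that ignores very
    # frequent items, which would otherwise distort long essays.
    matcher = difflib.SequenceMatcher(
        None, first_words, final_words, autojunk=False
    )
    added = 0
    deleted = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deleted += i2 - i1  # words present only in the first draft
        if tag in ("replace", "insert"):
            added += j2 - j1  # words present only in the final draft
    return {
        "first_words": len(first_words),
        "final_words": len(final_words),
        "words_added": added,
        "words_deleted": deleted,
        "similarity": matcher.ratio(),
    }


# Invented sample sentences standing in for a matched pair of drafts.
summary = revision_summary(
    "Social media is bad for students.",
    "Social media can be harmful for many students.",
)
print(summary)
```

Splitting on whitespace is a deliberately crude tokenization; a study of stylistic change would likely substitute a proper tokenizer and aggregate these counts across all matched pairs.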
Holcomb, C., & Buell, D. A. (2021). A corpus of first-year composition: Exploring stylistic complexity in student writing. In A. Licastro & B. Miller (Eds.), Composition and Big Data (pp. 35–51). University of Pittsburgh Press.