Corpus & Repository of Writing (Crow)

About the Corpus

The Corpus & Repository of Writing (Crow) is an archive of undergraduate student writing designed to support research, teaching, and professional development in writing studies. The corpus and repository holds over 18,000 texts, totalling approximately 18 million words, produced by students enrolled in first-year writing courses from Fall 2009 to Summer 2022. 

In addition to texts, Crow includes detailed contextual metadata such as course, assignment type, and semester of writing. It also integrates a linked repository of instructional materials (e.g., syllabi, assignments, rubrics) that allows researchers to examine connections between pedagogies and student writing outcomes. 

Crow was developed by researchers affiliated with the University of Arizona, Purdue University, and Northern Arizona University. It is closely connected to the Multilingual Academic Corpus of Assignments – Writing and Speech (MACAWS), which was developed to extend corpus-based research to multilingual writing and speech.

Accessing the Corpus

Crow is hosted on an online platform: https://crow.corporaproject.org/

Individual researchers and institutions must request access by completing the registration form: https://api.corporaproject.org/user/register

Once registered, users can work with the corpus in several ways:

  • Searchable repository that can be queried by keywords and lemmas and filtered by metadata fields such as assignment, institution, country, instructor, gender, and year.

  • Downloadable instructional materials in markdown format, including assignment prompts, syllabi, rubrics, lesson plans, and related course documents.

  • Offline research corpus subset available as a downloadable ZIP file, enabling secure local analysis (additional training and verification required).

  • Crow Application Programming Interface (API) for programmatic access to corpus data when CSV export is insufficient or when retrieving multiple or structured datasets.

Analyzing the Corpus

Crow is well suited to large-scale computational studies of student writing, especially exploratory analysis of keywords across different institutions, courses, assignments, and years.

  • Keywords and Lemmas in Context (in repository): Search the online repository for keyword-in-context records and downloadable CSV files with demographic and corpus metadata. https://crow.corporaproject.org/authorize?destination=corpus 

  • Revision Differences (no coding required): In the right sidebar of an individual text result, every available draft of the same paper is listed. Authenticated users can download each version and use tools such as Diffchecker or DiffMerge to highlight differences between drafts. https://crow.corporaproject.org/page/teachers

  • Demographic Patterns in RStudio: Exported corpus data can be analyzed in RStudio to explore trends across metadata fields such as institution, assignment, course level, and student background. https://crow.corporaproject.org/page/researchers#export 

  • Model for Corpus Development: Crow provides a model for corpus design, metadata integration, and computational access. https://writecrow.org/ciabatta/
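
For readers who prefer scripting to RStudio or a desktop diff tool, the two analyses described above (tallying exported records by a metadata field, and comparing drafts of one paper) can be sketched in Python using only the standard library. This is a minimal illustration, not Crow's own tooling: the column names in the sample export and the two sample drafts are hypothetical, and a real export should be inspected for its actual headers before adapting the code.

```python
import csv
import difflib
import io
from collections import Counter


def count_by_field(csv_text, field):
    """Count the rows of an exported CSV per distinct value of one metadata column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[field].strip() for row in reader)


def draft_diff(earlier_draft, later_draft):
    """Return unified-diff lines showing what changed between two drafts."""
    return list(
        difflib.unified_diff(
            earlier_draft.splitlines(),
            later_draft.splitlines(),
            fromfile="draft_1",
            tofile="draft_2",
            lineterm="",
        )
    )


# Hypothetical export: these column names are illustrative assumptions,
# not the headers Crow actually produces.
SAMPLE_EXPORT = """text_id,institution,assignment
t1,Institution A,Literature Review
t2,Institution A,Argumentative Paper
t3,Institution B,Literature Review
"""

if __name__ == "__main__":
    # How many texts of each assignment type are in the export?
    print(count_by_field(SAMPLE_EXPORT, "assignment"))

    # Which lines changed between two (invented) drafts?
    first = "Writing is hard.\nI revise often."
    second = "Writing is a process.\nI revise often."
    for line in draft_diff(first, second):
        print(line)
```

In the diff output, lines beginning with "-" appear only in the earlier draft and lines beginning with "+" only in the later one. To work with downloaded files instead of the inline samples, read each file's contents into a string and pass it to the same functions.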

Selected Research

  • Gao, W., Picoral, A., & Staples, R. (2021). Citation practices of L2 writers in first-year writing courses: Form, rhetorical function, and connection with pedagogical materials. Applied Corpus Linguistics, 1(1). https://doi.org/10.1016/j.acorp.2021.100005 

  • Kwon, H., Partridge, R. S., & Staples, S. (2018). Building a local learner corpus: Construction of a first-year ESL writing corpus for research, teaching, mentoring, and collaboration. International Journal of Learner Corpus Research, 4(1), 112–127. https://benjamins.com/catalog/ijlcr.16017.kwo

  • Kwon, H., Staples, S., & Partridge, R. S. (2018). Source work in the first-year L2 writing classroom: Undergraduate L2 writers’ use of reporting verbs. Journal of English for Academic Purposes, 34, 86–96. https://doi.org/10.1016/j.jeap.2018.04.001

  • Lan, G., & Sun, Y. (2019). A corpus-based investigation of noun phrase complexity in the L2 writings of a first-year composition course. Journal of English for Academic Purposes, 38, 14–24. https://doi.org/10.1016/j.jeap.2018.12.001 

  • Picoral, A., Staples, S., & Reppen, R. (2021). Automated annotation of learner English. International Journal of Learner Corpus Research, 7(1), 17–52. https://doi.org/10.1075/ijlcr.20003.pic

  • Shin, J. (2021). The use of stance in L2 first-year college writing: Its relation to genre, revision, and writer characteristics. In M. Charles & A. Frankenberg-Garcia (Eds.), Corpora in ESP/EAP writing instruction (pp. 123–146). Routledge. https://www.taylorfrancis.com/chapters/edit/10.4324/9781003001966-6-10/use-stance-l2-first-year-college-writing-ji-young-shin

  • Shin, J., Velázquez, A. J., Swatek, A., Staples, S., & Partridge, R. S. (2018). Examining the effectiveness of corpus-informed instruction of reporting verbs in L2 first-year college writing. L2 Journal, 10(3), 31–46. https://doi.org/10.5070/L210337022