A Video Corpus of Spanish Spoken in Texas
The Spanish in Texas Corpus currently consists of over 500,000 words from 97 bilingual speakers living in Texas. Video files, audio files, full transcripts, and part-of-speech (POS) annotations are available for download. The corpus is free for researchers and educators to use; access requires registering for an account and agreeing to abide by the site’s Code of Ethics.
The following links are provided for researchers interested in building on or replicating our work in other contexts. The sample data, code, and documentation below pertain to the development of both the Spanish in Texas Corpus and the SpinTX Video Archive.
- Scripts for basic linguistic tagging of the corpus and other corpus processing functions are available at the SpinTXCorpusProcessing repository on GitHub.
- Scripts for pedagogical annotation of the corpus are available at the SpinTXPedagogicalAnnotation repository on GitHub.
- The Corpus to Classroom Blog documents the development of the SpinTX Video Archive.
- Presentation slides are shared on our SlideShare page.