A Video Corpus of Spanish Spoken in Texas

The Spanish in Texas Corpus currently consists of over 500,000 words from 97 bilingual speakers living in Texas. Video files, audio files, full transcripts, and part-of-speech (POS) annotations are available for download. Access to the corpus is free for researchers and educators; it requires registering for an account and agreeing to abide by the site’s Code of Ethics.

Other Resources

The following links are provided for researchers interested in building on or replicating our work in other contexts. The sample data, code, and documentation below pertain to the development of both the Spanish in Texas Corpus and the SpinTX Video Archive.

Code Repositories

  • Scripts for basic linguistic tagging of the corpus and other corpus processing functions are available at the SpinTXCorpusProcessing repository on GitHub.
  • Scripts for pedagogical annotation of the corpus are available at the SpinTXPedagogicalAnnotation repository on GitHub.

Connect with Us

You can contact the Spanish in Texas project team by email.

Profiling Spanish as it is spoken throughout Texas today.