TEDx Spanish Corpus
Identifier: SLR67
Summary: Spanish data taken from the TEDx Talks
Category: Speech
License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Downloads (use a mirror closer to you):
tedx_spanish_corpus.tgz [2.3G] (Spanish speech and transcripts) Mirrors:
[US]
[EU]
[CN]
About this resource:
It contains spontaneous speech from several speakers at TEDx events; most of the speakers are men.
Transcriptions are presented in lowercase with no punctuation marks.
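To compare your own text (for example, ASR output or reference sentences) against these transcriptions, it may help to apply the same convention first. The following is a minimal sketch, not the procedure used to build the corpus: the function name normalize_transcript is hypothetical, and it assumes that accented letters and ñ are kept as-is, which this page does not state.

```python
import unicodedata


def normalize_transcript(text: str) -> str:
    """Lowercase text and drop punctuation marks.

    Assumption (not confirmed by the corpus description): accented
    letters and digits are kept unchanged; only characters in a Unicode
    punctuation category are removed.
    """
    cleaned = []
    for ch in text.lower():
        # Unicode categories starting with "P" are punctuation,
        # which also covers the Spanish marks "¿" and "¡".
        if unicodedata.category(ch).startswith("P"):
            cleaned.append(" ")
        else:
            cleaned.append(ch)
    # Collapse the runs of whitespace left behind by removed punctuation.
    return " ".join("".join(cleaned).split())


if __name__ == "__main__":
    print(normalize_transcript("¿Cómo estás? ¡Muy bien, gracias!"))
```

Symbols that are not punctuation in the Unicode sense (such as currency signs) pass through unchanged, so check a few real transcript lines before relying on this.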
The data collection was carried out partly by the social service program "Desarrollo de Tecnologías del Habla", which belongs to the National Autonomous University of Mexico (UNAM), and partly by the CIEMPIESS-UNAM project (http://www.ciempiess.org/).
Special thanks to the TED-Talks team for allowing us to share this dataset.
You can cite the data using the following BibTeX entry:
@misc{mena_2019,
  title = "{TEDx Spanish Corpus. Audio and transcripts in Spanish taken from the TEDx Talks; shared under the CC BY-NC-ND 4.0 license}",
  author = "Hernandez-Mena, Carlos D.",
  howpublished = "Web Download",
  institution = "Universidad Nacional Autonoma de Mexico",
  location = "Mexico City",
  year = "2019"
}