Yoloxóchitl-Mixtec
Identifier: SLR89
Summary: Yoloxóchitl Mixtec Speech with Transcription
Category: Speech
License: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
Downloads (use a mirror closer to you):
Yoloxochitl-Mixtec-Data.tgz [86G] (Yoloxóchitl Mixtec Speech and Transcription)
Mirrors: [US] [EU] [CN]
Yoloxochitl-Mixtec-Manifest.tgz [51K] (Train-Dev-Test Split and Channel Information for Multi-channel Wave)
Mirrors: [US] [EU] [CN]
Novice-Transcription-Correction.zip [1.3G] (Data for Novice Transcription Correction: Mixtec speech data with novice transcription and expert correction)
Mirrors: [US] [EU] [CN]
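Once one of the .tgz archives above has been downloaded from a mirror, it can be inspected and unpacked with Python's standard library. The following is a minimal sketch, not an official loader for this corpus: the helper names `list_archive_members` and `extract_archive` are hypothetical, the archive is assumed to already sit on local disk, and nothing is assumed about the directory layout inside it. It applies only to the gzip-compressed tar files; the .zip file would need the `zipfile` module instead.

```python
import tarfile
from pathlib import Path


def list_archive_members(archive_path, limit=10):
    """Return up to `limit` member names from a gzip-compressed tar archive."""
    names = []
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names


def extract_archive(archive_path, dest_dir):
    """Extract the archive into dest_dir, rejecting members that would escape it."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar:
            # Resolve the final path and make sure it stays inside dest.
            target = (dest / member.name).resolve()
            if dest != target and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            tar.extract(member, path=dest)
    return dest


# Hypothetical usage, assuming the manifest archive is in the current directory:
# print(list_archive_members("Yoloxochitl-Mixtec-Manifest.tgz"))
# extract_archive("Yoloxochitl-Mixtec-Manifest.tgz", "yoloxochitl_manifest")
```

Starting with the 51K manifest archive is a cheap way to check the split and channel files before committing disk space to the 86G data archive.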
About this resource:
Production of the corpus was supported by the National Science Foundation's Documenting Endangered Languages program and by the Endangered Languages Documentation Programme (ELDP) at the School of Oriental and African Studies:
NSF Award 0966462, “Corpus and lexicon development: Endangered genres of discourse and domains of cultural knowledge in Tu’un ísaví (Mixtec) of Yoloxóchitl, Guerrero”; NSF Award 1500738, “Collaborative Research: Speech technology-enhanced annotation and training tool for Yoloxóchitl Mixtec (xty)”; NSF Award 1761421, “A corpus-based, comparative, and multi-media lexicosemantic resource for Yoloxóchitl Mixtec (xty)”.
ELDP Pilot project PPG0048, “Corpus and lexicon development: Endangered genres of discourse in Tu’un ísaví (Mixtec) of Yoloxóchitl, Guerrero”; ELDP Major Documentation Project MDP0201, “Corpus and lexicon development: Endangered genres of discourse and domains of cultural knowledge in Tu’un ísaví (Mixtec) of Yoloxóchitl, Guerrero”.
All material is made available under the Creative Commons license CC BY-SA (Attribution-ShareAlike). Please cite any material you use as follows (corresponding author: jonamith@gmail.com):
Amith, Jonathan D., and Rey Castillo Castillo. n.d. Audio corpus of Yoloxóchitl Mixtec with accompanying time-coded transcriptions in ELAN.
For the ASR corpus and the corresponding baseline results, please cite the following (corresponding authors: jonamith@gmail.com and jiatongs@andrew.cmu.edu):
@inproceedings{shi2021leveraging,
  title={Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yol{\'o}xochitl Mixtec},
  author={Shi, Jiatong and Amith, Jonathan D and Garc{\'\i}a, Rey Castillo and Sierra, Esteban Guadalupe and Duh, Kevin and Watanabe, Shinji},
  booktitle={Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume},
  pages={1134--1145},
  year={2021}
}