MAGICDATA Mandarin Chinese Conversational Speech Corpus
Identifier: SLR123
Summary: A corpus by Magic Data Technology Co., Ltd. containing 180 hours of richly annotated Mandarin spontaneous conversational speech data.
Category: Speech
License: Attribution-NonCommercial-NoDerivatives 4.0 International Public License (CC BY-NC-ND 4.0)
Downloads (use a mirror closer to you):
MagicData-RAMC.tar.gz [15G] (All speech and annotations)
Mirrors:
[US]
[EU]
[CN]
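A minimal Python sketch for unpacking the downloaded archive, using only the standard-library `tarfile` module. The helper name `extract_archive` and the destination directory are illustrative assumptions. The internal layout of MagicData-RAMC.tar.gz is not described on this page, so the sketch only extracts the files and returns the member names. On Python versions that provide extraction filters, it applies the `"data"` filter to reject unsafe member paths.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: str, dest: str) -> list[str]:
    """Unpack a .tar.gz archive into `dest` and return its member names.

    Hypothetical helper for illustration; it assumes nothing about the
    archive's internal directory layout.
    """
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter blocks absolute paths,
            # ".." traversal, and other unsafe members.
            tar.extractall(dest_path, filter="data")
        else:
            tar.extractall(dest_path)
        # Members were read during extraction, so this does not rescan
        # the (large) compressed file a second time.
        return tar.getnames()


# Example (paths are placeholders):
# names = extract_archive("MagicData-RAMC.tar.gz", "MagicData-RAMC")
# print(len(names), "entries extracted")
```

Names are collected after extraction rather than before, so the 15G gzip stream is decompressed only once.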
About this resource:
The main characteristics of the corpus are:
- The corpus contains 180 hours of speech data, all recorded on mobile devices.
- 663 speakers from different accent regions of China were invited to take part in the recording.
- All speech data are manually labeled, and the transcriptions are proofread by professional inspectors to ensure labeling quality.
- Recordings were made in a quiet indoor environment.
- The database is divided into training, validation, and test sets in a ratio of 15:1:2.
- Detailed information, such as speaker and topic information, is preserved in the metadata file.
- Dialogue topics are diverse, ranging from science and technology to everyday life.
@article{yang2022open,
  title={Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset},
  author={Yang, Zehui and Chen, Yifan and Luo, Lei and Yang, Runyan and Ye, Lingxuan and Cheng, Gaofeng and Xu, Ji and Jin, Yaohui and Zhang, Qingqing and Zhang, Pengyuan and others},
  journal={arXiv preprint arXiv:2203.16844},
  year={2022}
}
About us
Magic Data Technology Co., Ltd. (Magic Data) was established in 2016. Through its high-expertise, high-precision data services, Magic Data has quickly grown into one of the foremost companies in the artificial intelligence industry. We strive to provide the most efficient and highest-quality one-stop data services for customers in the fields of speech recognition, intelligent imaging, and natural language understanding (NLU). Our services include data scheme design, data collection, data annotation/transcription, and more.
Contact
- Tel: (+86) 10-82527250
- Email: business@magicdatatech.com
- http://www.imagicdatatech.com