Aishell
Identifier: SLR33
Summary: Mandarin data, provided by Beijing Shell Shell Technology Co., Ltd.
Category: Speech
License: Apache License v.2.0
Downloads (use a mirror closer to you):
data_aishell.tgz [15G] (speech data and transcripts) Mirrors:
[US]
[EU]
[CN]
resource_aishell.tgz [1.2M] (supplementary resources, incl. lexicon, speaker info) Mirrors:
[US]
[EU]
[CN]
About this resource:
400 speakers from different accent areas in China were invited to take part in the recording, which was conducted in a quiet indoor environment using high-fidelity microphones. The audio was downsampled to 16 kHz. Manual transcription accuracy is above 95%, achieved through professional speech annotation and strict quality inspection. The data is free for academic use. We hope to provide a moderate amount of data for new researchers in the field of speech recognition.
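As a quick sanity check after downloading, the 16 kHz sampling rate stated above can be verified locally. The following is a minimal sketch, not part of the official distribution: it assumes the archive has been extracted and that the recordings are PCM WAV files, and the names `wav_info`, `find_unexpected_rates`, and the directory passed in are hypothetical.

```python
import wave
from pathlib import Path

# Sampling rate stated in the resource description (16 kHz).
EXPECTED_RATE_HZ = 16000


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return rate, channels, duration


def find_unexpected_rates(root, expected=EXPECTED_RATE_HZ):
    """List (path, rate) for WAV files under root whose rate differs from expected."""
    mismatches = []
    # Assumption: recordings use a lowercase .wav extension somewhere under root.
    for path in sorted(Path(root).rglob("*.wav")):
        rate, _, _ = wav_info(path)
        if rate != expected:
            mismatches.append((path, rate))
    return mismatches
```

Calling `find_unexpected_rates` on the extracted directory should return an empty list if every file matches the stated rate. Only the standard-library `wave` module is used, so no extra dependencies are needed.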
You can cite the data using the following BibTeX entry:
@inproceedings{aishell_2017,
  title={AIShell-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline},
  author={Hui Bu and Jiayu Du and Xingyu Na and Bengu Wu and Hao Zheng},
  booktitle={Oriental COCOSDA 2017},
  pages={Submitted},
  year={2017}
}
External URL: http://www.aishelltech.com/kysjcp (full description from the company website)