Open Speech and Language Resources



AliMeeting

Identifier: SLR119

Summary: A Free Mandarin Multi-channel Meeting Speech Corpus, provided by Alibaba Group

Category: Speech

License: CC BY-SA 4.0

About this resource:

The AliMeeting Mandarin corpus, originally designed for the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT), is recorded from real meetings. It includes far-field speech collected by an 8-channel microphone array as well as near-field speech collected by each participant's headset microphone. The dataset contains 118.75 hours of speech data in total, divided into 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours for testing (Test), following the M2MeT challenge arrangement. Specifically, the Train, Eval and Test sets contain 212, 8 and 20 meeting sessions respectively, and each session consists of a 15- to 30-minute discussion by 2-4 participants. AliMeeting covers a variety of aspects of real-world meetings, including diverse meeting rooms, varying numbers of meeting participants and different speaker overlap ratios. High-quality transcriptions are provided as well. The dataset can be used for rich meeting transcription tasks, including speaker diarization and multi-speaker automatic speech recognition.
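As a quick sanity check on the split figures quoted above, the following minimal Python sketch adds up the per-split hours and session counts. The dictionary name and layout are illustrative only and are not part of the dataset distribution; the numbers are copied from the description.

```python
# Split statistics copied from the description above.
# SPLITS is a hypothetical name used only for this illustration.
SPLITS = {
    "Train": {"hours": 104.75, "sessions": 212},
    "Eval": {"hours": 4.0, "sessions": 8},
    "Test": {"hours": 10.0, "sessions": 20},
}

# The per-split figures should add up to the stated corpus totals.
total_hours = sum(split["hours"] for split in SPLITS.values())
total_sessions = sum(split["sessions"] for split in SPLITS.values())

print(f"Total: {total_hours} hours in {total_sessions} sessions")
for name, split in SPLITS.items():
    share = 100.0 * split["hours"] / total_hours
    print(f"{name}: {split['hours']} h, {split['sessions']} sessions, {share:.1f}% of audio")
```

The hour totals agree with the 118.75 hours stated in the description.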

Details of the associated ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) can be found on the challenge website.

You can cite the data using the following BibTeX entry:

@inproceedings{Yu2022M2MeT,
  title={M2{M}e{T}: The {ICASSP} 2022 Multi-Channel Multi-Party Meeting Transcription Challenge},
  author={Yu, Fan and Zhang, Shiliang and Fu, Yihui and Xie, Lei and Zheng, Siqi and Du, Zhihao and Huang, Weilong and Guo, Pengcheng and Yan, Zhijie and Ma, Bin and Xu, Xin and Bu, Hui},
  booktitle={Proc. ICASSP},
  year={2022},
  organization={IEEE}
}

@inproceedings{Yu2022Summary,
  title={Summary On The {ICASSP} 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge},
  author={Yu, Fan and Zhang, Shiliang and Guo, Pengcheng and Fu, Yihui and Du, Zhihao and Zheng, Siqi and Huang, Weilong and Xie, Lei and Tan, Zheng-Hua and Wang, DeLiang and Qian, Yanmin and Lee, Kong Aik and Yan, Zhijie and Ma, Bin and Xu, Xin and Bu, Hui},
  booktitle={Proc. ICASSP},
  year={2022},
  organization={IEEE}
}

Challenge introduction paper: M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge

Challenge summary paper: Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge

External URLs:
https://speech-lab-share-data.oss-cn-shanghai.aliyuncs.com/AliMeeting/openlr/Train_Ali_far.tar.gz   [73.24G]   (AliMeeting Train set, 8-channel microphone array speech)
https://speech-lab-share-data.oss-cn-shanghai.aliyuncs.com/AliMeeting/openlr/Train_Ali_near.tar.gz   [22.85G]   (AliMeeting Train set, headset microphone speech)
https://speech-lab-share-data.oss-cn-shanghai.aliyuncs.com/AliMeeting/openlr/Eval_Ali.tar.gz   [3.42G]   (AliMeeting Eval set, 8-channel microphone array speech and headset microphone speech)
https://speech-lab-share-data.oss-cn-shanghai.aliyuncs.com/AliMeeting/openlr/Test_Ali.tar.gz   [8.90G]   (AliMeeting Test set, 8-channel microphone array speech and headset microphone speech)
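
The archives listed above are large, so it can help to check the total size and fetch them one at a time. The following is a minimal Python sketch using only the standard library. The function names are hypothetical, "G" is assumed to mean gigabytes, and the layout of the extracted archives is not described here, so the sketch stops at downloading.

```python
import shutil
import urllib.request
from pathlib import Path

BASE_URL = "https://speech-lab-share-data.oss-cn-shanghai.aliyuncs.com/AliMeeting/openlr/"

# (file name, listed size) pairs copied from the URL list above.
# Sizes are assumed to be in gigabytes.
ARCHIVES = [
    ("Train_Ali_far.tar.gz", 73.24),
    ("Train_Ali_near.tar.gz", 22.85),
    ("Eval_Ali.tar.gz", 3.42),
    ("Test_Ali.tar.gz", 8.90),
]


def archive_urls():
    """Return the full download URL of every archive."""
    return [BASE_URL + name for name, _ in ARCHIVES]


def total_size_gb():
    """Sum of the listed archive sizes, rounded to two decimals."""
    return round(sum(size for _, size in ARCHIVES), 2)


def download(url, dest_dir="."):
    """Stream one archive into dest_dir, skipping files already present."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if dest.exists():
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted transfer
    # is not mistaken for a complete archive on the next run.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.rename(dest)
    return dest
```

Calling download(url, "AliMeeting") for each entry of archive_urls() would fetch the whole corpus; total_size_gb() gives the disk space the compressed archives need before extraction. Note that this sketch does not resume partial transfers, so a dedicated download tool may be preferable for the larger Train archives.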