Open Speech and Language Resources



CHiME-6

Identifier: SLR150

Summary: English multi-channel far-field meeting data used in the CHiME-6 Challenge. It is derived from CHiME-5 by fixing some array synchronization errors.

Category: Speech

License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0 International)

Downloads (use a mirror closer to you):
CHiME6_train.tar.gz [97G]   ( CHiME-6 training portion )   Mirrors: [US]   [EU]   [CN]  
CHiME6_dev.tar.gz [11G]   ( CHiME-6 development portion )   Mirrors: [US]   [EU]   [CN]  
CHiME6_eval.tar.gz [12G]   ( CHiME-6 evaluation portion )   Mirrors: [US]   [EU]   [CN]  
CHiME6_transcriptions.tar.gz [2.4M]   ( CHiME-6 JSON annotation transcriptions )   Mirrors: [US]   [EU]   [CN]  
CHiME6_floorplans.tar.gz [1.4M]   ( CHiME-6 floorplans for each session )   Mirrors: [US]   [EU]   [CN]  
LICENSE.txt [20K]   ( CHiME-5 CC BY-SA 4.0 license )   Mirrors: [US]   [EU]   [CN]  
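
Once the archives above have been downloaded, they need to be unpacked. The following is a minimal Python sketch of one way to do that, not an official tool for this resource. The archive names are taken from the list above; the function name extract_archives and the directory names in the usage line are hypothetical. It assumes the files are ordinary gzip-compressed tar archives, as the .tar.gz extension suggests, and that they are already present in a local download directory.

```python
import tarfile
from pathlib import Path

# Archive names as they appear in the Downloads list.
ARCHIVES = [
    "CHiME6_train.tar.gz",
    "CHiME6_dev.tar.gz",
    "CHiME6_eval.tar.gz",
    "CHiME6_transcriptions.tar.gz",
    "CHiME6_floorplans.tar.gz",
]


def extract_archives(download_dir, target_dir, names=ARCHIVES):
    """Extract each listed archive found in download_dir into target_dir.

    Returns (extracted, missing): two lists of archive names.
    This helper is hypothetical and only illustrates the unpacking step.
    """
    download_dir = Path(download_dir)
    root = Path(target_dir)
    root.mkdir(parents=True, exist_ok=True)
    root = root.resolve()

    extracted, missing = [], []
    for name in names:
        archive = download_dir / name
        if not archive.is_file():
            # Not downloaded (yet); report it instead of failing.
            missing.append(name)
            continue
        with tarfile.open(archive, "r:gz") as tar:
            members = tar.getmembers()
            for member in members:
                # Basic check: refuse entry names that resolve outside target_dir.
                dest = (root / member.name).resolve()
                if dest != root and root not in dest.parents:
                    raise ValueError(f"unsafe path in {name}: {member.name}")
            tar.extractall(root, members=members)
        extracted.append(name)
    return extracted, missing
```

For example, extracted, missing = extract_archives("downloads", "CHiME6") unpacks whatever subset of the archives is present in a local "downloads" directory and lists the ones still missing. Given the sizes above (97G for the training portion alone), check free disk space before extracting.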

About this resource:

This is the CHiME-6 dataset as used in the CHiME-6 Challenge in 2020 and the CHiME-7 DASR task in 2023.
It is derived from CHiME-5 by running this array synchronization script. More details are available in:

According to the dataset license, you should cite this dataset using the following BibTeX entries:


@inproceedings{barker18_interspeech,
  author={Jon Barker and Shinji Watanabe and Emmanuel Vincent and Jan Trmal},
  title={{The Fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, Task and Baselines}},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1561--1565},
  doi={10.21437/Interspeech.2018-1768}
}

@inproceedings{watanabe2020chime,
  title={CHiME-6 Challenge: Tackling multispeaker speech recognition for unsegmented recordings},
  author={Watanabe, Shinji and Mandel, Michael and Barker, Jon and Vincent, Emmanuel and Arora, Ashish and Chang, Xuankai and Khudanpur, Sanjeev and Manohar, Vimal and Povey, Daniel and Raj, Desh and others},
  booktitle={CHiME 2020 - 6th International Workshop on Speech Processing in Everyday Environments},
  year={2020}
}