Open Speech and Language Resources

Phone: 425 247 4129
(Daniel Povey)

Free English Corpus and Language Challenge -- Speechocean

Identifier: SLR91

Summary: A free 8.2-hour English speech recognition corpus provided by Speechocean, and an oriental language recognition challenge co-organized by Speechocean and Tsinghua University.

Category: Speech

License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)


About this resource:

Dataset retracted by request of the SpeechOcean company

  • This is an 8.2-hour English speech recognition corpus recorded on cell phones (iOS or Android).
  • The corpus contains 6393 utterances from 20 speakers, recorded in a quiet office environment.
  • Transcription files are included, and the sentence transcription accuracy is higher than 98%.
  • It is free to use for academic purposes.
  • This corpus is a subset of a larger corpus (1147 hours). Please contact us if you are interested.
About Oriental Language Recognition Challenge (OLR 2020)
  • This challenge, co-organized by Speechocean and Tsinghua University, aims to advance language recognition technology for oriental languages. Following the success of the past four OLR challenges, the 2020 challenge is coming, and it is more challenging and more interesting.
  • In the past year, dozens of well-known companies and universities, such as Samsung, Alibaba, and the University of Tokyo, participated in this challenge.
  • This year, a 188-hour speech recognition corpus covering 18 languages is free for every participant.
  • Home page of the challenge:
External URL
Contact Information


About Speechocean
Speechocean has always been devoted to providing specialized engineering data products and services to enterprises and scientific research institutions across the whole AI industry chain. Our business spans domains such as speech recognition, speech synthesis, computer vision, lexicons, and natural language processing, and we provide related services for data design, collection, transcription, annotation, and so on.