Open Speech and Language Resources



Kashmiri Data Corpus

Identifier: SLR122

Summary: An audio and text corpus for the Kashmiri language

Category: Speech

License: GPL-3.0-or-later

Downloads (use a mirror closer to you):
kashmiri.tar.gz [394M] (Kashmiri speech and transcripts) Mirrors: [US] [EU] [CN]

About this resource:

This is a collection of transcribed Kashmiri speech recordings from native speakers.

Data collection and transcription were carried out by a group of students from Kashmir, India, who were working on a project to develop an automatic speech recognition (ASR) system for the Kashmiri language.

Scripts for the post-processing of this dataset can be found at https://github.com/erstan/kscp
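
Once kashmiri.tar.gz has been downloaded, it can be unpacked with any standard tar tool. The following is a minimal sketch using only the Python standard library. The function name extract_corpus and the destination directory are illustrative assumptions, and nothing is assumed here about the internal layout of the archive, which this page does not describe.

```python
import tarfile
from pathlib import Path


def extract_corpus(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive and return the extracted file paths.

    extract_corpus is a hypothetical helper, not part of the dataset or its scripts.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Only handle regular files and directories; skip links and devices.
            if not (member.isfile() or member.isdir()):
                continue
            target = (dest / member.name).resolve()
            # Skip any member whose path would resolve outside the destination.
            if target != dest and dest not in target.parents:
                continue
            tar.extract(member, dest)
            if member.isfile():
                extracted.append(target)
    return sorted(extracted)


# Example call (assumes the archive is in the current directory):
# files = extract_corpus("kashmiri.tar.gz", "kashmiri_corpus")
# print(len(files), "files extracted")
```

Listing the returned paths is a quick way to see how the audio and transcript files are organized before running any post-processing.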