Silbo Gomero Speech Corpus
Summary: Corpus of the Silbo Gomero whistled language, based on 49 minutes of recordings created by 4 whistlers.
License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Downloads (use a mirror closer to you):
README.txt [2.7K] ( Readme file ) Mirrors: [US] [EU] [CN]
words.zip [210M] ( Single-word clips with transcripts ) Mirrors: [US] [EU] [CN]
fragments.zip [232M] ( Short fragments with transcripts ) Mirrors: [US] [EU] [CN]
sentences.zip [86M] ( Whole sentences with transcripts ) Mirrors: [US] [EU] [CN]
About this resource:
The corpus consists of 3 parts, each made from the same recordings but edited in a different way; a separate transcription file is provided for each part.
- 'words.zip' contains clips of single, separate words. Some clips may contain more than one word where the words could not be separated.
- 'sentences.zip' contains clips of entire sentences. Some parts of the recordings are not represented here; for example, one recording contained a poem, which could not be separated into sentences.
- 'fragments.zip' contains clips of short fragments of speech (on average, about 6.5 words long); those fragments were made by splitting the recordings wherever longer pauses between words occurred.
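As a starting point for working with the three archives described above, the following minimal Python sketch extracts one archive and counts the files it contains by extension. It assumes only that the archives are ordinary zip files already downloaded to the working directory; the function name `extract_and_summarize` and the output folder `silbo_corpus` are hypothetical, and no assumption is made about the audio format or the layout of the transcription files, which should be checked against README.txt.

```python
import zipfile
from collections import Counter
from pathlib import Path


def extract_and_summarize(archive_path, dest_dir):
    """Extract a zip archive into dest_dir and count extracted files by extension.

    Hypothetical helper: it does not interpret the corpus contents in any way.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)

    # Count every regular file under dest by its lowercase extension.
    counts = Counter(
        p.suffix.lower() or "(no extension)"
        for p in dest.rglob("*")
        if p.is_file()
    )
    return dict(counts)


if __name__ == "__main__":
    # Assumed local file names, matching the download list on this page.
    for name in ("words", "fragments", "sentences"):
        archive = Path(f"{name}.zip")
        if archive.exists():
            summary = extract_and_summarize(archive, Path("silbo_corpus") / name)
            print(name, summary)
```

Inspecting the extension counts first is a cautious way to find out which files are audio clips and which is the transcription file before writing any parsing code.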
This corpus was created by Agata Jakubiak, a student at the University of Warsaw, as part of research into automatic speech recognition of whistled speech. The data was provided by Francisco Javier Correa of the Silbo Gomero Teaching Project (Proyecto de Enseñanza de Silbo Gomero).