This is a corpus of Silbo Gomero, a whistled form of Spanish used on the island of La Gomera. It was created from 49 minutes of raw recordings of read speech, produced by 4 fluent whistlers. The recordings were originally made for teaching the language to children native to the island.
The corpus consists of 3 parts, each made from the same data but edited in a different way; a separate transcription file is provided for each part.
This corpus was created by Agata Jakubiak, a student at the University of Warsaw, from data provided by Francisco Javier Correa, working for the Silbo Gomero Teaching Project (Proyecto de Enseñanza de Silbo Gomero), as part of research into automatic speech recognition of whistled speech.
You can cite the data using the following BibTeX entry:
@inproceedings{jakubiak23_interspeech,
author={Agata Jakubiak},
title={{Whistle-to-text: Automatic recognition of the Silbo Gomero whistled language}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
pages={3402--3406},
doi={10.21437/Interspeech.2023-989}
}