Spock: the Spoken Corpus Klient

About Spock

Spock is a PHP-based web interface for spoken corpora with aligned transcriptions. It gives easy access to the spoken data through queries on the orthographic transcription. Its basic functionality is as follows: when the user searches for a given word, the system returns every fragment in the corpus that contains that word, with the search word highlighted. Next to each orthographic fragment, the web interface provides the corresponding sound fragment, so the user can listen to the actual audio.
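The basic lookup can be illustrated with a minimal Python sketch. It is not Spock's own PHP code. It assumes each transcription line has already been loaded into a record holding the sound file, begin and end time, speaker and text (the `Fragment` type and its field names are hypothetical). It matches the search word as a whole word regardless of case and wraps each match in `<b>...</b>`, which is an assumed highlighting markup. The begin and end times are returned with each hit because they are what the audio extraction step needs.

```python
import re
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    """One line of the transcription file (hypothetical field names)."""
    sound_file: str
    begin: float  # seconds
    end: float    # seconds
    speaker: str
    text: str


def search_word(fragments: List[Fragment], word: str) -> List[Tuple[Fragment, str]]:
    """Return (fragment, highlighted text) for each fragment containing `word`.

    Matching is whole-word and case-insensitive; every match is wrapped in <b>...</b>.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for frag in fragments:
        if pattern.search(frag.text):
            # Keep the original casing of the matched text inside the markup.
            highlighted = pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", frag.text)
            hits.append((frag, highlighted))
    return hits


corpus = [
    Fragment("rec01.wav", 0.0, 2.4, "spk1", "the cat sleeps"),
    Fragment("rec01.wav", 2.4, 4.1, "spk2", "a dog barks"),
    Fragment("rec02.wav", 10.0, 12.5, "spk3", "Cat food"),
]

for frag, html in search_word(corpus, "cat"):
    # The time span is what the audio extraction step would need.
    print(frag.sound_file, frag.begin, frag.end, html)
```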

On top of this basic functionality, Spock provides several advanced search options. First, the orthographic query can be more than a single word: it can also be part of a word, a sequence of words, a pattern, and so on. Second, the results can be filtered, mainly on speaker characteristics, for example displaying only results from speakers above a certain age, of a specific gender, or from a given region. It is also possible to limit the output to one or two results per speaker, giving a range of different speakers producing the same word.
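A minimal sketch of the filtering step follows. It assumes regular expressions as the pattern language, although Spock's actual query syntax may differ. It also assumes a hypothetical speaker table keyed by speaker ID that holds age, gender and region. The per-speaker cap is checked after the other filters, so a speaker's quota is used up only by results that pass every other condition.

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Fragment(NamedTuple):
    speaker: str  # speaker ID, used as key into the speaker table
    text: str


class Speaker(NamedTuple):
    age: int
    gender: str
    region: str


def advanced_search(
    fragments: List[Fragment],
    speakers: Dict[str, Speaker],
    regex: str,
    min_age: Optional[int] = None,
    gender: Optional[str] = None,
    region: Optional[str] = None,
    per_speaker: Optional[int] = None,
) -> List[Fragment]:
    """Pattern query plus speaker filters and an optional cap on results per speaker."""
    pattern = re.compile(regex, re.IGNORECASE)
    taken: Dict[str, int] = defaultdict(int)  # results kept so far, per speaker
    results = []
    for frag in fragments:
        if not pattern.search(frag.text):
            continue
        info = speakers[frag.speaker]
        if min_age is not None and info.age < min_age:
            continue
        if gender is not None and info.gender != gender:
            continue
        if region is not None and info.region != region:
            continue
        # Checked last, so only fully matching results use up a speaker's quota.
        if per_speaker is not None and taken[frag.speaker] >= per_speaker:
            continue
        taken[frag.speaker] += 1
        results.append(frag)
    return results


speakers = {
    "spk1": Speaker(34, "F", "Lisboa"),
    "spk2": Speaker(61, "M", "Porto"),
    "spk3": Speaker(45, "F", "Porto"),
}
fragments = [
    Fragment("spk1", "the cat sleeps"),
    Fragment("spk1", "another cat"),
    Fragment("spk1", "cats everywhere"),
    Fragment("spk2", "a catalogue"),
    Fragment("spk3", "my cat"),
    Fragment("spk3", "the dog"),
]

# Any word starting with "cat", at most one result per speaker.
for frag in advanced_search(fragments, speakers, r"\bcat\w*", per_speaker=1):
    print(frag.speaker, frag.text)
```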

Rather than storing every individual sound fragment, Spock generates each fragment on the fly by extracting the part of the sound file that corresponds to a given transcription fragment. This makes several extensions possible. First, a sound fragment can be extended by several seconds if the result is cut off too tightly. Second, for each fragment the user can view the sentence context, i.e. the sentences occurring before and after the given sentence, together with the sound fragment for that larger context.
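On-the-fly extraction maps naturally onto the `trim` effect of SoX, which takes a start position followed by a length (plain numbers are read as seconds). The Python sketch below assumes only that SoX is invoked this way. The helper names, the `pad` parameter and the requirement that `fragments` holds the lines of one sound file sorted by begin time are hypothetical. Padding is clamped at zero at the start of the file but not at the end, because the file length is not known here.

```python
import subprocess
from typing import List, NamedTuple


class Fragment(NamedTuple):
    sound_file: str
    begin: float  # seconds
    end: float    # seconds
    text: str


def sox_trim_command(sound_file: str, begin: float, end: float,
                     out_file: str, pad: float = 0.0) -> List[str]:
    """Build a SoX command that cuts [begin - pad, end + pad] out of sound_file.

    SoX's trim effect takes a start position followed by a length, both in seconds here.
    The start is clamped at 0; the end is not clamped because the file length is unknown.
    """
    start = max(0.0, begin - pad)
    length = (end + pad) - start
    return ["sox", sound_file, out_file, "trim", f"{start:.3f}", f"{length:.3f}"]


def context(fragments: List[Fragment], index: int, n: int = 1) -> List[Fragment]:
    """Fragment at `index` plus up to n neighbours on each side.

    Assumes `fragments` holds the lines of one sound file, sorted by begin time.
    """
    lo = max(0, index - n)
    hi = min(len(fragments), index + n + 1)
    return fragments[lo:hi]


def extract(span: List[Fragment], out_file: str, pad: float = 0.0) -> None:
    """Run SoX over the time range covered by consecutive fragments (needs sox on PATH)."""
    first, last = span[0], span[-1]
    cmd = sox_trim_command(first.sound_file, first.begin, last.end, out_file, pad)
    # Passing a list (no shell) avoids quoting problems with file names.
    subprocess.run(cmd, check=True)
```

For example, `extract(context(fragments, i, n=1), "context.wav", pad=0.5)` would cut out the fragment at index `i` plus one sentence on either side, with half a second of extra audio at each end. Keeping the command builder separate from the call that runs it makes the generated SoX arguments easy to inspect.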

Spock uses its own tab-based format for the orthographic transcription file, in which each line specifies the sound file, the begin and end times, the speaker, and the transcription. Speaker data and orthographic transcriptions are stored in separate files. Spock comes with scripts to convert between the Spock file format and either ELAN .eaf files or Praat .TextGrid files.
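A reader and writer for this format can be sketched in Python as follows. The sketch assumes exactly five tab-separated fields per line in the order given above, times written as decimal seconds, and no tabs inside the transcription text. These details are assumptions: the real Spock files may use a different time unit or additional columns. The writer is included so that a parse-and-write round trip can be checked before converting to another format.

```python
import io
from typing import Iterable, List, NamedTuple, TextIO


class Fragment(NamedTuple):
    sound_file: str
    begin: float  # seconds (assumed unit)
    end: float
    speaker: str
    text: str


def parse_transcription(lines: Iterable[str]) -> List[Fragment]:
    """Parse lines of the form: sound_file<TAB>begin<TAB>end<TAB>speaker<TAB>text."""
    fragments = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 tab-separated fields, got {len(fields)}")
        sound_file, begin, end, speaker, text = fields
        fragments.append(Fragment(sound_file, float(begin), float(end), speaker, text))
    return fragments


def write_transcription(fragments: Iterable[Fragment], out: TextIO) -> None:
    """Write fragments back in the same five-column layout."""
    for f in fragments:
        # str() of a float gives the shortest text that reads back to the same value.
        out.write("\t".join([f.sound_file, str(f.begin), str(f.end), f.speaker, f.text]) + "\n")


sample = (
    "rec01.wav\t0.0\t2.4\tspk1\tthe cat sleeps\n"
    "rec01.wav\t2.4\t4.1\tspk2\ta dog barks\n"
)
fragments = parse_transcription(io.StringIO(sample))
buffer = io.StringIO()
write_transcription(fragments, buffer)
```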

Spock should run on any Unix-based system running Apache with PHP. The only additional tool that needs to be installed is the SoX sound processing program, which is freely available at sox.sourceforge.net/ and is included in most Linux package systems. For more information about Spock, contact its author: maarten@iltec.pt.