Content of the database for each sentence:

  1. text file (.txt),
  2. phonetic transcription file (.gph),
  3. audio file (.wav),
  4. voiced/unvoiced markers file (.pit),
  5. sound boundaries file (.TextGrid),
  6. word boundaries file (.ssw),
  7. formant data file (txt) giving the F1, F2 and F3 formants of each vowel in the sentence at 5 measuring points inside the vowel (10%, 25%, 50%, 75%, 90%),
  8. accent markers for each word of the sentence, given in the text form of the sentence (declarative sentences only).

The data listed above are strictly aligned with each other, so cross-examination across the file types is possible as well.
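As an illustration of the five measuring points in item 7, the sketch below converts the relative positions into absolute times for one vowel. It assumes that the percentages are fractions of the vowel's duration and that the vowel's start and end times are taken from the sound boundaries file; neither is confirmed here, and the function name `measuring_times` is hypothetical.

```python
# Relative positions of the five formant measuring points listed in item 7.
MEASURING_POINTS = (0.10, 0.25, 0.50, 0.75, 0.90)


def measuring_times(vowel_start, vowel_end):
    """Return the absolute times of the five measuring points.

    Assumption (not confirmed by the database description): the percentages
    are fractions of the vowel's duration.
    """
    if vowel_end <= vowel_start:
        raise ValueError("vowel_end must be greater than vowel_start")
    duration = vowel_end - vowel_start
    return [vowel_start + p * duration for p in MEASURING_POINTS]
```

For a vowel spanning 1.0 s to 2.0 s this gives points at roughly 1.10, 1.25, 1.50, 1.75 and 1.90 s.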

How to get it: the database is part of the international database collection (CESAR project). If you are interested, please contact Dr. Géza Németh: nemeth@tmit.bme.hu

Description
The database is organised into directories and subdirectories. Every speaker has the same directory format. The organisation is as follows: one speaker has 56 main directories, each containing 36 sentences (all types of data files). The files belonging to one sentence share the same name; only the file types differ (.txt, .wav etc.). The whole database consists of 12 such main directories.
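Because all files of a sentence share one base name, the parallel files can be gathered by extension. The following is a minimal sketch, not part of the database tooling: the function names `sentence_files` and `sentences_in` are hypothetical, and the formant data file is left out because its exact naming is not specified in this description.

```python
from pathlib import Path

# Extensions named in the content list above. The formant data file is
# omitted because its file naming is not specified in this description.
SENTENCE_EXTENSIONS = (".txt", ".gph", ".wav", ".pit", ".TextGrid", ".ssw")


def sentence_files(directory, base_name):
    """Map each known extension to the matching file of one sentence.

    Extensions with no file in the directory are left out of the result.
    """
    directory = Path(directory)
    found = {}
    for ext in SENTENCE_EXTENSIONS:
        candidate = directory / (base_name + ext)
        if candidate.is_file():
            found[ext] = candidate
    return found


def sentences_in(directory):
    """Return the sorted base names of all sentences that have a .wav file."""
    return sorted(p.stem for p in Path(directory).glob("*.wav"))
```

Calling `sentences_in` on one directory and then `sentence_files` for each returned name yields, per sentence, the set of aligned files available for cross-examination.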