UP-99796217609277554 Buklod (iLib)UPD-00131001916 DENGII eng
LG 993.5 2010 E64 A43
Albete, Jonald D.
Enhancements in prosody and smoothness of synthesized Filipino speech / Jonald D. Albete, Narz Marbeth V. David.
2010
vii, 58 leaves ill. (some col.) + 1 computer laser optical disc (4 3/4 in.)
Thesis (B.S. Computer Engineering & B.S. Electronics and Communications Engineering) -- University of the Philippines, Diliman

Text-to-speech systems automatically generate speech from a text input. One significant use is to help visually impaired individuals read through digital audio books. They also improve the human-machine interface by allowing the computer to read on-screen text aloud. The quality of such a system depends on the intelligibility and naturalness of the synthesized speech. The latest project on a concatenative Filipino Text-to-Speech (TTS) system incorporated duration and intonation models and used the Harmonic plus Noise Model (HNM) to concatenate diphones. That implementation succeeded in enhancing the performance of the Filipino TTS; however, listening tests showed that moderate to considerable effort was needed to understand the synthesized speech. The goal of the researchers is to enhance the existing Filipino TTS system to produce more natural and more intelligible synthesized speech.

The new Filipino TTS system includes a prosody generator, a unit selection block, and a synthesizer. For development and evaluation, the Filipino Speech Corpus was used, which contains recordings of nonsense isolated words, conversational phrases, and readings from news clippings and novels. Nonsense isolated words have no meaning; they are used in building a speech synthesis database to avoid pronunciation biases among speakers. For this project, nonsense isolated words were formed to ensure that all theoretically possible diphones are represented in the database. Eighty percent of the corpus was used as training data to create each of the three prosody models (intensity, duration, and pitch); the remaining 20% was used for testing. A database of speech segments for unit selection was extracted from the training data. The unit selection block was implemented to select the sequence of speech units that is best for concatenation. To further improve the naturalness of the synthesized speech, waveform interpolation and optimal coupling were also implemented to smooth out the spectral mismatches at the concatenation points.

To select the best synthesizer for Filipino speech synthesis, a comparative study of an HNM synthesizer and a Time Domain Pitch Synchronous Overlap-Add (TD-PSOLA) synthesizer was performed using diphone units. The performance of the Filipino TTS system was evaluated on a Mean Opinion Score (MOS) scale of 1 to 5 based on listening effort and pleasantness across 30 listeners. Overall, the results show that TD-PSOLA performs better than HNM, with an MOS of 3.13 for listening effort and 2.43 for voice pleasantness.

Speech synthesis.
Smoothing (Numerical analysis)
Text-to-speech systems.
Pitch Synchronous Overlap Add Method (PSOLA).
David, Narz Marbeth V.
FI UP UPD DENG-II LG 993.5 2010 E64 A43 Thesis
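
Note on unit selection: the abstract says the unit selection block picks the sequence of speech units best suited for concatenation, but it does not give the cost functions or the search method. The following is only a minimal sketch of one common approach, a Viterbi-style dynamic-programming search over candidate units, and is not the authors' implementation. The function `select_units`, the `target_cost` and `join_cost` callables, and the pitch-only costs in the usage example are all hypothetical names and choices introduced for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")  # type of a candidate speech unit (hypothetical representation)


def select_units(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> List[U]:
    """Pick one unit per position, minimizing total target cost plus join cost."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate unit")

    # best[j]: cheapest total cost of any path ending at candidate j of the current position
    best = [target_cost(0, u) for u in candidates[0]]
    # back[i - 1][j]: index of the best predecessor for candidate j at position i
    back: List[List[int]] = []

    for i in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for u in candidates[i]:
            costs = [best[k] + join_cost(p, u) for k, p in enumerate(candidates[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + target_cost(i, u))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last position to the first.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)]


if __name__ == "__main__":
    # Assumed toy data: each unit is (diphone label, pitch in Hz).
    cands = [[("a", 100.0), ("a", 120.0)], [("b", 118.0), ("b", 90.0)]]
    desired_pitch = [110.0, 115.0]  # stand-in for the prosody generator's output
    chosen = select_units(
        cands,
        target_cost=lambda i, u: abs(u[1] - desired_pitch[i]),
        join_cost=lambda prev, u: abs(prev[1] - u[1]),
    )
    print(chosen)
```

In this toy setup the target cost measures how far a unit's pitch is from the desired prosody and the join cost measures the pitch jump at the concatenation point; a real system would typically also weigh spectral and duration mismatches.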