<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
 <record>
  <leader>00000ctm a22000003a 4500</leader>
  <controlfield tag="001">UP-99796217609277554</controlfield>
  <controlfield tag="003">Buklod</controlfield>
  <controlfield tag="005">20100507085415.0</controlfield>
  <controlfield tag="006">m    |o  d |      </controlfield>
  <controlfield tag="007">ta</controlfield>
  <controlfield tag="008">100507s        xx     d     r    |||| u|</controlfield>
  <datafield tag="035" ind1=" " ind2=" ">
   <subfield code="a">(iLib)UPD-00131001916</subfield>
  </datafield>
  <datafield tag="040" ind1=" " ind2=" ">
   <subfield code="a">DENGII</subfield>
  </datafield>
  <datafield tag="041" ind1=" " ind2=" ">
   <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="090" ind1=" " ind2="0">
   <subfield code="a">LG 993.5 2010 E64</subfield>
   <subfield code="b">A43</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
   <subfield code="a">Albete, Jonald D.</subfield>
  </datafield>
  <datafield tag="245" ind1="0" ind2="0">
   <subfield code="a">Enhancements in prosody and smoothness of synthesized Filipino speech</subfield>
   <subfield code="c">Jonald D. Albete, Narz Marbeth V. David.</subfield>
  </datafield>
  <datafield tag="264" ind1=" " ind2="1">
   <subfield code="a">2010</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
   <subfield code="a">vii, 58 leaves</subfield>
   <subfield code="b">ill. (some col.) +</subfield>
   <subfield code="e">1 computer laser optical disc (4 3/4 in.)</subfield>
  </datafield>
  <datafield tag="502" ind1=" " ind2=" ">
   <subfield code="a">Thesis (B.S. Computer Engineering &amp; B.S. Electronics and Communications Engineering) -- University of the Philippines, Diliman</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
   <subfield code="a">Text-to-speech systems automatically generate speech from text input. One of their significant uses is to aid visually impaired individuals in reading through digital audio books. They also improve the human-machine interface by allowing the computer to read on-screen text aloud. The quality of such a system depends on the intelligibility and naturalness of the synthesized speech. The latest project on a concatenative Filipino Text-to-Speech (TTS) system incorporated duration and intonation models and used the Harmonic plus Noise Model (HNM) to concatenate diphones. The implementation succeeded in enhancing the performance of the Filipino TTS; however, listening tests showed that moderate to considerable effort was needed to understand the synthesized speech. The goal of the researchers is to enhance the existing Filipino TTS system to produce more natural and more intelligible synthesized speech. The new Filipino TTS system includes a prosody generator, a unit selection block, and a synthesizer. For development and evaluation, the Filipino Speech Corpus was used; it contains recordings of nonsense isolated words, conversational phrases, and readings from news clippings and novels. Nonsense isolated words have no meaning but are used in creating a speech synthesis database to avoid pronunciation biases among speakers. For this project, nonsense isolated words were formed to ensure that all theoretically possible diphones are represented in the database. Eighty percent of the corpus was used as training data to create each of the three prosody models (intensity, duration, and pitch); the remaining 20% was used for testing. A database of speech segments for unit selection was extracted from the training data. The unit selection block was implemented to select the sequence of speech units best suited for concatenation.
To further improve the naturalness of the synthesized speech, waveform interpolation and optimal coupling were also implemented to smooth out spectral mismatches at the concatenation points. To select the best synthesizer for Filipino speech synthesis, a comparative study of an HNM synthesizer and a Time Domain Pitch Synchronous Overlap-Add (TD-PSOLA) synthesizer was performed using diphone units. The performance of the Filipino TTS system was evaluated on a Mean Opinion Score (MOS) scale of 1 to 5, based on listening effort and pleasantness, across 30 listeners. Overall, the results show that TD-PSOLA is better than HNM, with MOS values of 3.13 for listening effort and 2.43 for voice pleasantness.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
   <subfield code="a">Speech synthesis.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
   <subfield code="a">Smoothing (Numerical analysis)</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
   <subfield code="a">Text-to-speech systems.</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
   <subfield code="a">Pitch Synchronous Overlap Add Method (PSOLA).</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
   <subfield code="a">David, Narz Marbeth V.</subfield>
  </datafield>
  <datafield tag="905" ind1=" " ind2=" ">
   <subfield code="a">FI</subfield>
  </datafield>
  <datafield tag="905" ind1=" " ind2=" ">
   <subfield code="a">UP</subfield>
  </datafield>
  <datafield tag="852" ind1="0" ind2=" ">
   <subfield code="a">UPD</subfield>
   <subfield code="b">DENG-II</subfield>
   <subfield code="h">LG 993.5 2010 E64</subfield>
   <subfield code="i">A43</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
   <subfield code="a">Thesis</subfield>
  </datafield>
 </record>
</collection>
