Bootstrapping Kelantan and Sarawak Malay dialect models on text and phonetic analyses in text-to-speech system

dc.contributor.author: Khaw, Jasmina Yen Min
dc.date.accessioned: 2019-09-04T07:01:45Z
dc.date.available: 2019-09-04T07:01:45Z
dc.date.issued: 2017-07
dc.description.abstract: Text-to-speech (TTS) technologies have matured and are now embedded in many tools, for instance mobile phones, robotics and telephony systems. Building a TTS system requires a speech corpus, a pronunciation dictionary and text. However, building a TTS system for an under-resourced Malay dialect is challenging. The limited language resources for Malay dialects pose several problems: pronunciation dictionaries do not exist, the phoneme sets are largely unknown, there is no standard orthography, and written text is scarce. A TTS system involves text analysis, phonetic analysis, prosodic analysis and speech synthesis. Our study focuses on the text analysis and phonetic analysis of under-resourced Malay dialects. In this thesis, we propose a framework for an under-resourced Malay dialect TTS system. We also propose approaches to bootstrap Malay dialect models using Standard Malay and multilingual resources for developing Malay dialect TTS systems. Within the proposed framework, a semi-supervised approach translates a Standard Malay text corpus into a Malay dialect text corpus to address the limited written text of Malay dialects. For this task, we propose a word and phrase alignment to obtain the Malay dialect vocabularies and translation rules. The results show precision and recall values above 95%. Next, we propose a new normalisation algorithm for standardising the orthography of Malay dialect text, since building a TTS system requires standardised written text.
dc.identifier.uri: http://hdl.handle.net/123456789/8799
dc.language.iso: en
dc.publisher: Universiti Sains Malaysia
dc.subject: Bootstrapping Kelantan and Sarawak Malay dialect
dc.subject: phonetic analyses in text-to-speech system
dc.title: Bootstrapping Kelantan and Sarawak Malay dialect models on text and phonetic analyses in text-to-speech system
dc.type: Thesis