Bootstrapping Kelantan and Sarawak Malay dialect models on text and phonetic analyses in text-to-speech system
Date
2017-07
Authors
Khaw, Jasmina Yen Min
Publisher
Universiti Sains Malaysia
Abstract
Text-to-speech (TTS) technologies have matured and are now embedded in many
tools, for instance mobile phones, robots and telephony systems. Building a TTS
system requires a speech corpus, a pronunciation dictionary and text. Building a
TTS system for an under-resourced Malay dialect is therefore challenging. The
limited language resources for Malay dialects pose several problems: no
pronunciation dictionaries exist, the phoneme sets are largely unknown, there is
no standard orthography, and written text is scarce. A TTS system involves text
analysis, phonetic analysis, prosodic analysis and speech synthesis. Our study
focuses on the text analysis and phonetic analysis of under-resourced Malay
dialects. In this thesis, we propose a framework for an under-resourced Malay
dialect TTS system. We also propose approaches to bootstrap Malay dialect models
from Standard Malay and multilingual resources for developing Malay dialect TTS
systems. Within the proposed framework, a semi-supervised approach translates a
Standard Malay text corpus into a Malay dialect text corpus to address the
scarcity of written dialect text. For this translation task, we propose a word
and phrase alignment that yields Malay dialect vocabularies and translation
rules. The results show that the precision and recall values are both above 95%.
Next, we propose a new normalisation algorithm for standardising the orthography
of Malay dialect text, because building a TTS system requires standardised
written text.
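The abstract reports precision and recall but does not define how they were computed. As a minimal sketch, assuming the conventional set-based definitions applied to extracted word pairs, the two measures could be calculated as below. The function name `precision_recall` and the placeholder pairs are hypothetical and are not taken from the thesis.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for two collections of items.

    Assumes the conventional definitions:
      precision = correct predictions / all predictions
      recall    = correct predictions / all reference items
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Placeholder (standard_word, dialect_word) pairs, invented for illustration.
reference_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
extracted_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "x3")}

precision, recall = precision_recall(extracted_pairs, reference_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three extracted pairs are correct, and two of the four reference pairs are recovered, so precision is 2/3 and recall is 1/2.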
Keywords
Bootstrapping Kelantan and Sarawak Malay dialect, phonetic analyses in text-to-speech system