Bootstrapping Kelantan and Sarawak Malay dialect models on text and phonetic analyses in text-to-speech system

Date
2017-07
Authors
Khaw, Jasmina Yen Min
Publisher
Universiti Sains Malaysia
Abstract
Text-to-speech (TTS) technologies have matured and are now embedded in many tools, for instance mobile phones, robotics and telephony systems. Building a TTS system requires a speech corpus, a pronunciation dictionary and text. However, building a TTS system for an under-resourced Malay dialect is challenging. The resource limitations of Malay dialects include the absence of pronunciation dictionaries, largely unknown phoneme sets, the lack of a standard orthography and limited written text. A TTS system involves text analysis, phonetic analysis, prosodic analysis and speech synthesis. Our study focuses on the text analysis and phonetic analysis of under-resourced Malay dialects. In this thesis, we propose a framework for an under-resourced Malay dialect TTS system, together with approaches to bootstrap Malay dialect models from Standard Malay and multilingual resources for developing Malay dialect TTS systems. Within the proposed framework, a semi-supervised approach translates a Standard Malay text corpus into a Malay dialect text corpus to address the limited written text of Malay dialects. For this translation task, we propose word and phrase alignment to obtain the vocabularies and translation rules of the Malay dialects. The results show precision and recall values above 95%. Next, a new normalisation algorithm is proposed for standardising the orthography of Malay dialect text, because building a TTS system requires standardised written text.
Keywords
Bootstrapping Kelantan and Sarawak Malay dialect, phonetic analyses in text-to-speech system