A Pattern-Growth Sentence Compression Technique For Malay Text Summarizer

dc.contributor.author: Alias, Suraya
dc.date.accessioned: 2019-01-28T08:40:00Z
dc.date.available: 2019-01-28T08:40:00Z
dc.date.issued: 2018-01
dc.description.abstract: Automatic Text Summarization (ATS) helps users identify and extract the most salient information from a given text with less effort. In ATS, Sentence Compression (SC) removes unimportant constituents from a summary sentence while preserving the salient ones and keeping the sentence grammatical. Most previous SC techniques depend heavily on syntactic rules and knowledge applied to individual words or phrases to make the removal decision. Although they can produce a new grammatical compressed sentence, prior approaches still suffer from several drawbacks, including the failure to include some significant and relevant sentences when constructing the final summary sentence. This study focuses on discovering human compression patterns from the developed Malay summary corpus to improve the readability and informativeness of the produced summary. A new Pattern-Growth SC (PGSC) technique, inspired by the “divide and conquer” strategy and tailored to the Malay language, is proposed. The underlying idea is to divide the sentences into segments, where unimportant segments are removed while the important ones are conquered iteratively. A new pattern-based representation with “textual constraints” discovered in this study serves as a feature to identify significant information in the text document. Meanwhile, a set of Sentence Elimination Rules, each with a confidence value Conf and discovered from human compression patterns, indicates the constituents that are frequently removed. The removal decision is based on both discovered textual patterns satisfying the proposed “removal constraints”. The experiments show promising results: the compressed summaries achieve an F-Measure score of 0.5752 against the gold-standard human summaries and perform better than the baseline (uncompressed) methods. [en_US]
dc.identifier.uri: http://hdl.handle.net/123456789/7684
dc.publisher: Universiti Sains Malaysia [en_US]
dc.subject: Sentence compression technique [en_US]
dc.subject: for Malay text summarizer [en_US]
dc.title: A Pattern-Growth Sentence Compression Technique For Malay Text Summarizer [en_US]
dc.type: Thesis [en_US]
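
The abstract describes three ideas that a small example can make concrete: elimination rules carrying a confidence value Conf, removal of unimportant segments, and scoring against a human summary with an F-Measure. The Python sketch below is illustrative only and is not the thesis's PGSC implementation. Everything in it is an assumption: the function names (rule_confidence, compress, f_measure), the comma-based segmentation, the use of a segment's first word as its pattern, the 0.6 threshold, the unigram-overlap form of the F-Measure, and the toy counts.

```python
from collections import Counter

def rule_confidence(removed_counts, total_counts):
    """Assumed definition: Conf(pattern) = times humans removed a segment
    starting with the pattern / times the pattern occurred in the corpus."""
    return {p: removed_counts.get(p, 0) / n
            for p, n in total_counts.items() if n > 0}

def compress(sentence, conf, threshold=0.6):
    """Divide a sentence into comma-delimited segments and drop every segment
    (except the first) whose leading word has Conf >= threshold."""
    segments = [s.strip() for s in sentence.split(",") if s.strip()]
    kept = []
    for i, seg in enumerate(segments):
        first_word = seg.split()[0].lower()
        if i > 0 and conf.get(first_word, 0.0) >= threshold:
            continue  # segment matches a high-confidence elimination rule
        kept.append(seg)
    return ", ".join(kept)

def f_measure(candidate, reference):
    """Unigram-overlap F1 between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Toy counts, invented for illustration (not taken from the thesis corpus).
conf = rule_confidence({"iaitu": 8, "pada": 2}, {"iaitu": 10, "pada": 10})
sentence = ("Kerajaan melancarkan program baharu, "
            "iaitu inisiatif latihan belia, pada hari Isnin")
compressed = compress(sentence, conf)
```

With these toy counts, the segment beginning with "iaitu" has Conf = 0.8 and is dropped, while the segment beginning with "pada" (Conf = 0.2) is kept. A real system would replace the first-word heuristic with the discovered textual patterns and removal constraints the abstract refers to.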