A Pattern-Growth Sentence Compression Technique For Malay Text Summarizer
Date
2018-01
Authors
Alias, Suraya
Publisher
Universiti Sains Malaysia
Abstract
Automatic Text Summarization (ATS) helps users identify and extract the most salient information from a given text with less effort. In ATS, Sentence Compression (SC) removes unimportant constituents from a summary sentence while preserving the salient ones and keeping the sentence’s grammar intact. Most previous SC techniques depend heavily on syntactic rules and knowledge applied to individual words or phrases to make the removal decision. Although they can produce a new, grammatical compressed sentence, prior approaches still suffer from several drawbacks, including the failure to include some significant and relevant sentences in constructing the final summary sentence. This study focuses on discovering human compression patterns from the developed Malay summary corpus to improve the readability and informativeness of the produced summary. A new Pattern-Growth SC (PGSC) technique, inspired by the “divide and conquer” strategy and tailored to the Malay language, is proposed. The underlying idea is to divide the sentences into segments, where unimportant segments are removed while the important ones are conquered iteratively. A new pattern-based representation with “textual constraints” discovered in this study serves as a feature to identify significant information in the text document. Meanwhile, a set of Sentence Elimination Rules with a confidence value Conf, discovered from human compression patterns, indicates the constituents that are frequently removed. The removal decision is based on both discovered textual patterns fulfilling the proposed “removal constraints”. The experiments show promising results: the compressed summaries achieved an F-Measure score of 0.5752 when compared with the gold-standard human summaries and performed better than the baseline (uncompressed) methods.
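The abstract does not define how Conf is computed or how it drives removal, so the following is only a minimal sketch under stated assumptions. It assumes Conf is the usual rule-mining confidence (the fraction of a pattern's occurrences that human summarizers removed) and that a segment is dropped when its pattern's Conf reaches a threshold. The function names, the pattern labels (`P1`, `P2`, `P3`), the toy observations, and the 0.6 threshold are all hypothetical. The pattern-growth mining and the iterative "conquer" step of PGSC are not reproduced here.

```python
from collections import Counter


def rule_confidence(observations):
    """Estimate Conf per pattern (assumed definition, not from the thesis).

    observations: iterable of (pattern, was_removed) pairs taken from a
    hypothetical corpus aligning original and human-compressed sentences.
    """
    seen, removed = Counter(), Counter()
    for pattern, was_removed in observations:
        seen[pattern] += 1
        if was_removed:
            removed[pattern] += 1
    # Conf = removed occurrences / all occurrences of the pattern
    return {p: removed[p] / seen[p] for p in seen}


def compress(segments, conf, threshold=0.6):
    """Drop segments whose pattern meets the (hypothetical) Conf threshold.

    segments: list of (text, pattern) pairs in sentence order.
    Patterns with no rule default to Conf 0.0 and are therefore kept.
    """
    kept = [text for text, pattern in segments
            if conf.get(pattern, 0.0) < threshold]
    return " ".join(kept)


# Toy data: P1 was removed in 2 of 3 occurrences, P2 was never removed.
observations = [("P1", True), ("P1", True), ("P1", False),
                ("P2", False), ("P2", False)]
conf = rule_confidence(observations)
summary = compress([("seg a", "P2"), ("seg b", "P1"), ("seg c", "P3")], conf)
```

With the toy data, the `P1` segment is dropped because its estimated Conf (2/3) meets the threshold, while the `P2` segment and the unseen `P3` segment are kept. A real system would also need to check that the remaining segments still form a grammatical sentence, which this sketch does not attempt.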
Keywords
Sentence compression technique, for Malay text summarizer