Natural language analysis in machine translation (MT) based on the string-tree correspondence grammar (STCG)
Date
1994
Authors
Enya Kong, Tang
Abstract
The String-Tree Correspondence Grammar (STCG) [Zaharin 87a] is a grammar
formalism for defining:
• a set of strings (a language),
• a set of trees (valid representation/interpretation structures),
• the mapping between the two (to be interpreted for analysis & generation).
The formalism is argued to be fully declarative: it can associate with the strings of a
language whatever tree structures the grammar writer chooses as their linguistic
representation structures. More important still is the facility to specify the
correspondence between a string and its associated tree in a very natural manner. These
features are highly desirable in grammar writing, in particular for the treatment of certain
'non-standard' linguistic phenomena, namely featurisation, lexicalisation and crossed
dependencies. Furthermore, a grammar written in this way naturally inherits the desired
property of bidirectionality (in fact, non-directionality), so that the same grammar can be
interpreted for both analysis and generation.
In this thesis, we investigate the properties of the STCG when interpreted for analysis
(as the term is understood in Machine Translation (MT)). Apart from using STCG
grammars as specifications for the automatic generation of analysis programs in the
Specialised Languages for Linguistic Programming (SLLPs) of MT systems (a study
reported in the Appendix), the work centres on the specification of
a general analyser/parser for the STCG. The proposed STCG analyser is capable of
mimicking some very useful features of various context-free parsing techniques. One
such feature is the use of charts in tabular parsing algorithms, as exemplified in Earley's
Algorithm [Earley 70], which helps avoid redundant computation that may otherwise
result in a combinatorial explosion. Another is the compact representation of the possible
parse trees of an ambiguous sentence, as seen in the GLR parser
[Tomita 87]. We shall also provide a natural way for handling the kind of awkward
phenomena mentioned above (namely lexicalisation, featurisation, and worst of all,
crossed dependencies) while at the same time retaining much of the efficiency of standard
context-free parsing algorithms.
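The chart idea mentioned above can be illustrated with a plain context-free Earley recogniser. This is a minimal sketch of the standard algorithm [Earley 70], not the STCG analyser proposed in the thesis. The grammar encoding, the function name `earley_recognise` and the toy grammar are hypothetical, and the sketch assumes a grammar with no empty (epsilon) productions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: each non-terminal maps to its alternative right-hand sides.
# Symbols that are not keys of the grammar are treated as terminals (words).
Grammar = Dict[str, List[Tuple[str, ...]]]

# An Earley item: (lhs, rhs, dot position, origin chart index).
Item = Tuple[str, Tuple[str, ...], int, int]


def earley_recognise(grammar: Grammar, start: str, words: List[str]) -> bool:
    """Return True if `words` can be derived from `start`.

    Assumption: the grammar contains no empty (epsilon) productions.
    """
    n = len(words)
    # chart[i] holds every item ending at position i. Each item is stored once,
    # so a sub-analysis is never rebuilt, however many larger analyses use it.
    chart: List[Set[Item]] = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:
                    # Predictor: expect every expansion of the next non-terminal at i.
                    for alt in grammar[nxt]:
                        item = (nxt, alt, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)
                elif i < n and words[i] == nxt:
                    # Scanner: the next word matches the expected terminal.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Completer: advance every item in chart[origin] that was waiting for lhs.
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        item = (plhs, prhs, pdot + 1, porigin)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[n]
    )


# Hypothetical toy grammar; the left-recursive rules are handled without looping.
toy: Grammar = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
    "Det": [("the",)],
    "N": [("man",), ("telescope",), ("hill",)],
    "V": [("saw",)],
    "P": [("with",), ("on",)],
}

print(earley_recognise(toy, "S", "the man saw the man with the telescope".split()))  # True
```

The example sentence is ambiguous (the PP may attach to the NP or to the VP), yet both readings share the same chart items for "the man" and "the telescope". A recogniser only answers yes or no; recovering the readings compactly is where a packed representation of the kind used in the GLR parser comes in.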
The thesis also discusses the treatment of attributes/features in the STCG, a topic that
has so far received little attention in the published literature. In general, linguistic rules
written in the STCG describe strings of terms carrying all the relevant information one
would expect from morphological analysis and lookup in a lexical dictionary, and the
associated representation structures are typically the m-structures [Vauquois 78]
[Zaharin 87b], which support many levels of interpretation (morpho-syntagmatic,
functional, logical, semantic features & relations, etc.). Handling such a large quantity of
information requires a convenient form of expression and corresponding means of
manipulation.
Keywords
Machine translation (MT), String-tree