CRITICAL JOURNAL: Morphology
INTRODUCTION
In natural language,
words are often constructed from multiple morphemes, or meaning bearing units,
such as stems and suffixes. Identifying the morphemes within words is an
important task both for human learners and in natural language processing (NLP)
systems, where it can improve performance on a variety of tasks by reducing
data sparsity [Goldwater and McClosky, 2005; Larkey et al., 2002].
Unsupervised learning
of morphology is particularly interesting, both from a cognitive standpoint
(because developing unsupervised systems may shed light on how humans perform
this task) and for NLP (because morphological annotation is scarce or
nonexistent in many languages). Existing systems, such as [Goldsmith, 2001] and
[Creutz and Lagus, 2005], are relatively successful in segmenting words into
constituent morphs (essentially, substrings), e.g. reporters → report.er.s.
However, strategies based purely on segmentation of observed forms make
systematic errors in identifying morphological relationships because many of these
relationships are obscured by spelling rules that alter the observed forms of
words. For example, most English verbs take -ing as the present continuous tense
ending (walking), but after stems ending in e, the e is deleted (taking), while
for some verbs, the final stem consonant is doubled (shutting, digging).
A purely segmenting system will be forced to
segment shutting as either shut.ting or shutt.ing. In the first case, shutting
will be correctly identified as sharing a stem with words such as shut and
shuts, but will not share a suffix with words such as walking and running. In
the second case, the opposite will be true. In this paper, we present a
Bayesian model of morphology that identifies the latent underlying
morphological analysis of each word (shut+ing) along with spelling rules that
generate the observed surface forms. Most current systems for unsupervised
morphological analysis in NLP are based on various heuristic methods and
perform segmentation only [Monson et al., 2004; Freitag, 2005; Dasgupta and Ng,
2006]; [Dasgupta and Ng, 2007] also infers some spelling rules. Although these
can be effective, our goal is to investigate methods which can eventually be
built into larger joint inference systems for learning multiple aspects of
language (such as morphology, phonology, and syntax) in order to examine the
kinds of structures and biases that are needed for successful learning in such
a system. For this reason, we focus on probabilistic models rather than
heuristic procedures.
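To make the role of spelling rules more concrete, here is a minimal Python sketch (my own illustration, not the authors' probabilistic model) of how underlying analyses such as shut+ing can be mapped to observed surface forms; the rule conditions are deliberately simplified versions of the English examples above.

VOWELS = set("aeiou")

def surface_form(stem, suffix):
    """Apply toy spelling rules to an underlying stem+suffix analysis."""
    if suffix.startswith("i") and stem.endswith("e"):
        # e-deletion: take + ing -> taking
        return stem[:-1] + suffix
    if (len(stem) >= 3 and stem[-1] not in VOWELS
            and stem[-2] in VOWELS and stem[-3] not in VOWELS):
        # consonant doubling for CVC stems: shut + ing -> shutting
        return stem + stem[-1] + suffix
    # default: plain concatenation, walk + ing -> walking
    return stem + suffix

for stem in ["walk", "take", "shut", "dig"]:
    print(f"{stem}+ing -> {surface_form(stem, 'ing')}")

Running this prints walking, taking, shutting, and digging, showing how a single underlying suffix -ing can surface in different ways depending on the stem.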
Previously, [Goldsmith,
2006] and [Goldwater and Johnson, 2004] have described model-based morphology
induction systems that can account for some variations in morphs caused by
spelling rules. Both systems are based on the Minimum Description Length
principle and share certain weaknesses that we address here. In particular, due
to their complex MDL objective functions, these systems incorporate
special-purpose algorithms to search for the optimal morphological
analysis of the input corpus. This raises the possibility that the search
procedures themselves are influencing the results of these systems, and makes
it difficult to extend the underlying models or incorporate them into larger
systems other than through a strict 1-best pipelined approach. Indeed, each of
these systems extends the segmentation-only system of [Goldsmith, 2001] by
first using that system to identify a segmentation, and then (in a second
step), finding spelling rules to simplify the original analysis. In contrast,
the model presented here uses standard sampling methods for inference, and
provides a way to simultaneously learn both morphological analysis and spelling
rules, allowing information from each component to flow to the other during
learning.
SUMMARY
In the introduction to their paper, Jason Naradowsky and Sharon Goldwater discuss Improving Morphology Induction by Learning Spelling Rules. Unsupervised learning of morphology is an important task both for human learners and for natural language processing systems. Previous systems focus on segmenting words into substrings (taking → tak.ing), but sometimes a segmentation-only analysis is insufficient (e.g., taking may be more appropriately analyzed as take+ing, with a spelling rule accounting for the deletion of the stem-final e). The authors develop a Bayesian model for simultaneously inducing both morphology and spelling rules, and show that the addition of spelling rules improves performance over the baseline morphology-only model.
CRITIQUE
The only aspect of this paper which kept me from enjoying it to the fullest is that it is filled with many confusing technical terms. Although Jason Naradowsky and Sharon Goldwater explain some of these terms, others are left without a specific definition. For example, there is no definition of MDL and no definition of NLP. When we write a critical journal or critical book review, we should define the main terms before using them.
CONCLUSION
This paper has a few small faults, such as the missing explanations and definitions of MDL and NLP.
ELABORATION
- MDL is Minimum Description Length
Minimum
description length (MDL) (Rissanen 1978) is a
technique from algorithmic information theory which dictates that the best
hypothesis for a given set of data is the one that leads to the largest
compression of the data. We seek to minimize the sum of the length, in bits, of
an effective description of the model and the length, in bits, of an effective
description of the data when encoded with the help of the model.
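As a rough illustration of this idea, the following Python sketch compares two hypothetical analyses of a toy word list by their total description length; the hypothesis names and bit counts are invented for illustration and do not come from the paper.

def description_length(model_bits, data_bits):
    # Total MDL score: bits to describe the model plus bits to encode
    # the data with the help of that model.
    return model_bits + data_bits

hypotheses = {
    # store every word whole: tiny model, expensive data encoding
    "no-segmentation": description_length(model_bits=10, data_bits=300),
    # shared stems and suffixes: larger model, much cheaper data
    "stem+suffix": description_length(model_bits=80, data_bits=120),
}

best = min(hypotheses, key=hypotheses.get)
for name, bits in hypotheses.items():
    print(f"{name}: {bits} bits")
print("MDL prefers:", best)

Here MDL prefers the stem+suffix hypothesis because its total cost (200 bits) is lower than storing every word whole (310 bits), even though its model is larger.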
- NLP is Natural Language Processing, the automatic processing of human language.
a. Morphology is the structure of words.
b. Syntax is the way words are used to form phrases.
c. Semantics is divided into compositional semantics and lexical semantics. Compositional semantics is the construction of meaning based on syntax; lexical semantics is the meaning of individual words.
d. Pragmatics is meaning in context.