Saturday, 15 July 2017

CRITICAL JOURNAL Morphology





CRITICAL JOURNAL

INTRODUCTION
In natural language, words are often constructed from multiple morphemes, or meaning-bearing units, such as stems and suffixes. Identifying the morphemes within words is an important task both for human learners and in natural language processing (NLP) systems, where it can improve performance on a variety of tasks by reducing data sparsity [Goldwater and McClosky, 2005; Larkey et al., 2002].

Unsupervised learning of morphology is particularly interesting, both from a cognitive standpoint (because developing unsupervised systems may shed light on how humans perform this task) and for NLP (because morphological annotation is scarce or nonexistent in many languages). Existing systems, such as [Goldsmith, 2001] and [Creutz and Lagus, 2005], are relatively successful in segmenting words into constituent morphs (essentially, substrings), e.g. reporters → report.er.s. However, strategies based purely on segmentation of observed forms make systematic errors in identifying morphological relationships because many of these relationships are obscured by spelling rules that alter the observed forms of words. For example, most English verbs take -ing as the present continuous tense ending (walking), but after stems ending in e, the e is deleted (taking), while for some verbs, the final stem consonant is doubled (shutting, digging).
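The three -ing patterns described above can be sketched as a small function. This is only a toy illustration, not the model from the paper: the function name `attach_ing` and the hand-listed set `DOUBLING_STEMS` are hypothetical, and the e-deletion rule is applied exactly as stated in the text, so it over-applies to stems such as "see".

```python
# Hypothetical, hand-listed stems whose final consonant doubles before -ing.
# The text only says doubling applies to "some verbs", so no general rule is assumed.
DOUBLING_STEMS = {"shut", "dig"}

def attach_ing(stem: str) -> str:
    """Return the surface form of stem + ing under three toy spelling rules."""
    if stem in DOUBLING_STEMS:
        # Consonant doubling: shut + ing gives shutting
        return stem + stem[-1] + "ing"
    if stem.endswith("e"):
        # e-deletion: take + ing gives taking
        return stem[:-1] + "ing"
    # Default: plain concatenation, walk + ing gives walking
    return stem + "ing"

for stem in ["walk", "take", "shut", "dig"]:
    print(stem, "+ ing =", attach_ing(stem))
```

The point of the sketch is that one underlying suffix (-ing) is shared by all four verbs, even though the surface strings differ.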

A purely segmenting system will be forced to segment shutting as either shut.ting or shutt.ing. In the first case, shutting will be correctly identified as sharing a stem with words such as shut and shuts, but will not share a suffix with words such as walking and running. In the second case, the opposite will be true. In this paper, we present a Bayesian model of morphology that identifies the latent underlying morphological analysis of each word (shut+ing) along with spelling rules that generate the observed surface forms. Most current systems for unsupervised morphological analysis in NLP are based on various heuristic methods and perform segmentation only [Monson et al., 2004; Freitag, 2005; Dasgupta and Ng, 2006]; [Dasgupta and Ng, 2007] also infers some spelling rules. Although these can be effective, our goal is to investigate methods which can eventually be built into larger joint inference systems for learning multiple aspects of language (such as morphology, phonology, and syntax) in order to examine the kinds of structures and biases that are needed for successful learning in such a system. For this reason, we focus on probabilistic models rather than heuristic procedures.
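The dilemma between shut.ting and shutt.ing can be made concrete with a short sketch. It enumerates every way of cutting a word into a stem and a suffix and checks each half against a toy lexicon. The names `splits`, `score_splits`, `KNOWN_STEMS` and `KNOWN_SUFFIXES` are hypothetical and the lexicon is invented for illustration; none of this is taken from the paper.

```python
def splits(word: str):
    """Yield every (stem, suffix) split with a non-empty stem and suffix."""
    for i in range(1, len(word)):
        yield word[:i], word[i:]

# Hypothetical toy lexicon of stems and suffixes already seen in other words.
KNOWN_STEMS = {"shut", "walk"}
KNOWN_SUFFIXES = {"ing", "s"}

def score_splits(word: str):
    """Map each split to (stem is known, suffix is known)."""
    return {
        f"{stem}.{suffix}": (stem in KNOWN_STEMS, suffix in KNOWN_SUFFIXES)
        for stem, suffix in splits(word)
    }

result = score_splits("shutting")
print(result["shut.ting"])   # shared stem, unshared suffix
print(result["shutt.ing"])   # shared suffix, unshared stem
print(any(both == (True, True) for both in result.values()))
```

In this toy setting no split of shutting has both a known stem and a known suffix, which is the gap that an underlying analysis such as shut+ing plus a spelling rule is meant to close.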

Previously, [Goldsmith, 2006] and [Goldwater and Johnson, 2004] have described model-based morphology induction systems that can account for some variations in morphs caused by spelling rules. Both systems are based on the Minimum Description Length principle and share certain weaknesses that we address here. In particular, due to their complex MDL objective functions, these systems incorporate special-purpose algorithms to search for the optimal morphological analysis of the input corpus. This raises the possibility that the search procedures themselves are influencing the results of these systems, and makes it difficult to extend the underlying models or incorporate them into larger systems other than through a strict 1-best pipelined approach. Indeed, each of these systems extends the segmentation-only system of [Goldsmith, 2001] by first using that system to identify a segmentation, and then (in a second step), finding spelling rules to simplify the original analysis. In contrast, the model presented here uses standard sampling methods for inference, and provides a way to simultaneously learn both morphological analysis and spelling rules, allowing information from each component to flow to the other during learning.

SUMMARY

In the introduction to their paper, "Improving Morphology Induction by Learning Spelling Rules", Jason Naradowsky and Sharon Goldwater explain that unsupervised learning of morphology is an important task both for human learners and for natural language processing systems. Previous systems focus on segmenting words into substrings (taking → tak.ing), but sometimes a segmentation-only analysis is insufficient (e.g., taking may be more appropriately analyzed as take+ing, with a spelling rule accounting for the deletion of the stem-final e). The authors develop a Bayesian model for simultaneously inducing both morphology and spelling rules, and they show that adding spelling rules improves performance over the baseline morphology-only model.


CRITIQUE

The only aspect of this journal article that kept me from enjoying it to the fullest is that it is filled with confusing technical terms. Jason Naradowsky and Sharon Goldwater explain some of these terms, but others are given no specific definition. For example, there is no definition of MDL and no definition of NLP. When writing a critical journal or critical book review, the definitions behind the main ideas should be written down first.

CONCLUSION

This journal article has a few minor faults. For example, it gives no definitions of MDL and NLP.

ELABORATION
- MDL stands for Minimum Description Length.
Minimum description length (MDL) (Rissanen 1978) is a technique from algorithmic information theory which dictates that the best hypothesis for a given set of data is the one that leads to the largest compression of the data. We seek to minimize the sum of the length, in bits, of an effective description of the model and the length, in bits, of an effective description of the data when encoded with the help of the model.
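The two-part sum described above (bits for the model plus bits for the data given the model) can be sketched for a tiny morphology example. The encoding choices here are assumptions made only for illustration and are not the objective of any system cited in this journal: the function `description_length` is hypothetical, each lexicon entry is assumed to cost log2(27) bits per character (26 letters plus an end marker), each morph token is assumed to cost -log2 of its relative frequency, and the cost of marking word boundaries is ignored.

```python
import math
from collections import Counter

# Assumption: 26 letters plus one end-of-morph marker, all equally likely.
BITS_PER_CHAR = math.log2(27)

def description_length(analysis):
    """Two-part code length in bits for a corpus given as lists of morphs."""
    tokens = [morph for word in analysis for morph in word]
    counts = Counter(tokens)
    total = len(tokens)
    # Part 1 (model): spell out each distinct morph once in the lexicon.
    model_bits = sum((len(m) + 1) * BITS_PER_CHAR for m in counts)
    # Part 2 (data given model): each token costs -log2(relative frequency).
    data_bits = sum(-c * math.log2(c / total) for c in counts.values())
    return model_bits + data_bits

# Two competing hypotheses about the same six-word toy corpus.
whole_words = [["walk"], ["walks"], ["walking"],
               ["talk"], ["talks"], ["talking"]]
segmented = [["walk"], ["walk", "s"], ["walk", "ing"],
             ["talk"], ["talk", "s"], ["talk", "ing"]]

print("whole words:", round(description_length(whole_words), 1), "bits")
print("segmented:  ", round(description_length(segmented), 1), "bits")
```

Under these assumptions the segmented analysis gets the shorter total, because reusing the morphs walk, talk, s and ing makes the lexicon much smaller, so MDL would prefer it for this toy corpus.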

- NLP stands for Natural Language Processing. Natural language processing is the automatic processing of human language.
a.       Morphology
Morphology is the structure of words.
b.      Syntax
Syntax is the way words are used to form phrases.
c.       Semantics
Semantics is divided into compositional semantics and lexical semantics. Compositional semantics is the construction of meaning based on syntax. Lexical semantics is the meaning of individual words.
d.      Pragmatics
Pragmatics is meaning in context.
