This paper compares two alternative approaches to the problem of acquiring named-entity recognition and classification systems from training corpora, in two different languages. The process of named-entity recognition and classification is an important subtask in most language engineering applications, in particular information extraction, where different types of named entity are associated with specific roles in events. The manual construction of rules for the recognition of named entities is a tedious and time-consuming task. For this reason, effective methods to acquire such systems automatically from data are very desirable. In this paper we compare two popular learning methods on this task: a decision-tree induction method and a multi-layered feed-forward neural network. Particular emphasis is placed on the selection of the appropriate data representation for each method and the extraction of training examples from unstructured textual data. We compare the performance of the two methods on large corpora of English and Greek texts and present the results. In addition to the good performance of both methods, one very interesting result is that a simple representation of the data, which ignores the order of the words within a named entity, leads to improved results over a more complex approach that preserves word order.
Named-entity recognition involves the identification and classification of named entities in text. This is an important subtask in most language engineering applications, in particular information extraction, where different types of named entity are associated with specific roles in events. The manual construction of rules for the recognition of named entities is a tedious and time-consuming task. For this reason, we present in this paper two approaches to learning named-entity recognition rules from text. The first approach is a decision-tree induction method and the second a multi-layered feed-forward neural network. Particular emphasis is placed on the selection of the appropriate feature set for each method and the extraction of training examples from unstructured textual data. We compare the performance of the two methods on a large corpus of English text and present the results.
Named-entity recognition (NER) involves the identification and classification of named entities in text. This is an important subtask in most language engineering applications, in particular information extraction, where different types of named entity are associated with specific roles in events. In this paper, we present a prototype NER system for Greek texts that we developed based on an NER system for English. Both systems are evaluated on corpora of the same domain and of similar size. The time-consuming process of constructing and updating domain-specific resources in both systems led us to examine a machine learning method for the automatic construction of such resources for a particular application in a specific language.