Inducing Fine-Grained Semantic Classes via Hierarchical and Collective Classification

Altaf Rahman and Vincent Ng.
Proceedings of the 23rd International Conference on Computational Linguistics, pp. 931-939, 2010.

The paper is available in PostScript and PDF versions. The talk slides are also available.

Abstract

Research in named entity recognition and mention detection has typically involved a fairly small number of semantic classes, which may not be adequate if semantic class information is intended to support natural language applications. Motivated by this observation, we examine the under-studied problem of semantic subtype induction, where the goal is to automatically determine which of a set of 92 fine-grained semantic classes a noun phrase belongs to. We seek to improve the standard supervised approach to this problem using two techniques: hierarchical classification and collective classification. Experimental results demonstrate the effectiveness of these techniques, whether they are applied in isolation or in combination with the standard approach.

Train-test split

Here are the lists of names of the 200 files from the BBN Pronoun Coreference Corpus (LDC2005T33) that we used for training and testing. Note that we only used those files in LDC2005T33 that have corresponding .sense files in the LDC2008T04 corpus.
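
The file selection described above (keep only files that have a corresponding .sense file) could be reproduced roughly as follows. This is a minimal sketch, not the authors' script: the function name, the directory arguments, and the assumption that corresponding files share a base name are all hypothetical, and the actual LDC releases may be organized differently.

```python
from pathlib import Path


def files_with_sense_annotations(coref_dir, sense_dir):
    """Return coreference files whose base name has a matching .sense file.

    Assumption (for illustration only): a coreference file and its .sense
    counterpart share the same base name. The real LDC2005T33 and
    LDC2008T04 layouts may differ.
    """
    # Base names (file name without its final extension) of all .sense files.
    sense_stems = {p.stem for p in Path(sense_dir).rglob("*.sense")}
    # Keep only the coreference files that have a counterpart, in sorted order.
    return sorted(
        p for p in Path(coref_dir).rglob("*")
        if p.is_file() and p.stem in sense_stems
    )
```

Calling the function with the local paths of the two unpacked corpora returns the usable files as a sorted list of `Path` objects, which can then be divided into training and test sets.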

BibTeX entry

@InProceedings{Rahman+Ng:10a,
  author = {Altaf Rahman and Vincent Ng},
  title = {Inducing Fine-Grained Semantic Classes via Hierarchical and Collective Classification},
  booktitle = {Proceedings of the 23rd International Conference on Computational Linguistics},
  pages = {931--939},
  year = 2010
}