Learning the Fine-Grained Information Status of Discourse Entities

Altaf Rahman and Vincent Ng.
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 798-807, 2012.

Abstract

While information status (IS) plays a crucial role in discourse processing, there have only been a handful of attempts to automatically determine the IS of discourse entities. We examine a related but more challenging task, fine-grained IS determination, which involves classifying a discourse entity as one of 16 IS subtypes. We investigate the use of rich knowledge sources for this task in combination with a rule-based approach and a learning-based approach. In experiments with a set of Switchboard dialogues, the learning-based approach achieves an accuracy of 78.7%, outperforming the rule-based approach by 21.3%.

Train-test split

Here are the lists of the NXT corpus files that we used for training and testing.

BibTeX entry

@InProceedings{Rahman+Ng:12a,
  author = {Altaf Rahman and Vincent Ng},
  title = {Learning the Fine-Grained Information Status of Discourse Entities},
  booktitle = {Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics},
  pages = {798--807},
  year = 2012
}