Learning the Fine-Grained Information Status of Discourse Entities
Altaf Rahman and Vincent Ng.
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 798-807, 2012.
Click here for the PostScript or PDF version.
The talk slides are available here.
Abstract
While information status (IS) plays a crucial role in discourse processing,
there have only been a handful of attempts
to automatically determine the IS of discourse entities.
We examine a related but more challenging task,
fine-grained IS determination, which involves
classifying a discourse entity as one of 16 IS subtypes.
We investigate the use of rich knowledge sources for this task
in combination with a rule-based approach and a learning-based approach.
In experiments with a set of Switchboard dialogues,
the learning-based approach achieves an accuracy of 78.7%,
outperforming the rule-based approach by 21.3%.
Train-test split
Here are the lists of the NXT corpus file names that we used for training and testing.
BibTeX entry
@InProceedings{Rahman+Ng:12a,
author = {Altaf Rahman and Vincent Ng},
title = {Learning the Fine-Grained Information Status of Discourse Entities},
booktitle = {Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics},
pages = {798--807},
year = 2012
}