Semi-Supervised Cause Identification from Aviation Safety Reports

Isaac Persing and Vincent Ng.
ACL-IJCNLP 2009: Proceedings of the Main Conference, pp. 843-851, 2009.

The paper is available in PostScript and PDF versions. The talk slides are also available.

Abstract

We introduce cause identification, a new problem involving classification of incident reports in the aviation domain. Specifically, given a set of pre-defined causes, a cause identification system seeks to identify all and only those causes that can explain why the aviation incident described in a given report occurred. The difficulty of cause identification stems in part from the fact that it is a multi-class, multi-label categorization task, and in part from the skewness of the class distributions and the scarcity of annotated reports. To improve the performance of a cause identification system for the minority classes, we present a bootstrapping algorithm that automatically augments a training set by learning from a small amount of labeled data and a large amount of unlabeled data. Experimental results show that our algorithm yields a relative error reduction of 6.3% in F-measure for the minority classes in comparison to a baseline that learns solely from the labeled data.
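To make the idea of augmenting a training set from unlabeled reports concrete, here is a minimal sketch of generic self-training for a multi-label task. It is not the bootstrapping algorithm from the paper, which the abstract does not specify, and it does nothing special for minority classes. The cause names, the toy reports, the word-overlap scorer, and the `threshold` and `max_rounds` parameters are all illustrative assumptions.

```python
from collections import Counter

def train(labeled, causes):
    """Build one-vs-rest word counts: for each cause, count the distinct
    words of reports that carry the cause (pos) and of those that do not (neg)."""
    model = {}
    for cause in causes:
        pos, neg = Counter(), Counter()
        for words, labels in labeled:
            (pos if cause in labels else neg).update(set(words))
        model[cause] = (pos, neg)
    return model

def score(model, cause, words):
    """Fraction of the report's distinct words seen more often in
    positive than in negative training reports for this cause."""
    pos, neg = model[cause]
    vocab = set(words)
    if not vocab:
        return 0.0
    return sum(1 for w in vocab if pos[w] > neg[w]) / len(vocab)

def bootstrap(labeled, unlabeled, causes, threshold=0.6, max_rounds=5):
    """Generic self-training: repeatedly retrain, then move unlabeled reports
    with at least one confidently predicted cause into the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled, causes)
        confident, remaining = [], []
        for words in pool:
            # Multi-label: a report may receive several causes at once.
            predicted = {c for c in causes if score(model, c, words) >= threshold}
            if predicted:
                confident.append((words, predicted))
            else:
                remaining.append(words)
        if not confident:
            break  # nothing new was added, so another round cannot help
        labeled.extend(confident)
        pool = remaining
    return train(labeled, causes), labeled

# Hypothetical cause labels and toy reports, for illustration only.
causes = ["human_error", "equipment", "weather"]
labeled = [
    ("pilot misread altimeter setting".split(), {"human_error"}),
    ("engine failure during icing conditions".split(), {"equipment", "weather"}),
]
unlabeled = [
    "pilot misread checklist".split(),
    "engine failure".split(),
    "smooth landing reported".split(),
]
model, augmented = bootstrap(labeled, unlabeled, causes)
```

In this toy run the first two unlabeled reports are added to the training set with predicted causes, while the third matches no cause confidently and stays unlabeled. A real system would replace the word-overlap scorer with a trained classifier and would need a selection strategy suited to skewed class distributions.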

Dataset

The cause identification dataset used in this paper is available from this page.

BibTeX entry

@InProceedings{Persing+Ng:09a,
  author = {Isaac Persing and Vincent Ng},
  title = {Semi-Supervised Cause Identification from Aviation Safety Reports},
  booktitle = {ACL-IJCNLP 2009: Proceedings of the Main Conference},
  pages = {843--851},
  year = 2009
}