Semi-Supervised Cause Identification from Aviation Safety Reports
Isaac Persing and Vincent Ng.
ACL-IJCNLP 2009: Proceedings of the Main Conference, pp. 843-851, 2009.
Click here for the PostScript or PDF version.
The talk slides are available here.
Abstract
We introduce cause identification, a new problem involving classification of
incident reports in the aviation domain.
Specifically, given a set of pre-defined causes, a cause identification
system seeks to identify all and only those causes that can explain why
the aviation incident described in a given report occurred.
The difficulty of cause identification stems in part from the fact that it is
a multi-class, multi-label categorization task, and in part from the skewness
of the class distributions and the scarcity of annotated reports.
To improve the performance of a cause identification system for the minority
classes, we present a bootstrapping algorithm that automatically augments
a training set by learning from a small amount of labeled data and a large
amount of unlabeled data.
Experimental results show that our algorithm yields a relative error
reduction of 6.3% in F-measure for the minority classes
in comparison to a baseline that learns solely from the labeled data.
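
As a reading aid for the figure quoted above, the sketch below shows how a relative error reduction in F-measure is conventionally computed. It assumes the usual definition, in which error is taken as 1 - F; the abstract itself does not spell out the formula. The function name and the input values are hypothetical illustrations and are not results from the paper.

```python
def relative_error_reduction(f_baseline: float, f_system: float) -> float:
    """Relative error reduction in F-measure.

    Assumes the conventional definition: error = 1 - F, and the reduction
    is measured as a fraction of the baseline's error.
    """
    if not 0.0 <= f_baseline < 1.0:
        raise ValueError("f_baseline must be in [0, 1)")
    if not 0.0 <= f_system <= 1.0:
        raise ValueError("f_system must be in [0, 1]")
    baseline_error = 1.0 - f_baseline
    system_error = 1.0 - f_system
    return (baseline_error - system_error) / baseline_error


# Hypothetical F-measures chosen only to illustrate the arithmetic;
# they are NOT the scores reported in the paper.
example = relative_error_reduction(f_baseline=0.50, f_system=0.55)
print(f"{example:.1%}")
```

Under this definition, a given relative reduction corresponds to a larger absolute F-measure gain when the baseline F-measure is low, which matters when reading results for minority classes.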
Dataset
The cause identification dataset used in this paper is available from
this page.
BibTeX entry
@InProceedings{Persing+Ng:09a,
author = {Isaac Persing and Vincent Ng},
title = {Semi-Supervised Cause Identification from Aviation Safety Reports},
booktitle = {ACL-IJCNLP 2009: Proceedings of the Main Conference},
pages = {843--851},
year = 2009
}