C-3: Coherence and Coreference Corpus

Download: c-3-v1.0.zip (438 kB / May 18, 2010)

The C-3 coreference annotation described in (Nicolae et al., 2010) was performed on the 135 text files of Discourse GraphBank (Wolf et al., 2003) and is distributed separately from those texts. Each annotation file in this package is numbered to correspond to a Discourse GraphBank file. Statistics about the C-3 annotation can be found in c-3-v1.0-stats.txt.

You can obtain Discourse GraphBank from LDC at the following link: www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T08.

For easy browsing of the C-3 annotation, please use the gann annotation tool available at www.hlt.utdallas.edu/~gabriel/gann.
Copy the Discourse GraphBank text files (available from LDC at the above link) and the C-3 annotation .gann files (downloaded from this page) into the same folder. To view the annotation for a Discourse GraphBank file, open that text file in gann. The tool will automatically locate the corresponding .gann file in the same folder and open it.
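
The copying step above can be scripted. The following is a minimal Python sketch, not part of the C-3 package: the function name stage_for_gann and the directory arguments are hypothetical, and the check assumes that a .gann file shares its base name with the text file it annotates (with or without the text file's extension). The actual naming convention should be verified against the downloaded files.

```python
import shutil
from pathlib import Path


def stage_for_gann(graphbank_dir, c3_dir, work_dir):
    """Copy GraphBank text files and C-3 .gann files into one folder.

    Returns the sorted names of copied text files for which no .gann
    file with a matching base name was found (assumed naming scheme).
    """
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)

    # Copy every regular file from the Discourse GraphBank directory.
    text_names = []
    for src in sorted(Path(graphbank_dir).iterdir()):
        if src.is_file():
            shutil.copy2(src, work / src.name)
            text_names.append(src.name)

    # Copy the C-3 annotation files and remember their base names.
    gann_stems = set()
    for src in sorted(Path(c3_dir).glob("*.gann")):
        shutil.copy2(src, work / src.name)
        gann_stems.add(src.stem)

    # Assumption: "<name>.gann" annotates "<name>" or "<name>.<ext>".
    return [
        name
        for name in text_names
        if name not in gann_stems and Path(name).stem not in gann_stems
    ]
```

Calling stage_for_gann with the two source directories and a target folder fills the target folder; any names in the returned list are text files without a matching annotation under the assumed naming scheme. The text files in the target folder can then be opened in gann as described above.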

Contact: cristina at hlt.utdallas.edu

 

References

(1) C-3 description. If you are using this annotation or the annotation tool in your work, please cite:

Cristina Nicolae, Gabriel Nicolae and Kirk Roberts. 2010. C-3: Coherence and Coreference Corpus. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), 19-21 May, Valletta, Malta.

(2) Discourse GraphBank description.

Florian Wolf, Edward Gibson, Amy Fisher and Meredith Knight. 2003. A procedure for collecting a database of texts annotated with coherence relations. Massachusetts Institute of Technology.