CherryPicker: A Coreference Resolution Tool
All coreference models included in CherryPicker employ linguistic features that are largely motivated by those described in the Ng & Cardie ACL 2002 paper, and were trained using SVMlight on the English portion of the ACE 2005 multilingual training corpus. Since ACE 2005 restricts coreference to noun phrases that belong to one of seven semantic classes (namely, PERSON, ORGANIZATION, GPE (geo-political entity), FACILITY, LOCATION, VEHICLE, and WEAPON), the resulting coreference models will generate coreference chains only for noun phrases belonging to these semantic classes.
CherryPicker also includes a mention detector that was trained using CRF++ on the same training data to identify noun phrases that belong to these seven semantic classes, so there is no need for the user to provide noun phrases as input. For feature generation, CherryPicker relies on the following NLP tools:
1. The Stanford Log-linear Part-Of-Speech Tagger
2. The Stanford Named Entity Recognizer (NER)
3. The Charniak Statistical Syntactic Parser
4. The MINIPAR Parser

All of these tools, as well as SVMlight and CRF++, are included in our software package. CherryPicker assumes only that the input text is sentence-delimited, with one sentence per line, and it produces coreference chains in the MUC format. See the README file for details.
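Since the only input requirement is one sentence per line, raw text may need a preprocessing pass first. The following is a minimal Python sketch of such a pass, not part of CherryPicker itself: the function name to_one_sentence_per_line and the regex-based splitting rule are assumptions made for illustration, and the rule is deliberately naive (it will split incorrectly after abbreviations such as "Dr."). A proper sentence splitter would be preferable for real data.

```python
import re

# Naive boundary rule (an assumption for illustration): a sentence ends at
# '.', '!' or '?' when followed by whitespace and then an uppercase letter
# or an opening quote. Abbreviations like "Dr." are not handled.
_SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z"\'])')


def to_one_sentence_per_line(text: str) -> str:
    """Return `text` reformatted with one sentence per line."""
    # Collapse existing line breaks and repeated whitespace first, so that a
    # sentence wrapped across several lines ends up on a single line.
    flat = " ".join(text.split())
    if not flat:
        return ""
    sentences = _SENTENCE_BOUNDARY.split(flat)
    return "\n".join(s.strip() for s in sentences if s.strip()) + "\n"


if __name__ == "__main__":
    raw = "John went home.  He\nslept. Mary stayed!"
    print(to_one_sentence_per_line(raw), end="")
```

The result can be written to a file and passed to the tool as described in the README.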
CherryPicker may be freely downloaded and used for all educational and research activities, but may not be used for commercial or for-profit purposes. Please acknowledge your use of the software by citing the following paper, which contains the technical ideas behind our cluster-ranking model:
Altaf Rahman and Vincent Ng. Supervised Models for Coreference Resolution. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 968-977, 2009.
As described in the README file, the tool is run with a simple Java command, and its options can be adjusted through command-line arguments.
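Once the tool has run, the MUC-format output can be post-processed to recover the chains as groups of mentions. The Python sketch below assumes the conventional MUC SGML markup, in which each mention is wrapped in a COREF tag carrying an ID attribute and, for a non-initial mention, a REF attribute naming its antecedent. This is an assumption about the output, not something confirmed here, so check the README for the exact layout. The function name coref_chains is hypothetical. Only the opening tags are read, so nested mentions do not need special handling.

```python
import re
from collections import defaultdict

# Assumed markup: <COREF ID="2" REF="1">he</COREF>. Verify against the README.
_COREF_OPEN = re.compile(r"<COREF\b([^>]*)>", re.IGNORECASE)
_ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def coref_chains(muc_text):
    """Group mention IDs into chains by following REF links (union-find)."""
    parent = {}

    def find(x):
        # Return the representative of x's chain, registering x if unseen.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for match in _COREF_OPEN.finditer(muc_text):
        attrs = {k.upper(): v for k, v in _ATTR.findall(match.group(1))}
        mention_id = attrs.get("ID")
        if mention_id is None:
            continue
        root = find(mention_id)
        ref = attrs.get("REF")
        if ref:
            # Merge this mention's chain with its antecedent's chain.
            parent[root] = find(ref)

    chains = defaultdict(list)
    for mention_id in list(parent):
        chains[find(mention_id)].append(mention_id)
    return sorted(sorted(chain) for chain in chains.values())


if __name__ == "__main__":
    sample = (
        '<COREF ID="1">John</COREF> said <COREF ID="2" REF="1">he</COREF> '
        'saw <COREF ID="3">Mary</COREF>. '
        '<COREF ID="4" REF="2">He</COREF> waved.'
    )
    print(coref_chains(sample))
```

The sketch returns mention IDs rather than mention text, which keeps the parsing independent of how mentions are nested.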