CherryPicker: A Coreference Resolution Tool


CherryPicker is a coreference resolution tool that implements our recently developed cluster-ranking model as well as two existing learning-based coreference models (the mention-pair model and the mention-ranking model). Cluster rankers aim to address the major weaknesses of the widely investigated mention-pair model, and have empirically been shown to surpass the performance of both the mention-pair model and the mention-ranking model by a large margin. Since cluster rankers offer substantially higher precision than existing coreference models, we believe that they can be beneficially used in many high-level NLP applications.

All coreference models included in CherryPicker employ linguistic features that are largely motivated by those described in the Ng & Cardie ACL 2002 paper, and were trained using SVMlight on the English portion of the ACE 2005 multilingual training corpus. Since ACE 2005 restricts coreference to noun phrases that belong to one of seven semantic classes (namely, PERSON, ORGANIZATION, GPE (geo-political entity), FACILITY, LOCATION, VEHICLE, and WEAPON), the resulting coreference models will generate coreference chains only for noun phrases belonging to these semantic classes.

CherryPicker also includes a mention detector that was trained using CRF++ on the same training data to identify noun phrases that belong to these seven semantic classes, so there is no need for the user to provide noun phrases as input. For feature generation, CherryPicker relies on the following NLP tools:

        1. The Stanford Log-linear Part-Of-Speech Tagger
        2. The Stanford Named Entity Recognizer (NER)
        3. The Charniak Statistical Syntactic Parser
        4. The MINIPAR Parser
All of these software tools, as well as SVMlight and CRF++, are included in our software package. The only input CherryPicker requires is a sentence-delimited text with one sentence per line; its output is a set of coreference chains in the MUC format. See the README file for details.
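
As an illustration of the input requirement, the sketch below reflows raw text into one sentence per line. It is not part of CherryPicker: the function name `to_one_sentence_per_line` is hypothetical, and the regular expression is a deliberately naive boundary rule (it will wrongly split after abbreviations such as "Dr."). A proper sentence splitter is preferable for real data, and the README remains the authority on the expected input.

```python
import re

# Naive sentence boundary (an assumption of this sketch, not CherryPicker's rule):
# whitespace that follows '.', '!' or '?' and precedes an uppercase letter,
# a digit, or an opening quote/bracket.
_SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z0-9"\'(\[])')


def to_one_sentence_per_line(text: str) -> str:
    """Return `text` reflowed so that each line holds one sentence."""
    # Collapse every run of whitespace (including newlines) to a single space.
    flat = " ".join(text.split())
    if not flat:
        return ""
    sentences = _SENTENCE_BOUNDARY.split(flat)
    return "\n".join(sentences) + "\n"


if __name__ == "__main__":
    raw = "John bought a car.  He drove\nit to Dallas. Mary stayed home!"
    print(to_one_sentence_per_line(raw), end="")
```

The resulting text can be written to a file and passed to the tool as described in the README.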

CherryPicker may be freely downloaded and used for all educational and research activities, but may not be used for commercial or for-profit purposes. Please acknowledge your use of the software by citing the following paper, which contains the technical ideas behind our cluster-ranking model:

      Altaf Rahman and Vincent Ng.
      Supervised Models for Coreference Resolution.
      Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 968-977, 2009.


The current version of CherryPicker has only been tested on Unix/Linux machines. Since some of the software tools on which it relies run on Unix/Linux machines only, we do not expect CherryPicker to be able to run on other platforms. Additional requirements are described in the README file.


Download CherryPicker Version 1.01

As described in the README file, a simple Java command will run the tool. Various options can be tweaked using arguments.
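
Once the tool has run, its output can be post-processed. The sketch below groups mentions into chains, assuming MUC-style SGML markup in which each mention is wrapped in a `COREF` element with an `ID` attribute and, for anaphoric mentions, a `REF` attribute pointing to an earlier mention. This is an assumption about the general MUC convention; the exact attributes CherryPicker writes are documented in the README. The function name `read_muc_chains` is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

_TAG = re.compile(r'<COREF\b([^>]*)>|</COREF>')
_ATTR = re.compile(r'(\w+)="([^"]*)"')


def read_muc_chains(sgml: str) -> List[List[str]]:
    """Group COREF mentions into chains by following their REF links."""
    mentions: Dict[str, str] = {}   # mention ID -> mention text
    order: List[str] = []           # mention IDs in order of appearance
    parent: Dict[str, str] = {}     # union-find parent pointers

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    stack = []   # currently open mentions as [id, text pieces]; handles nesting
    pos = 0
    for m in _TAG.finditer(sgml):
        text = sgml[pos:m.start()]
        for entry in stack:             # text belongs to every open mention
            entry[1].append(text)
        pos = m.end()
        if m.group(0).startswith("</"):
            if stack:
                mention_id, pieces = stack.pop()
                if mention_id is not None:
                    mentions[mention_id] = " ".join("".join(pieces).split())
        else:
            attrs = dict(_ATTR.findall(m.group(1)))
            mention_id = attrs.get("ID")
            stack.append([mention_id, []])
            if mention_id is not None:
                order.append(mention_id)
                find(mention_id)
                if "REF" in attrs:      # link this mention to its antecedent
                    parent[find(mention_id)] = find(attrs["REF"])

    chains = defaultdict(list)
    for mention_id in order:
        if mention_id in mentions:
            chains[find(mention_id)].append(mentions[mention_id])
    return list(chains.values())


if __name__ == "__main__":
    sample = ('<COREF ID="1">John Smith</COREF> said '
              '<COREF ID="2" TYPE="IDENT" REF="1">he</COREF> sold '
              '<COREF ID="3"><COREF ID="4" TYPE="IDENT" REF="1">his</COREF> car</COREF>.')
    for chain in read_muc_chains(sample):
        print(chain)
```

Mentions with no coreferent partner are returned as single-element chains; filter them out if only multi-mention chains are of interest.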

Release History

Version    Date          Description
0.1        09-09-2009    First Release.
0.12       11-07-2009    Bug fixed.


For instructions on running the tool, please refer to the README file. Please note that CherryPicker is provided "as-is": the software does not come with any warranty or guarantee of any kind and may not be further distributed. However, you are welcome to send us your questions, concerns, comments, complaints, and bug reports. We will create an FAQ page if we receive a sufficient number of questions.