Datasets




Below are the datasets we annotated and used in our previous papers on a variety of NLP tasks:

       Bengali Morphological Segmentation
          References: Dasgupta & Ng LRE 2006 paper, Dasgupta & Ng NAACL HLT 2007 paper

       Bengali Part-of-Speech Induction
          Reference: Dasgupta & Ng EMNLP 2007 paper

       Stance and Reason Classification in Ideological Debates
          References: Hasan & Ng ACL 2013 short paper, Hasan & Ng CoNLL 2013 paper, Hasan & Ng IJCNLP 2013 paper, Hasan & Ng EMNLP 2014 paper

       Vote Prediction on SodaHead Polls
          Reference: Persing & Ng EMNLP 2014 paper

       Cause Identification
            Documents annotated with shaping factors
              References: Persing & Ng ACL-IJCNLP 2009 paper, Abedin, Ng & Khan JAIR 2010 paper
            Documents annotated with annotator rationales
              Reference: Abedin, Ng & Khan IJCAI 2011 paper

       Essay Grading
            Organization
              Reference: Persing, Davis & Ng EMNLP 2010 paper
            Thesis Clarity
              Reference: Persing & Ng ACL 2013 paper
            Prompt Adherence
              Reference: Persing & Ng ACL 2014 paper
            Argument Strength
              Reference: Persing & Ng ACL-IJCNLP 2015 paper

       Pronoun Resolution
          Reference: Rahman & Ng EMNLP-CoNLL 2012 paper

       Software Requirements Traceability
          Reference: Li et al. CoNLL 2015 paper

       Temporal Relation Annotations in Clinical Notes
          Reference: D'Souza & Ng LREC 2014 paper

       Multi-Clustering
          References: Dasgupta & Ng ICML 2010 paper, Dasgupta & Ng SIGIR 2010 paper


Below are the hand-crafted rulesets we created and used in previous NLP papers:

       Temporal Relation Classification
          Reference: D'Souza & Ng NAACL HLT 2013 paper

       Medical Relation Classification
          Reference: D'Souza & Ng COLING 2014 paper


Output from some previous experiments conducted using the above datasets:

       Unsupervised morphological segmentation output

       Unsupervised part-of-speech induction output


See the acknowledgment sections of individual papers for specific funding support information. Any opinions, findings, and conclusions or recommendations expressed in these publications or on this web site are those of the author(s) and do not necessarily reflect the views of the funding agencies.