Datasets
Below are the datasets we annotated and used in previous papers for
a variety of NLP tasks, including:
    
 
Bengali Morphological Segmentation
         
References: Dasgupta & Ng LRE 2006 paper, Dasgupta & Ng NAACL HLT 2007 paper
    
 
Bengali Part-of-Speech Induction
         
Reference: Dasgupta & Ng EMNLP 2007 paper
    
 
Stance and Reason Classification in Ideological Debates
         
References: Hasan & Ng ACL 2013 short paper, Hasan & Ng CoNLL 2013 paper, Hasan & Ng IJCNLP 2013 paper, Hasan & Ng EMNLP 2014 paper
    
 
Vote Prediction on SodaHead Polls
         
Reference: Persing & Ng EMNLP 2014 paper
    
 
Debate Argument Persuasiveness
         
References: Persing & Ng IJCAI 2017 paper, Persing & Ng IJCNLP 2017 paper
    
 
Cause Identification
         
 
Documents annotated with shaping factors
             
References: Persing & Ng ACL-IJCNLP 2009 paper, Abedin, Ng & Khan JAIR 2010 paper
         
 
Documents annotated with annotator rationales
             
Reference: Abedin, Ng & Khan IJCAI 2011 paper
    
 
Essay Grading
         
 
Organization
             
Reference: Persing, Davis & Ng EMNLP 2010 paper
         
 
Thesis Clarity
             
Reference: Persing & Ng ACL 2013 paper
         
 
Prompt Adherence
             
Reference: Persing & Ng ACL 2014 paper
         
 
Argument Strength
             
Reference: Persing & Ng ACL-IJCNLP 2015 paper
         
 
Stance
             
Reference: Persing & Ng ACL 2016 paper
         
 
Argument Persuasiveness
             
Reference: Carlile et al. ACL 2018 paper
         
 
Thesis Strength
             
Reference: Ke et al. ACL 2019 paper
    
 
Pronoun Resolution
         
Reference: Rahman & Ng EMNLP-CoNLL 2012 paper
    
 
Software Requirements Traceability
         
Reference: Li et al. CoNLL 2015 paper
    
 
Temporal Relation Annotations in Clinical Notes
         
Reference: D'Souza & Ng LREC 2014 paper
    
 
Multi-Clustering
         
References: Dasgupta & Ng ICML 2010 paper, Dasgupta & Ng SIGIR 2010 paper
Below are the hand-crafted rulesets we created and used in previous NLP papers:
    
 
Temporal Relation Classification
         
Reference: D'Souza & Ng NAACL HLT 2013 paper
    
 
Medical Relation Classification
         
Reference: D'Souza & Ng COLING 2014 paper
Output from some previous experiments conducted using the above datasets:
    
 
Unsupervised morphological segmentation output
    
 
Unsupervised part-of-speech induction output
See the acknowledgment sections of individual papers for specific funding support information. Any opinions, findings, and conclusions or recommendations expressed in these publications or on this web site are those of the author(s) and do not necessarily reflect the views of the funding agencies.