Jennifer D'Souza

Human Language Technology Research Institute
University of Texas at Dallas
Richardson, TX 75083, USA

Project Listing

  1. Spatial Relation Extraction.
    Spatial relation extraction is the task of determining whether a given set of spatial elements stands in a qualitative spatial relation (QSLINK), an orientation relation (OLINK), or a motion relation (MOVELINK).

    The papers below describe our participating system in SpaceEval 2015, which achieved the best spatial relation extraction results, and an enhanced version of that system.

    Jennifer D'Souza and Vincent Ng. September 2015. Sieve-Based Spatial Relation Extraction with Expanding Parse Trees. In Proceedings of EMNLP.
    Jennifer D'Souza and Vincent Ng. June 2015. UTD: Ensemble-Based Spatial Relation Extraction. In Proceedings of SemEval 2015.

  2. Normalization of Disorder Mentions in the Biomedical Domain.
    Normalization is the task of mapping a word or phrase in a document to a unique concept in an ontology, based on the description of that concept in the ontology, after disambiguating potentially ambiguous surface words or phrases.

    Our sieve-based approach for the normalization of disorder mentions in biomedical data is described in the following paper.

    Jennifer D'Souza and Vincent Ng. July 2015. Sieve-Based Entity Linking in the Biomedical Domain. In Proceedings of ACL-IJCNLP.

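    As a rough illustration of the sieve idea only, the Python sketch below applies an ordered list of matching passes ("sieves") to a mention and returns the concept ID from the first pass that succeeds. The toy ontology, the concept IDs (C-001, C-002), the abbreviation table, and the three sieves are all hypothetical; they are not the sieves, resources, or ontology used in the paper.

```python
import re
from typing import Callable, Optional

# Hypothetical toy ontology: concept ID -> preferred name and synonyms.
ONTOLOGY: dict[str, list[str]] = {
    "C-001": ["myocardial infarction", "heart attack"],
    "C-002": ["hypertension", "high blood pressure"],
}

# Hypothetical abbreviation table used by one of the sieves.
ABBREVIATIONS: dict[str, str] = {"mi": "myocardial infarction", "htn": "hypertension"}


def normalize_text(text: str) -> str:
    """Lowercase and replace runs of non-alphanumeric characters with one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


# Index from normalized name to concept ID, shared by the later sieves.
NAME_INDEX: dict[str, str] = {
    normalize_text(name): cid for cid, names in ONTOLOGY.items() for name in names
}


def exact_match(mention: str) -> Optional[str]:
    """Sieve 1: the mention is verbatim one of the ontology's names."""
    for cid, names in ONTOLOGY.items():
        if mention in names:
            return cid
    return None


def normalized_match(mention: str) -> Optional[str]:
    """Sieve 2: match after case and punctuation normalization."""
    return NAME_INDEX.get(normalize_text(mention))


def abbreviation_match(mention: str) -> Optional[str]:
    """Sieve 3: expand a known abbreviation, then look up the expansion."""
    expansion = ABBREVIATIONS.get(normalize_text(mention))
    return NAME_INDEX.get(expansion) if expansion is not None else None


# In this toy ordering, stricter sieves run first; the first non-None answer wins.
SIEVES: list[Callable[[str], Optional[str]]] = [exact_match, normalized_match, abbreviation_match]


def link(mention: str) -> Optional[str]:
    """Return the concept ID for `mention`, or None if no sieve fires."""
    for sieve in SIEVES:
        cid = sieve(mention)
        if cid is not None:
            return cid
    return None


for m in ["heart attack", "High blood-pressure", "MI", "fever"]:
    print(m, "->", link(m))
```

    Here "High blood-pressure" is linked to C-002 by the second sieve and "MI" to C-001 by the abbreviation sieve, while "fever" remains unlinked because no sieve fires.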

  3. Automatic Word Variant Derivation.
    The Word Variant Derivation task by suffixation can be described as follows: given a word, determine its unchangeable beginning portion (loosely, the base), identify any remaining ending as replaceable (loosely, the suffix), and find candidate suffixes that can replace that ending or be appended to the base to form valid derived words (e.g., happ-iness from happ-y).

    Our sequence labeling approach for the Word Variant Derivation task is described in the following paper.

    Jennifer D'Souza. January 2015. A Sequence Labeling Approach to Deriving Word Variants. In Proceedings of AAAI. Student Abstract and Poster Program.

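    To make the sequence labeling framing concrete, the Python sketch below tags each character of a word as part of the base ("B") or of the replaceable suffix ("S"), then forms a derived word by keeping the base and attaching a new suffix. The B/S tag names and both helper functions are hypothetical, and the tags here come from a simple longest-common-prefix heuristic against a known derived form, not from the learned model described in the paper.

```python
def split_labels(word: str, derived: str) -> list[str]:
    """Tag each character of `word` as base ("B") or replaceable suffix ("S").

    Heuristic for illustration only: the base is the longest common prefix
    of `word` and a known derived form of it.
    """
    n = 0
    while n < min(len(word), len(derived)) and word[n] == derived[n]:
        n += 1
    return ["B"] * n + ["S"] * (len(word) - n)


def derive(word: str, labels: list[str], new_suffix: str) -> str:
    """Keep the characters tagged "B" and attach `new_suffix`."""
    base = "".join(ch for ch, tag in zip(word, labels) if tag == "B")
    return base + new_suffix


labels = split_labels("happy", "happiness")
print(list(zip("happy", labels)))        # the final "y" is tagged "S"
print(derive("happy", labels, "iness"))  # happiness
```

    In a learned tagger the B/S tags would have to be predicted from the word alone, since the derived form is not known in advance; the heuristic above only shows what such a tag sequence looks like.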

  4. Biomedical Journal Similarity Metrics.
    We have created several novel journal metrics that relate directly or indirectly to author publication behavior. Our original motivation was to find different ways of capturing the similarity of two journals that would help us answer the following question: given any two articles in PubMed that share the same author name (last name, first initial), how does knowing only the identities of the journals in which the articles were published predict the relative likelihood that they were written by the same person versus different persons?

    Our metrics for capturing journal similarity are described in the following paper.

    Jennifer D'Souza and Neil R. Smalheiser. 2014. Three journal similarity metrics and their application to biomedical journals. In PLoS ONE 9(12): e115681.

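    One simple way to make the question above quantitative is a likelihood ratio for a journal pair: how often that pair occurs among article pairs known to share an author, relative to how often it occurs among pairs known to have different authors. The Python sketch below estimates this ratio with add-one smoothing from hypothetical labeled pairs; the journal names "A" to "E", the counts, and the function names are illustrative assumptions and are not the three metrics defined in the paper.

```python
from collections import Counter


def pair_key(j1: str, j2: str) -> tuple[str, str]:
    """Journal pairs are unordered, so store them in sorted order."""
    return (j1, j2) if j1 <= j2 else (j2, j1)


def likelihood_ratio(same_author: list[tuple[str, str]],
                     diff_author: list[tuple[str, str]],
                     j1: str, j2: str, alpha: float = 1.0) -> float:
    """Estimate P(j1, j2 | same author) / P(j1, j2 | different authors).

    Each probability is an add-alpha smoothed relative frequency over the
    journal pairs seen in either labeled set.
    """
    same = Counter(pair_key(a, b) for a, b in same_author)
    diff = Counter(pair_key(a, b) for a, b in diff_author)
    key = pair_key(j1, j2)
    vocab_size = len(set(same) | set(diff) | {key})
    p_same = (same[key] + alpha) / (sum(same.values()) + alpha * vocab_size)
    p_diff = (diff[key] + alpha) / (sum(diff.values()) + alpha * vocab_size)
    return p_same / p_diff


# Hypothetical labeled article pairs, reduced to the journals they appeared in.
same_author = [("A", "B"), ("B", "A"), ("C", "C")]
diff_author = [("A", "B"), ("C", "D"), ("D", "E"), ("C", "E")]

# A ratio above 1 favors the same-author hypothesis, below 1 favors different authors.
print(likelihood_ratio(same_author, diff_author, "B", "A"))
print(likelihood_ratio(same_author, diff_author, "C", "D"))
```

    With these toy counts the pair (A, B) comes out about 1.7 times as likely under the same-author hypothesis, while (C, D) comes out below 1.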

  5. Semantic Medical Relation Classification.
    Medical Relation Classification, an information extraction task in the clinical domain defined in the 2010 i2b2/VA Challenge, involves determining the relation between a pair of medical concepts (problems, treatments, or tests), for example, whether a treatment improves a problem or a test reveals a problem.

    Our ensemble approach to the Medical Relation Classification task is described in the following paper.

    Jennifer D'Souza and Vincent Ng. 2014. Ensemble-Based Medical Relation Classification. In Proceedings of COLING. pp. 1682-1693.

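    Assuming the relation inventory of the 2010 i2b2/VA challenge (treatment-problem, test-problem, and problem-problem relations), the types of the two concepts determine which relation labels are possible for a pair. The Python sketch below encodes that restriction; the dictionary layout, the extra NONE label, and the function name are illustrative choices, not the setup used in the paper.

```python
# Assumed 2010 i2b2/VA relation labels, keyed by (first type, second type).
RELATIONS: dict[tuple[str, str], list[str]] = {
    ("treatment", "problem"): [
        "TrIP",   # treatment improves problem
        "TrWP",   # treatment worsens problem
        "TrCP",   # treatment causes problem
        "TrAP",   # treatment is administered for problem
        "TrNAP",  # treatment is not administered because of problem
    ],
    ("test", "problem"): [
        "TeRP",   # test reveals problem
        "TeCP",   # test is conducted to investigate problem
    ],
    ("problem", "problem"): [
        "PIP",    # problem indicates problem
    ],
}


def candidate_labels(type_a: str, type_b: str) -> list[str]:
    """Labels a classifier would choose among for a concept pair.

    The pair is looked up in either order; "NONE" (no relation) is added for
    supported pairs, and unsupported pairs get no labels at all.
    """
    labels = RELATIONS.get((type_a, type_b)) or RELATIONS.get((type_b, type_a))
    return labels + ["NONE"] if labels else []


print(candidate_labels("problem", "test"))    # ['TeRP', 'TeCP', 'NONE']
print(candidate_labels("treatment", "test"))  # []
```

    Restricting the label set by concept types in this way means, for instance, that a test-problem pair is never scored against treatment relations such as TrIP.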

  6. Temporal Relation Classification.
    Temporal Relation Classification, one of the most important temporal information extraction tasks, involves classifying a given event-event or event-time pair into one of a set of predefined temporal relations such as Before, After, and Overlap.

    Our work on improving the state of the art in Temporal Relation Classification is described in the following papers.

    Jennifer D'Souza and Vincent Ng. 2014. Knowledge-rich temporal relation identification and classification in clinical notes. Database, Vol. 2014: article ID bau109. doi: 10.1093/database/bau109.
    Jennifer D'Souza and Vincent Ng. 2014. Annotating Inter-Sentence Temporal Relations in Clinical Notes. In Proceedings of the Ninth Language Resources and Evaluation Conference, pp. 2758-2765.
    Jennifer D'Souza and Vincent Ng. 2013. Temporal Relation Identification and Classification in Clinical Notes. In Proceedings of the Fourth ACM Conference on Bioinformatics, Computational Biology and Biomedicine. pp. 392-401.
    Jennifer D'Souza and Vincent Ng. 2013. Classifying temporal relations in clinical data: A hybrid, knowledge-rich approach. In Journal of Biomedical Informatics, 46(SUPPL.). doi: 10.1016/j.jbi.2013.08.003.
    Jennifer D'Souza and Vincent Ng. 2013. Classifying Temporal Relations with Rich Linguistic Knowledge. In Proceedings of NAACL-HLT. pp. 918-927.

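    Because Before and After are inverses of each other, the same temporal fact can be stated from either direction of a pair. The Python sketch below shows one simple way to avoid treating (A, B, Before) and (B, A, After) as different instances: store every pair in a fixed order and invert the label when the order is swapped. The three-label inventory, the identifier scheme (e1, t3), and the function name are assumptions for illustration; the relation sets used in the papers may be larger.

```python
# Assumed three-label inventory and its inverses (Overlap is symmetric).
INVERSE = {"BEFORE": "AFTER", "AFTER": "BEFORE", "OVERLAP": "OVERLAP"}


def canonicalize(first: str, second: str, label: str) -> tuple[str, str, str]:
    """Order a (first, second, label) triple by identifier, inverting the label if swapped."""
    if first <= second:
        return first, second, label
    return second, first, INVERSE[label]


print(canonicalize("e2", "e1", "BEFORE"))   # ('e1', 'e2', 'AFTER')
print(canonicalize("e1", "t3", "OVERLAP"))  # ('e1', 't3', 'OVERLAP')
```

    After this step an event-event or event-time pair appears only once in the data, so a classifier sees a single, consistently oriented instance per pair.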

  7. Anaphora Resolution.
    Anaphora is a linguistic device commonly used in narratives and dialogs to avoid repeating phrases. By definition, an anaphor depends on another phrase, namely its antecedent, for its semantic interpretation. The task of automatically resolving anaphors to their antecedents is known as anaphora resolution.

    Our hybrid approach for the Anaphora Resolution task is described in the following paper.

    Jennifer D'Souza and Vincent Ng. 2012. Anaphora resolution in biomedical literature: a hybrid approach. In Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine, pp. 113-122. ACM.