I am a PhD student at the University of Texas at Dallas (UTD) in the Human Language Technology Research Institute (HLTRI) under the guidance of Dr. Sanda Harabagiu.
My work typically falls within the intersection of natural language processing (NLP), information retrieval (IR), and medical informatics.
As for my educational history, I graduated from Highland Park Public High School in 2007.
I then completed my Bachelor of Science in Computer Science at the University of Texas at Dallas in 2011.
From there, I went on to complete my Master of Science in Computer Science at the University of Texas at Dallas in 2013.
The advent of electronic medical records (EMRs) has sparked a wide interest in using clinical data for research.
However, although some data is easily retrieved from structured fields within these EMRs, the majority of information is locked within free-text fields.
These free-text portions of EMRs contain a wide variety of medical knowledge, such as patient history, physical exam findings, radiology reports, operative notes, discharge summaries, and lab reports.
Unlocking the knowledge encoded within these free-text EMRs requires a combination of natural language processing, information retrieval, and bioinformatics.
This project focuses on the problem of identifying EMRs relevant to certain hospital patient cohorts (groups of hospital patients sharing certain attributes).
To this end, we produced a system that participated in the TRECMed EMR-retrieval track.
Participants in this evaluation were given a set of EMRs from the University of Pittsburgh BLU-Lab NLP Repository as well as a mapping organizing each EMR into its associated patient's visit to the hospital.
These hospital visits ranged in size from a single EMR to as many as four hundred and fifteen EMRs.
We were additionally provided with a set of training and testing topics based on a list of priority areas for research authored by the Institute of Medicine.
Each topic targets certain patient cohorts and is designed to find a population for which comparative effectiveness studies may be performed.
The purpose of our retrieval system is to return a ranked list of hospital visits that satisfy the requirements expressed by each patient cohort "topic".
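As a rough illustration of visit-level ranking, the sketch below scores each EMR against a topic and then ranks visits by their best-scoring EMR. This is not a description of our actual system: the term-overlap scorer, the max aggregation, and the names `score_report`, `rank_visits`, and `report_to_visit` are all assumptions chosen to keep the example small.

```python
from collections import defaultdict


def score_report(query_terms, report_text):
    """Toy relevance score (assumption): fraction of query terms found in the report."""
    tokens = set(report_text.lower().split())
    if not query_terms:
        return 0.0
    return sum(1 for term in query_terms if term in tokens) / len(query_terms)


def rank_visits(query, reports, report_to_visit):
    """Rank visits by the best score of any report belonging to the visit."""
    query_terms = set(query.lower().split())
    visit_scores = defaultdict(float)
    for report_id, text in reports.items():
        visit_id = report_to_visit[report_id]
        score = score_report(query_terms, text)
        # Max aggregation (assumption): one strongly matching report is enough.
        visit_scores[visit_id] = max(visit_scores[visit_id], score)
    # Sort by descending score, breaking ties by visit id for determinism.
    return sorted(visit_scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical data: three reports grouped into two visits.
reports = {
    "r1": "patient admitted with hearing loss",
    "r2": "chest x-ray unremarkable",
    "r3": "history of diabetes",
}
report_to_visit = {"r1": "v1", "r2": "v1", "r3": "v2"}
ranked = rank_visits("hearing loss", reports, report_to_visit)
```

Taking the maximum over a visit's EMRs is only one possible aggregation; summing or averaging would instead favor visits in which many reports mention the topic.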
Inducing Qualified Medical Knowledge
Electronic Medical Records (EMRs) encode an extraordinary amount of medical knowledge.
Unfortunately, collecting and interpreting this knowledge requires a significant level of clinical understanding.
Although most effort on this front has focused on medical ontologies, these ontologies were designed primarily for conceptual organization rather than for automated semantic reasoning over language.
Moreover, clinical text often contains an extraordinary amount of variation regarding the author’s belief state – whether something is present, uncertain, or absent.
In this project, we automatically constructed a graph of semantic dependencies between medical concepts qualified by their belief state in EMRs.
Additionally, our representation can be viewed as a Markov network allowing probabilistic inference for performing patient-centered medical treatment or test recommendation.
We are also investigating techniques for smoothing the likelihood of observing medical concepts across semantically similar belief states.
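To make the idea of a belief-qualified concept graph more concrete, here is a minimal sketch in which each node is a (concept, belief state) pair and each edge counts how often two qualified concepts appear in the same record. Plain co-occurrence counting is an assumption standing in for the semantic dependencies described above, and the function name `build_qualified_graph` and the sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_qualified_graph(records):
    """Count co-occurrences between (concept, belief_state) nodes within each record."""
    edges = Counter()
    for record in records:
        # Deduplicate and sort so each undirected edge has one canonical key.
        nodes = sorted(set(record))
        edges.update(combinations(nodes, 2))
    return edges


# Hypothetical records: each is a list of concepts qualified by belief state.
records = [
    [("chest pain", "present"), ("myocardial infarction", "uncertain"), ("ecg", "present")],
    [("chest pain", "present"), ("ecg", "present")],
    [("chest pain", "absent"), ("ecg", "present")],
]
graph = build_qualified_graph(records)
```

Because "chest pain" qualified as present and "chest pain" qualified as absent are separate nodes, the graph keeps evidence for asserted and negated findings apart. Edge counts of this kind could in principle be normalized into potentials for a Markov network, though that step is not shown here.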
Causal reasoning, the ability to recognize relationships between causes and effects, has been studied since the time of Aristotle.
Indeed, our ability to understand cause and effect is paramount to our ability to make informed decisions and inferences about past and future events.
Automated causal reasoning has been one of the key goals of Artificial Intelligence research since its inception.
Our goal with this project is to advance the state of automated causal reasoning by learning the causal structure encoded in natural language texts.
For this purpose, we have produced a system for the Choice of Plausible Alternatives (COPA) task in SemEval-2012.
This task presents a series of binary questions, each of which provides a premise and two plausible scenarios that either caused or resulted from the premise.
The correct answer is the cause or effect that is the most plausible, or reasonable.
We approached this task by casting it as a classification problem and incorporating features derived from bigram co-occurrences, TimeML temporal links between events, single-word polarities as described by the Harvard General Inquirer, and patterns of causal syntactic dependency structures within the Gigaword corpus.
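As a simplified illustration of a co-occurrence feature, the sketch below computes the average pointwise mutual information (PMI) between premise words and the words of each alternative, using a tiny toy corpus. It does not reproduce our classifier or its feature set: sentence-level co-occurrence, the PMI formulation, and the helper names are assumptions made for the example.

```python
import math
from collections import Counter
from itertools import product


def cooccurrence_counts(sentences):
    """Count words and unordered word pairs that share a sentence."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        word_counts.update(words)
        pair_counts.update(frozenset((a, b)) for a in words for b in words if a < b)
    return word_counts, pair_counts, len(sentences)


def pmi(a, b, word_counts, pair_counts, n):
    """Pointwise mutual information; unseen pairs are given 0.0 (assumption)."""
    joint = pair_counts[frozenset((a, b))]
    if joint == 0:
        return 0.0
    return math.log((joint * n) / (word_counts[a] * word_counts[b]))


def cooccurrence_feature(premise, alternative, stats):
    """Average PMI over all (premise word, alternative word) pairs."""
    word_counts, pair_counts, n = stats
    pairs = list(product(premise.lower().split(), alternative.lower().split()))
    if not pairs:
        return 0.0
    return sum(pmi(a, b, word_counts, pair_counts, n) for a, b in pairs) / len(pairs)


# Toy corpus (hypothetical) standing in for a large news collection.
corpus = [
    "rain made the road wet",
    "the wet road caused a crash",
    "sun made the road dry",
    "the dry road was safe",
]
stats = cooccurrence_counts(corpus)
score_rain = cooccurrence_feature("road wet", "rain", stats)
score_sun = cooccurrence_feature("road wet", "sun", stats)
choice = "rain" if score_rain > score_sun else "sun"
```

In a classification setting, a value like this would be one feature among several rather than the decision rule itself; the final comparison here only shows how the feature separates the two alternatives on the toy data.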