Human Language Technology Research Institute
The University of Texas at Dallas

Current Projects

AQUINAS: Answering Questions using Inference and Advanced Semantics
The driving rationale for our approach is to enable access to sophisticated inference mechanisms based on rich semantic structures. The research involves (1) innovations in language analysis; (2) innovations in question processing; (3) new forms of indexing using semantic information; (4) extraction and inference of answers based on event inter-relationships and context-sensitive inference over multiple sentences and discourse fragments; and (5) learning techniques for abductive reasoning.

This collaborative project among UTD, ICSI Berkeley, and Stanford University considers several stages of deeper semantic processing. The first step is to incorporate "semantic parsers," which identify predicate-argument structures or semantic frames. Second, the question evokes complex semantic structures that are used to retrieve candidate answers. Third, indexing and retrieval models are enhanced by taking conceptual schemas and topic models into account.

PI: Dr. Sanda Harabagiu (UTD)
Co-PIs: Dr. Srini Narayanan (ICSI Berkeley)
  Dr. Chris Manning (Stanford University)

NSF CADRE: A Tool for Transforming WordNet into a Core Knowledge Base
This project extends a popular database of English words to make it more useful in tasks such as question answering, information retrieval, and summarization. WordNet is a lexical database for English that has been widely adopted in artificial intelligence and computational linguistics for a variety of practical applications. The basic elements of WordNet are sets of words that are linked according to semantic relations: synonymy, antonymy, superordination, and so forth. WordNet is publicly available, widely used, and currently being expanded into a multilingual database.

This project develops a set of tools that can be applied to current and future versions of WordNet to extend it for knowledge processing applications. The extensions enhance the glosses, which currently contain definitions, comments, and examples for the sets of words that are linked in WordNet. Enhanced glosses will be syntactically parsed, will have each word tagged with its part of speech, and will themselves be linked with other glosses that describe related concepts. This research is sponsored by the National Science Foundation.
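
The structure described above can be pictured with a small sketch. This is an illustration only, not the project's tools or the WordNet file format: the `Synset` class, the ids `dog.n` and `canine.n`, the glosses, and the hand-written part-of-speech tags are all hypothetical, and the syntactic parsing step is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Synset:
    """Toy stand-in for a WordNet entry: a set of synonymous words plus a gloss."""
    words: List[str]
    gloss: str
    # Semantic relations to other entries: relation name -> list of synset ids.
    relations: Dict[str, List[str]] = field(default_factory=dict)
    # Enhancements added to the gloss (both start empty).
    tagged_gloss: List[Tuple[str, str]] = field(default_factory=list)
    gloss_links: Dict[str, str] = field(default_factory=dict)


# Hypothetical two-entry lexicon; ids and glosses are invented for illustration.
lexicon = {
    "dog.n": Synset(
        words=["dog", "domestic dog"],
        gloss="a domesticated canine",
        relations={"hypernym": ["canine.n"]},
    ),
    "canine.n": Synset(
        words=["canine", "canid"],
        gloss="a carnivorous mammal",
    ),
}


def enhance_gloss(synset_id: str, tags: List[str], lexicon: Dict[str, Synset]) -> Synset:
    """Attach POS tags to a gloss and link gloss words to the entries that contain them."""
    synset = lexicon[synset_id]
    tokens = synset.gloss.split()
    # Pair each gloss token with its (hand-supplied) part-of-speech tag.
    synset.tagged_gloss = list(zip(tokens, tags))
    # Link a gloss token to another entry when that entry lists the token as a word.
    for token in tokens:
        for other_id, other in lexicon.items():
            if other_id != synset_id and token in other.words:
                synset.gloss_links[token] = other_id
    return synset


enhanced = enhance_gloss("dog.n", ["DT", "JJ", "NN"], lexicon)
print(enhanced.tagged_gloss)
print(enhanced.gloss_links)
```

In this toy lexicon, the gloss word "canine" is linked to the entry for `canine.n`, which is the kind of gloss-to-concept link the paragraph above describes. A real tool would need word sense disambiguation to choose among several candidate entries.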

PI: Dr. Dan Moldovan, Co-PI: Dr. Sanda Harabagiu

Project webpage:

NSF CAREER: Reference Resolution for Natural Language Understanding
A major obstacle in building robust systems that extract and interpret information, summarize texts, and answer questions from them is the need to identify the entities referred to by pronouns or other referential expressions. This project extends the PI's prior work on an empirical reference resolution system that relies on several sets of heuristics, each corresponding to a different form of reference. In particular, the framework will be extended to learn semantic knowledge that supports consistency checks. This enhancement will provide high-precision reference resolution and will also substantially enhance the recall of referential links. The research will be evaluated using reference-annotated texts and the Penn Treebank corpora. The outcome will be a corpus-based method for reference resolution for both pronouns and nominal expressions. First, the semantics of all referential noun phrases will be captured. Then, by extending the empirical environment with bootstrapping, this reference resolution technique should lead to a powerful tool capable of resolving reference correctly in a large variety of texts. Finally, the tool will be incorporated into both an information extraction system and a question-answering system to measure its contribution to the overall performance of these systems. The proposed research departs from previous approaches to reference resolution in that it promotes data-driven techniques instead of relying on combinations of linguistic and cognitive aspects of language. The immediate pragmatic outcome indicated by the preliminary results should be a substantial recall enhancement. This research is sponsored by the National Science Foundation.

PI: Dr. Sanda Harabagiu

NSF ITR: Adaptive Protocols for a Distributed Java Virtual Machine
Java is a language of growing importance, but so far parallelism in Java has been limited to either multi-threading on symmetric multiprocessors (SMP) or distributed computing using Remote Method Invocation (RMI). The number and size of Java-based Internet applications demand increasing parallelism and system scalability. This project addresses the problem of designing memory consistency protocols for a distributed Java Virtual Machine that can adapt itself at runtime to different application characteristics.

The work under this project is divided into five tasks: (1) the definition of the memory consistency model, (2) the development of consistency protocols, (3) the definition of an analytical performance model on which adaptive protocols are based, (4) the development of processor allocation algorithms for load balancing, and (5) the evaluation of the system performance using four classes of applications. This research is sponsored by the National Science Foundation.

PI: Dr. Dan Moldovan

NSF ITR: ARCADE (Automatized Reading Comprehension and Diagnostic Evaluation)
This project is a feasibility study to develop advanced artificial intelligence techniques and apply research from the field of Cognitive Psychology to assess the reading comprehension of children in elementary and junior high school. In the ARCADE system, children log on to a particular web site, read narrative stories and science texts, and type answers to questions about the texts. The children are encouraged to type their answers in any format they choose. Such "essay responses" are important because they can reveal aspects of children's thinking styles that may not be uncovered in standardized testing. ARCADE then automatically groups children with similar thinking styles and suggests customized teaching strategies for each group to the classroom teacher. Thus, the ARCADE system is designed to improve the quality of reading comprehension instruction for all children in the classroom. This research is sponsored by the National Science Foundation.

PI: Dr. Richard Golden

© 2005 Human Language Technology Research Institute  