EECS500 Spring 2013 Department Seminar

Razvan Bunescu
Machine learning approaches to word sense disambiguation and (co)reference resolution
Ohio University
White Bldg., Room 411
11:30am - 12:30pm
March 5, 2013

Word sense disambiguation and (co)reference resolution are two
fundamental tasks in natural language processing that have seen many
advances over the years and that continue to attract significant
research effort. The two tasks can be seen as instantiations of a more
general problem -- linking noun phrases to sets of potential referents
-- at different granularity levels. The focus of this presentation
will be on machine learning approaches for the two tasks. In the first
part of the talk, I will describe an approach to training coarse- to
fine-grained sense disambiguation systems that uses Wikipedia links as
supervision. In Wikipedia, links to general senses of a word are used
concurrently with links to more specific senses, without being
distinguished explicitly. The large-scale presence of such annotation
inconsistencies motivates the use of models that can learn only from
positive and unlabeled examples. In the second part of the talk, I
will present an adaptive clustering algorithm for coreference
resolution that integrates the deterministic rules from a recent
state-of-the-art system with semantic compatibility constraints
derived from Web n-gram statistics. The new approach allows for a more
flexible incorporation of features and is shown to improve coreference
resolution performance, especially with respect to pronouns.


Razvan Bunescu is an Assistant Professor of Computer Science at Ohio
University. His research interests include machine learning,
computational linguistics, and biomedical informatics. He graduated
from the University of Texas at Austin in 2007 with a PhD thesis on
machine learning methods for information extraction. Since then, a
major focus of his research has been on exploiting large-scale, weakly
structured collections of documents for natural language processing
applications. More recently, in collaboration with a group of diabetes
experts, he has been working on machine learning models for medical
informatics tasks such as blood glucose level prediction and glycemic
variability detection. His research has been funded by grants from the
National Science Foundation and Ohio University.