EECS500 Spring 2014 Distinguished Lecture (CANCELLED)

Tandy Warnow
New HMM-based methods for Ultra-large Alignment and Phylogeny Estimation
The University of Texas at Austin
White Bldg., Room 411
11:30am - 12:30pm
March 6, 2014

Multiple sequence alignment of datasets containing many thousands of sequences is a challenging problem with applications in phylogeny estimation, protein structure and function prediction, taxon identification of metagenomic data, and other areas. However, few methods can analyze large datasets, and none has been shown to be accurate on datasets with more than about 10,000 sequences, especially when the sequences have evolved under high rates of evolution.

In this talk, I will present a new method for obtaining highly accurate estimates of large-scale multiple sequence alignments and phylogenies. The basic idea is to use a family of Hidden Markov Models (HMMs) to represent a "seed alignment", and then to align all the remaining sequences to the seed alignment. Our method, UPP, returns very accurate alignments, and trees estimated on these alignments are also very accurate, even on datasets with as many as 1,000,000 sequences. Furthermore, UPP is fast and highly scalable: the analysis of the 1-million-taxon dataset took only 24 hours using 12 cores and a small amount of memory. Finally, this "HMM Family" technique can also be used for other machine learning problems, including taxon identification of metagenomic data.
Tandy Warnow is the David Bruton Jr. Centennial Professor of Computer Sciences at the University of Texas at Austin. Her research combines mathematics, computer science, and statistics to develop improved models and algorithms for reconstructing complex and large-scale evolutionary histories in biology and historical linguistics. Tandy received her PhD in Mathematics from UC Berkeley under the direction of Gene Lawler, and did postdoctoral training with Simon Tavare and Michael Waterman at USC. Her awards include the NSF Young Investigator Award (1994), the David and Lucile Packard Foundation Award (1996), a Radcliffe Institute Fellowship (2006), and a Guggenheim Fellowship (2011). She served as Chair of the BDMA Study Section at NIH (2010-2012) and was the lead program director for BIG DATA at NSF (2012-2013).