EECS500 Fall 2012 Department Seminar

Barry Rountree
MegaWatts and ExaFLOPS: Counterintuitive lessons for performance optimization of one billion processor cores under a power bound
Lawrence Livermore National Laboratory
White Bldg., Room 411
11:30am - 12:30pm
September 4, 2012

The Sequoia supercomputer at Lawrence Livermore National Laboratory is currently the fastest machine on the planet, with sustained performance on the order of 0.016 exaflops* at 7.89 megawatts (MW)**.  My research focuses on supercomputer design two generations forward:  how to build a 1-exaflop machine that uses only 20MW (a 62x increase in performance for only a 2.5x increase in power).  In this talk I will discuss treating power as a schedulable resource and present results from my recent experiments on three large Intel Sandy Bridge clusters at LLNL; this is the first published work to leverage Intel's Running Average Power Limit (RAPL) technology in a supercomputing environment.  Drawing on my dissertation work, I will show how RAPL-like power scheduling along the critical path of execution leads to near-optimal efficiency.  The resulting power savings allow additional compute nodes to be brought online, either speeding up existing scientific simulations or making even larger simulations feasible.
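For readers unfamiliar with RAPL: Intel documents the interface in terms of model-specific registers, where a units register defines the scale of hardware energy counters. A minimal sketch of decoding that register and converting energy-counter deltas to average power follows; the raw value 0xA1003 is a typical Sandy Bridge default used purely as illustrative input (a real tool would read the MSR on the target machine).

```python
# Decode Intel's MSR_RAPL_POWER_UNIT (0x606) fields per the documented layout:
#   bits  3:0  power units  (watts  per count = 1 / 2^value)
#   bits 12:8  energy units (joules per count = 1 / 2^value)
# The raw value used below is illustrative, not read from hardware.

def rapl_units(raw):
    power_unit = 1.0 / (1 << (raw & 0xF))           # watts per count
    energy_unit = 1.0 / (1 << ((raw >> 8) & 0x1F))  # joules per count
    return power_unit, energy_unit

def average_watts(counts_start, counts_end, seconds, energy_unit):
    """Average package power from two samples of the energy counter."""
    joules = (counts_end - counts_start) * energy_unit
    return joules / seconds

power_unit, energy_unit = rapl_units(0xA1003)
print(power_unit)                                    # 0.125
print(average_watts(0, 655360, 1.0, energy_unit))    # 10.0
```

Measurement is only half the story: RAPL also exposes writable power-limit registers, which is what makes power a schedulable resource rather than merely an observable one.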

*An exaflop is 10^18 floating point operations per second, or 10^9 gigaflops.  To put this in perspective, a GeForce GTX 590 manages ~2500 gigaflops at ~350 watts.  An exaflop system composed of only these processors (no RAM, no interconnects) would require ~140MW.

**A megawatt (MW) is a million watts (instantaneous power).  By way of comparison, the Davis-Besse nuclear power station in Oak Harbor, Ohio generates 889MW.
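The arithmetic in the first footnote can be checked directly, using only the figures the footnote itself supplies:

```python
# Back-of-the-envelope check of the footnote's figures.
exaflop = 1e18          # FLOP/s
gpu_flops = 2500e9      # ~2500 gigaflops per GeForce GTX 590
gpu_watts = 350         # ~350 watts per GeForce GTX 590

gpus_needed = exaflop / gpu_flops              # 400,000 GPUs
total_megawatts = gpus_needed * gpu_watts / 1e6

print(gpus_needed)       # 400000.0
print(total_megawatts)   # 140.0
```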


Barry Rountree received his BA in Theater from Ohio University, his MS in Computer and Network Administration from Florida State University, and his Ph.D. from the University of Arizona (advised by David K. Lowenthal).  His postdoctoral research at Lawrence Livermore National Laboratory has spanned supercomputing performance optimization under power and energy bounds (continuing his dissertation work), low-level cache optimization, tool parallelization (including a scalable parallelized version of the valgrind memcheck tool), and electrical grid optimization.  At LLNL he has mentored students from Virginia Tech, UIUC, Purdue, UCSD, UArizona and UWisconsin-Madison.  Current non-academic collaborators include Intel, ISO New England, CA-ISO and the TVA.