Keywords
debugging, supercomputers, parallel processing
COMPUTING RESEARCH HIGHLIGHT OF THE WEEK [August 19 - 26, 2011]
Debugging Supercomputers
When scientific applications run on extreme-scale systems, a single fault that disables a small portion of the application can bring the entire execution to a sudden halt, costing both machine time and programmer time. The Stack Trace Analysis Tool (STAT), developed by Lawrence Livermore National Laboratory (Livermore, Calif.), the University of Wisconsin (Madison), and the University of New Mexico (Albuquerque), is designed to help developers keep these small faults from hindering valuable research.
STAT is an open source, scalable debugging tool for identifying errors in code running on supercomputers with 100,000 or more processor cores. Built on a highly scalable, modular, open source infrastructure, it takes a top-down approach: it detects and groups processes that behave similarly at suspicious points in a program's execution. Instead of debugging all processes at once, users pick a representative subset from each group, which reduces the problem to a small, tractable number of processes.
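The grouping idea can be illustrated with a minimal Python sketch. It is not STAT's implementation: the function names `group_by_stack` and `pick_representatives`, the choice to treat processes as "similar" only when their call stacks are identical, and the sample traces are all assumptions made for illustration.

```python
from collections import defaultdict


def group_by_stack(traces):
    """Group process ranks whose call stacks are identical.

    traces maps a rank (int) to a sequence of function names,
    outermost call first. Returns {stack tuple: sorted list of ranks}.
    """
    groups = defaultdict(list)
    for rank, stack in traces.items():
        groups[tuple(stack)].append(rank)
    return {stack: sorted(ranks) for stack, ranks in groups.items()}


def pick_representatives(groups, per_group=1):
    """Take the lowest-numbered rank(s) from each group."""
    reps = []
    for ranks in groups.values():
        reps.extend(ranks[:per_group])
    return sorted(reps)


# Hypothetical sample data: four processes, one of which is stuck elsewhere.
traces = {
    0: ("main", "solve", "MPI_Barrier"),
    1: ("main", "solve", "MPI_Barrier"),
    2: ("main", "solve", "compute", "MPI_Recv"),
    3: ("main", "solve", "MPI_Barrier"),
}

groups = group_by_stack(traces)
representatives = pick_representatives(groups)
print(representatives)
```

With this sample data, four processes collapse into two groups, so a developer would attach a full debugger to two processes rather than four. The same reduction is what makes the approach attractive at 100,000 cores.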
STAT also automatically identifies outliers by examining the state of each process in a parallel program and extracting the call stack that led to its current point of execution. This allows the tool to relate the states of the processes to one another and to map the parallel execution context back to the user’s source code.
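One way to picture how call stacks from many processes can be related to each other is to merge them into a single tree in which each node records which processes reached it. The Python sketch below does that and flags small groups as outliers. The tree layout, the helper names (`build_prefix_tree`, `leaf_groups`, `find_outliers`), the `max_fraction` threshold, and the sample traces are assumptions for illustration, not a description of how STAT works internally.

```python
def build_prefix_tree(traces):
    """Merge per-process call stacks into one tree.

    Each node is a dict holding the set of ranks that reached it
    and its child nodes keyed by function name.
    """
    root = {"ranks": set(), "children": {}}
    for rank, stack in traces.items():
        node = root
        node["ranks"].add(rank)
        for frame in stack:
            node = node["children"].setdefault(
                frame, {"ranks": set(), "children": {}}
            )
            node["ranks"].add(rank)
    return root


def leaf_groups(node, path=()):
    """Yield (call path, ranks) for each point where some stack ends."""
    below = set()
    for child in node["children"].values():
        below |= child["ranks"]
    ended_here = node["ranks"] - below
    if ended_here:
        yield path, ended_here
    for frame, child in node["children"].items():
        yield from leaf_groups(child, path + (frame,))


def find_outliers(root, max_fraction=0.25):
    """Return call paths reached by at most max_fraction of all ranks."""
    total = len(root["ranks"])
    return [
        (path, sorted(ranks))
        for path, ranks in leaf_groups(root)
        if len(ranks) / total <= max_fraction
    ]


# Hypothetical sample data: seven ranks wait at a barrier, rank 5 does not.
traces = {rank: ("main", "solve", "MPI_Barrier") for rank in range(8)}
traces[5] = ("main", "solve", "compute", "MPI_Recv")

tree = build_prefix_tree(traces)
outliers = find_outliers(tree)
print(outliers)
```

Because each call path is a list of function names, an outlier found this way points directly at a location in the source code, here the `MPI_Recv` call reached through `compute`.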
Researchers:
Dong Ahn, Bronis de Supinski, Greg Lee, Matt Legendre, Martin Schulz (Lawrence Livermore National Laboratory)
Dorian Arnold (University of New Mexico)
Barton Miller (University of Wisconsin)
Institutions supporting the research:
Lawrence Livermore National Laboratory
University of New Mexico
University of Wisconsin
Computing Research Highlight of the Week is a service of the Computing Community Consortium and the Computing Research Association that highlights exciting and important recent research results in the computing fields. Each week, CRA and CCC staff and volunteers choose a new highlight from submissions from the computing community.