

Keywords

debugging, supercomputers, parallel processing


COMPUTING RESEARCH HIGHLIGHT OF THE WEEK [August 19 - 26, 2011]

Debugging Supercomputers


When scientific applications run on extreme-scale systems, a single fault that disables a small portion of the application can bring the entire execution to a sudden halt, wasting both machine time and programmer time. The Stack Trace Analysis Tool (STAT), developed by Lawrence Livermore National Laboratory (Livermore, Calif.), the University of Wisconsin (Madison), and the University of New Mexico (Albuquerque), is designed to help developers quickly locate these small faults so they do not hold up valuable research.

STAT is an open-source, scalable debugging tool for identifying errors in code running on supercomputers with 100,000 or more processor cores. The software, built on a highly scalable, modular, open-source infrastructure, takes a top-down approach: it detects and groups processes that are in similar states at suspicious points in a program's execution. Instead of debugging all processes at the same time, users can pick a representative subset from each group, reducing the problem to a small, tractable number of processes.
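To make the grouping idea concrete, here is a minimal sketch, assuming each process's call stack has already been collected as a tuple of function names (outermost frame first). Processes with identical stacks fall into the same equivalence class, and the lowest rank in each class is chosen as its representative. The function names, the example stacks, and the "lowest rank" rule are all hypothetical illustrations, not STAT's actual code or API.

```python
from collections import defaultdict

# Hypothetical input format: rank -> call stack, outermost frame first.
# A real tool would gather these by attaching to the running job.
Stack = tuple[str, ...]


def group_by_stack(stacks: dict[int, Stack]) -> dict[Stack, list[int]]:
    """Group ranks whose call stacks are identical into equivalence classes."""
    groups: dict[Stack, list[int]] = defaultdict(list)
    for rank in sorted(stacks):
        groups[stacks[rank]].append(rank)
    return dict(groups)


def representatives(groups: dict[Stack, list[int]]) -> list[int]:
    """Pick one rank (here, the lowest) from each class for detailed debugging."""
    return sorted(min(ranks) for ranks in groups.values())


if __name__ == "__main__":
    # 8 ranks: 6 waiting in a barrier, 1 stuck in a receive, 1 in a compute loop.
    stacks = {r: ("main", "solve", "MPI_Barrier") for r in range(8)}
    stacks[3] = ("main", "solve", "exchange", "MPI_Recv")
    stacks[5] = ("main", "solve", "compute")
    groups = group_by_stack(stacks)
    print(len(groups), representatives(groups))  # 3 [0, 3, 5]
```

With this toy input, a developer would attach a full debugger to only three ranks instead of eight; at 100,000 cores the same reduction is what makes the problem tractable.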

STAT also automatically identifies outliers by examining the state of each process in a parallel program and extracting the call stack that led to its current point of execution. This lets STAT relate the states of the processes to one another and map the parallel execution context back to the user’s source code.
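One way to relate per-process call stacks to each other is to merge them into a prefix tree in which every frame records the set of ranks that passed through it; execution paths reached by only a few ranks then stand out as outliers. The sketch below assumes the stacks are already collected (outermost frame first) and flags any end point reached by at most 25% of ranks. The names `merge_stacks`, `outliers`, and `max_fraction`, and the 25% threshold, are hypothetical choices for illustration, not STAT's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One frame in a merged call-stack prefix tree."""
    frame: str
    ranks: set[int] = field(default_factory=set)
    children: dict[str, "Node"] = field(default_factory=dict)


def merge_stacks(stacks: dict[int, tuple[str, ...]]) -> Node:
    """Merge per-rank call stacks (outermost frame first) into one prefix tree."""
    root = Node("<root>")
    for rank, stack in stacks.items():
        node = root
        node.ranks.add(rank)
        for frame in stack:
            node = node.children.setdefault(frame, Node(frame))
            node.ranks.add(rank)
    return root


def outliers(root: Node, max_fraction: float = 0.25) -> list[tuple[tuple[str, ...], set[int]]]:
    """Return (path, ranks) for stack end points reached by few ranks."""
    total = len(root.ranks)
    found = []

    def walk(node: Node, path: tuple[str, ...]) -> None:
        # Ranks whose stack ends exactly at this frame (not in any child).
        ending = node.ranks - set().union(*(c.ranks for c in node.children.values()))
        if ending and len(ending) <= max_fraction * total:
            found.append((path, ending))
        for child in node.children.values():
            walk(child, path + (child.frame,))

    walk(root, ())
    return found


if __name__ == "__main__":
    stacks = {r: ("main", "solve", "MPI_Barrier") for r in range(8)}
    stacks[3] = ("main", "solve", "exchange", "MPI_Recv")
    stacks[5] = ("main", "solve", "compute")
    for path, ranks in outliers(merge_stacks(stacks)):
        print(" > ".join(path), sorted(ranks))
    # main > solve > exchange > MPI_Recv [3]
    # main > solve > compute [5]
```

Because shared prefixes are stored once, the tree stays compact when most processes are in the same state, and the rare branches point directly at the functions where misbehaving processes diverged.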

Researchers:
Dong Ahn, Bronis de Supinski, Greg Lee, Matt Legendre, Martin Schulz (Lawrence Livermore National Laboratory)
Dorian Arnold (University of New Mexico)
Barton Miller (University of Wisconsin)

Institution(s) that have supported the research:
Lawrence Livermore National Laboratory
University of New Mexico
University of Wisconsin


Computing Research Highlight of the Week is a service of the Computing Community Consortium and the Computing Research Association designed to highlight some of the exciting and important recent research results in the computing fields. Each week, CRA and CCC staff and volunteers choose a new highlight from submissions from the computing community. Want your research featured? Submit it!