Last modified
November 25, 2002

  Seminar Abstract
Center for Data Intensive Computing


 
 


 

Distributed Dynamic Correctness Testing of MPI Programs

Debugging MPI applications can be difficult. Software complexity, data races, and scheduling dependencies can make simple programming errors very difficult to locate with manual debugging techniques. Worse, few debugging tools target MPI abstractions, and error messages from MPI implementations are often misleading when the programmer uses MPI incorrectly. As a result, users rely on a spectrum of time-consuming and complicated ad hoc techniques to locate MPI programming errors. Clearly, MPI programmers need tools that simplify the code development process.

Umpire is an innovative tool that dynamically analyzes any MPI application for typical MPI programming errors. Examples of these errors include resource exhaustion and configuration-dependent buffer deadlock. Umpire performs this analysis on unmodified application codes at runtime by using the MPI profiling layer.
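
As an illustration of what a configuration-dependent buffer deadlock can look like, the following is a minimal Python sketch. It is not Umpire's code and it does not call any MPI library. It assumes a simplified model in which a blocking send returns immediately only when the message fits within a buffering threshold (the hypothetical `eager_limit`), and otherwise waits until the destination rank is at a matching receive. The function name `run` and the tuple-based program encoding are illustrative assumptions.

```python
from collections import deque


def run(programs, eager_limit):
    """Simulate blocking sends/receives; return True if every rank finishes.

    programs[rank] is a list of operations (hypothetical encoding):
      ("send", dst, size_in_bytes)  or  ("recv", src)
    Assumed model: a send of at most eager_limit bytes is buffered and
    returns at once; a larger send blocks until dst is at a matching recv.
    """
    pc = [0] * len(programs)   # index of the next operation for each rank
    buffered = {}              # (src, dst) -> queue of buffered message sizes
    progress = True
    while progress:
        progress = False
        for rank, ops in enumerate(programs):
            if pc[rank] == len(ops):
                continue       # this rank has finished
            op = ops[pc[rank]]
            if op[0] == "send":
                _, dst, size = op
                if size <= eager_limit:
                    # Small message: buffered by the (modelled) implementation.
                    buffered.setdefault((rank, dst), deque()).append(size)
                    pc[rank] += 1
                    progress = True
                elif (
                    pc[dst] < len(programs[dst])
                    and programs[dst][pc[dst]] == ("recv", rank)
                    and not buffered.get((rank, dst))
                ):
                    # Large message: completes only together with the receive.
                    pc[rank] += 1
                    pc[dst] += 1
                    progress = True
            else:
                _, src = op
                queue = buffered.get((src, rank))
                if queue:
                    queue.popleft()   # consume a previously buffered message
                    pc[rank] += 1
                    progress = True
    return all(pc[r] == len(ops) for r, ops in enumerate(programs))


# Both ranks send first, then receive: correctness depends on buffering.
head_to_head = [
    [("send", 1, 8192), ("recv", 1)],
    [("send", 0, 8192), ("recv", 0)],
]

if __name__ == "__main__":
    print(run(head_to_head, eager_limit=65536))
    print(run(head_to_head, eager_limit=1024))
```

Under these assumptions, the same two-rank exchange finishes when the threshold is 65536 bytes and stalls when it is 1024 bytes, which is why such an error can stay hidden on one system configuration and appear on another.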

This talk presents a distributed-memory version of Umpire that uses techniques similar to those employed in high-performance multi-threaded MPI implementations. This version of Umpire has identified several MPI programming errors in Sphinx, a widely available MPI benchmark suite, and initial performance results with several applications are promising. The talk will present the distributed design and preliminary results, as well as key issues for identifying complex MPI programming errors, such as deadlocks involving MPI_Recv with MPI_ANY_SOURCE.
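
One reason wildcard receives complicate deadlock detection is that a receive posted with MPI_ANY_SOURCE can be satisfied by any sender, so a rank blocked in such a receive is stuck only if no potential sender can ever make progress. The Python sketch below illustrates that idea with a simple fixed-point computation. It is not a description of Umpire's algorithm; the function `deadlocked_ranks`, the `ANY_SOURCE` constant, and the input encoding are hypothetical, and the model ignores tags, communicators, nonblocking operations, and messages already in flight. It also assumes that any rank not listed as blocked is still running and may send.

```python
ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE in this model only


def deadlocked_ranks(num_ranks, blocked_recv):
    """Return the sorted ranks that can never be released in this model.

    blocked_recv maps a blocked rank to the source it is receiving from,
    or to ANY_SOURCE for a wildcard receive. Ranks not in the mapping are
    assumed to be running and therefore able to send (an assumption).
    """
    may_progress = {r for r in range(num_ranks) if r not in blocked_recv}
    changed = True
    while changed:
        changed = False
        for rank, source in blocked_recv.items():
            if rank in may_progress:
                continue
            if source == ANY_SOURCE:
                # A wildcard receive can be satisfied by any other rank.
                senders = set(range(num_ranks)) - {rank}
            else:
                senders = {source}
            # The rank may be released if some potential sender may progress.
            if senders & may_progress:
                may_progress.add(rank)
                changed = True
    return sorted(set(blocked_recv) - may_progress)


if __name__ == "__main__":
    # Ranks 0 and 1 wait on each other; rank 2 is running.
    print(deadlocked_ranks(3, {0: 1, 1: 0}))
    # Rank 0 uses a wildcard, so running rank 2 could still release it.
    print(deadlocked_ranks(3, {0: ANY_SOURCE, 1: 0}))
```

In this model, replacing a specific source with a wildcard changes the verdict for the same set of blocked ranks, which hints at why a detector has to consider every potential match before reporting a deadlock.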




 


























Copyright © 1999 Brookhaven National Laboratory ALL RIGHTS RESERVED