GEM - Overview and Features


Graphical Explorer of MPI Programs (GEM), is a graphical front end for In-situ Partial Order (ISP), a dynamic formal verification tool for MPI C/C++ programs, developed at the University of Utah. GEM greatly enhances the usability of ISP by providing an intuitive graphical way to track down bugs in MPI C/C++ programs, examine post-runtime results and to analyze MPI runtime behavior.

GEM can be used by anyone who can write simple MPI C/C++ programs, and requires no special training. GEM allows you to formally verify your MPI C/C++ programs automatically without any extra efforts on your part. No code instrumentation is necessary. Gem will definitively flag the following errors and warnings:

Analyzer View

The Analyzer View is a unique user interface which visually displays the output that ISP generated by highlighting lines in the source file. It shows both the current MPI call, and any matching point-to-point or collective operation. It also allows the user to examine MPI calls for a particular rank with an easy to use Rank Locking feature.

The Analyzer View offers the user the ability to "step" through execution of the code in debug fashion examining the execution of the MPI calls in the program as they happened at runtime. The user can view this in two different orderings:

The source code analyzer also offers a feature to browse calls by rank and by interleaving as well as teh ability to launch the Happens Before Viewer, another visualization tool provided by GEM.



In addition, it helps you understand as well as step through all relevant process interleavings (schedules). Notice our use of the word relevant: even a short MPI program may have too many (an "exponential number") interleavings. For example, an MPI program with five processes, each containing five MPI calls, can have well in excess of 1000 interleavings. However, ISP generates a new interleaving only when it represents a truly new (as yet unexamined) behavior in your MPI program.

As Examples:

Browser View

The Browser View is a highly functional tool that summarizes and categorically groups all of the errors and warnings found by GEM. Below are listings of the errors and warnings summarized by GEM's Browser View, along with a brief description. Each error or warning can be clicked and the developer is taken to the corresponding line in the source code within the Eclipse editor.

At the top left is a brief summary of the results indicating the number of errors and warnings found. In the top right are buttons provided with the ability to set the number of processes and re-run GEM on the current MPI project.



How GEM Works:

GEM employs the use of In-situ Partial Order (ISP), the underlying dynamic formal verification tool, to verify MPI C/C++ programs. Here we describe ISP in detail to provide a better understanding of how it works.

ISP works by intercepting the MPI calls made by the target program and making decisions on when to send these MPI calls to the MPI library. This is accomplished by the two main components of ISP: the Profiler and the Scheduler. An overview of ISP’s components and their interaction with the program as well as the MPI library is provided in Figure 1 below.


Figure 1

Profiler:

The interception of MPI calls is accomplished by compiling the ISP Profiler together with the target program’s source code. The Profiler makes use of MPI’s profiling mechanism (PMPI). Then the Profiler provides its own version of MPI f (replacing the original) for each corresponding MPI function f. Within each of these MPI f , the profiler communicates with the scheduler using TCP sockets to send information about the MPI call the process wants to make. It will then wait for the scheduler to make a decision on whether to send the MPI call to the MPI library or to postpone it until later. When the permission to fire f is given from the scheduler, the corresponding PMPI f will be issued to the MPI run-time. Since all MPI libraries come with functions such as PMPI f for every MPI function f, this approach provides a portable and light-weight instrumentation mechanism for MPI programs being verified [6].

Scheduler:

The ISP Scheduler helps carry out the POE algorithm. As it turns out, the POE algorithm is based on exploiting MPI’s out-of-order completion semantics. In other words,

Description of POE Algorithm:

POE stands for "POR with out-of-order and elusive interleavings"

MPI programs suffer from "elusive" interleavings in the presence of MPI wildcard Receive R(*) . Issuing the MPI sends in different orders does not help as the MPI runtime provides no guarantees based on when the Send is actually issued. POE employs dynamic source re-writing in place of wildcard so that every interleaving explored by POE algorithm is deterministic. Also, the presence of collectives such as MPI\_Barrier can cause traditional runtime verification techniques such as "Dynamic Partial Order Reduction" to fail since it is not always possible to issue instructions in different orders in the presence of collectives. POE also addresses this issue by creating "match sets".

POE algorithm works by creating a graph structure with the MPI operations of processes as nodes and by adding Intra-Completes-Before (IntraCB) edges between these nodes. As the name suggests, IntraCB edges are only added between the MPI operations within the MPI processes. The IntraCB edges are added between the MPI operations based on the MPI semantics of the operations. MPI functions like MPI_Barriers, MPI_Wait etc have the semantics, that no later MPI operations can be issued until these MPI operations complete. We call such operations as fence operations. Note that instructions issued before the fence operations can linger even after the fence operation is complete.

The IntraCB edges are added based on the following rules: Let i and j be the MPI operations of a process P such that i < j (i.e., i comes before j in the program order. There is an IntraCB edge between i and j is one of the following is true:

If there is a path from i to j then i is called the ancestor of j. POE algorithm uses these IntraCB edges to form match-sets to be issued into the MPI runtime. The following is the (informal) algorithm:

An MPI operation is "matched" if it has been issued to the MPI runtime by the POE algorithm.

  1. Run the MPI processes until they all reach fence operations.
  2. Let e denote the MPI operations of all processes such that for each ie, all ancestors of i have been matched.
  3. Let m denote the match-sets formed out of e as follows:
  4. Issue the match sets to MPI runtime in the following priority order: C-Match > W-Match > SR-Match > SR*-Match.
  5. If any of the processes issued a fence operation then goto step 1. Else, goto step 4.
  6. If there are no match sets and all processes have issued MPI_Finalize, check if any wildcard match sets are yet to be explored. If yes, then restart the processes and goto step 1, otherwise, exit.
  7. If there are no match sets and there is at least one process that has not issued MPI_Finalize, then there is a deadlock. Flag a deadlock and kill all the MPI processes.

Obtaining ISP and Examples

ISP can be downloaded from the Gauss Group’s ISP Release Page.

What you will receive are the full sources as well as the following examples in the directory isp-tests:

Why the Trident?

Because It is a universal symbol for "slaying the evil bug"!



References




Back to Top | Back to Table of Contents



School of Computing * 50 S. Central Campus Dr. Rm. 3190 * Salt Lake City, UT 84112 * isp-dev@cs.utah.edu
License