Simply thinking about the number of different runnings of parallel programs can be daunting. The simple example in the ISP website is enlightening:
It is well known that even an MPI program with five process, each making five MPI calls each, has over 10 billion interleavings (schedules). This is what bogs testing tools down -- they do not know which interleavings matter, thus end up missing some interleavings while pursuing others redundantly.
We had a discussion with my friend from mathematics department, Hakan Özadam, about the number of interleavings. Five processes, making five calls each to MPI functions means 25 function calls. For each process, you have to choose 5 function calls. If you number each from 1 to 25, you can have each process choose its functions and move on to the next one. Thus, the number becomes:
C(25/5) * C(20/5) * C(15/5) * C(10/5)
Where
C(x/y) = x! / ((x - y)! * y!) The last process simply has to select all the left calls for itself, and thus its number of choices is C(5/5) = 1, only one.That makes the number quite high. No wonder it is hard to formally verify that a parallel program just doesn't deadlock somewhere and wait indefinitely for an input it is to process (halting problem, anyone?). ISP debugger looks promising on finding certain types of bugs in MPI programs. It seems to have formal framework for following promising interleavings and not checking ones that are sure not to be executed. I wonder how it could be used with Boost libraries, which are the libraries I am using for C++ MPI development.
[thanks to Hakan for his help ]



