
Appendix B. MPI Safety

This appendix provides information on creating a safe MPI program. Much of the information presented here comes from MPI: A Message-Passing Interface Standard, Version 1.1, available from the University of Tennessee.


Safe MPI Coding Practices

What's a Safe Program?

This is a hard question to answer. Many people consider a program to be safe if no message buffering is required for the program to complete. In a program like this, you should be able to replace all standard sends with synchronous sends, and the program will still run correctly. This is considered to be a conservative programming style, which provides good portability because program completion doesn't depend on the amount of available buffer space.

On the flip side, many programmers prefer more flexibility and use an unsafe style that relies, at least somewhat, on buffering. In such cases, the use of standard send operations can provide the best compromise between performance and robustness. MPI attempts to supply sufficient buffering so that these programs will not result in deadlock. The buffered send mode can be used for programs that require more buffering, or in situations where you want more control. Since buffer overflow conditions are easier to diagnose than deadlock, this mode can also be used for debugging purposes.

Non-blocking message passing operations can be used to avoid the need for buffering outgoing messages. This prevents deadlock situations due to a lack of buffer space, and improves performance by allowing computation and communication to overlap. It also avoids the overhead associated with allocating buffers and copying messages into buffers.
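As an illustration, here is a minimal sketch in C of a pairwise exchange written with non-blocking calls (the function name, buffer names, and tag value are illustrative, not from the original text). Each task posts its receive and its send without blocking, so completion never depends on system buffer space:

#include <mpi.h>

void exchange(float *sendbuf, float *recvbuf, int count,
              int peer, MPI_Comm comm)
{
    MPI_Request requests[2];
    MPI_Status  statuses[2];

    /* Post the receive first so the incoming message can be placed
       directly into recvbuf instead of a system buffer. */
    MPI_Irecv(recvbuf, count, MPI_FLOAT, peer, 0, comm, &requests[0]);
    MPI_Isend(sendbuf, count, MPI_FLOAT, peer, 0, comm, &requests[1]);

    /* ... computation can overlap the communication here ... */

    MPI_Waitall(2, requests, statuses);
}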

Safety and Threaded Programs

Message passing programs can hang or deadlock when one task waits for a message that is never sent, or when each task is waiting for the other to send or receive a message. Within a task, a similar situation can occur when one thread is waiting for another to release a lock on a shared resource, such as a piece of memory. If the waiting thread is holding a lock that is needed by the running thread, then both threads will deadlock while waiting for the same lock (mutex).

A more subtle problem occurs when two threads simultaneously access a shared resource without a lock protocol. The result may be incorrect without any obvious sign. For example, the following function is not thread-safe: the increment is a load-modify-store sequence, and a thread may be pre-empted after it loads and increments the value of c but before it stores the result, so a concurrent update by another thread can be lost.

int c;  /* external, shared by two threads */
void update_it()
 {
     c++;  /* not thread safe: the load, increment, and store are not atomic */
 }
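A common way to make such a function thread-safe is to protect the shared variable with a lock. Here is a minimal sketch using a POSIX threads mutex (the mutex name and initialization are illustrative):

#include <pthread.h>

int c;  /* external, shared by two threads */
pthread_mutex_t c_lock = PTHREAD_MUTEX_INITIALIZER;

void update_it()
{
    pthread_mutex_lock(&c_lock);   /* serialize access to c */
    c++;                           /* safe: only one thread at a time */
    pthread_mutex_unlock(&c_lock);
}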

It is recommended that you don't write threaded message passing programs until you are familiar with writing and debugging threaded, single-task programs.
Note: While the signal handling library (libmpi.a) supports both the MPI standard and MPL (the message passing interface provided by IBM before the MPI standard was adopted), the threaded library (libmpi_r.a) supports only the MPI standard.

Using Threaded Programs with Non-Thread-Safe Libraries

A threaded MPI program must meet the same criteria as any other threaded program; it must avoid using non-thread-safe functions in more than one thread (for example, strtok). In addition, if library functions are called on more than one thread, it must use only thread-safe libraries. In AIX, not all libraries are thread-safe, so you should carefully examine how they are used in your program.

Linking with Libraries Built with libc.a

Compiling a threaded MPI program will cause the libc_r.a library to be used to resolve all the calls to the standard C library. If your program links with a library that has been built using the standard C library, it is still usable (assuming that it provides the necessary logical thread safety) under the following conditions:

When your executable is loaded, the loader resolves shared library references using the LIBPATH environment variable first, then the libpath string of the executable itself. POE sets the LIBPATH variable to select the correct message passing library (User Space or IP). The mpcc_r (as well as mpCC_r and mpxlf_r) script sets the libpath string to:

/usr/lpp/ppe.poe/lib/threads:/usr/lpp/ppe.poe/lib: ...

so that POE versions of libc.a and libc_r.a will be used. If these libraries are not available, you'll need to set LIBPATH to point to the ones you want to use.

Some General Hints and Tips

To ensure you have a truly MPI-based application, you need to conform to a few basic rules of point-to-point communication. In this section, we'll alert you to some of the things you need to pay attention to as you create your parallel program. Note that most of the information in this section was taken from MPI: A Message Passing Interface Standard, so you may want to refer to this document for more information.

Order

With MPI, it's important to know that messages are non-overtaking; the order of sends must match the order of receives. Assume a sender sends two messages (Message 1 and Message 2) in succession, to the same destination, and both match the same receive. The receive operation will receive Message 1 before Message 2. Likewise, if a receiver posts two receives (Receive 1 and Receive 2), in succession, and both are looking for the same message, Receive 1 will receive the message before Receive 2. Adhering to this rule ensures that sends are always matched with receives.

If a process in your program has a single thread of execution, then the sends and receives that occur follow a natural order. However, if a process has multiple threads, the various threads may not execute their relative send operations in any defined order. In this case, the messages can be received in any order.

Order rules apply within each communicator. Weakly synchronized threads can each use independent communicators to avoid many order problems.
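For example, each thread can be given its own duplicate of a communicator before the threads are started, so that one thread's messages can never match a receive posted by the other. A sketch in C (the communicator and function names are illustrative):

#include <mpi.h>

MPI_Comm comm_t0, comm_t1;

/* Called by every task before its threads are started (MPI_Comm_dup
   is collective).  Traffic on comm_t0 can never match a receive
   posted on comm_t1, so each thread's message order is independent
   of the other's. */
void make_thread_comms(MPI_Comm comm)
{
    MPI_Comm_dup(comm, &comm_t0);  /* used only by thread 0 */
    MPI_Comm_dup(comm, &comm_t1);  /* used only by thread 1 */
}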

Here's an example of using non-overtaking messages. Note that the message sent by the first send must be received by the first receive, and the message sent by the second send must be received by the second receive.

CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank.EQ.0) THEN
    CALL MPI_BSEND(buf1, count, MPI_REAL, 1, tag, comm, ierr)
    CALL MPI_BSEND(buf2, count, MPI_REAL, 1, tag, comm, ierr)
ELSE    ! rank.EQ.1
    CALL MPI_RECV(buf1, count, MPI_REAL, 0, MPI_ANY_TAG, comm, status, ierr)
    CALL MPI_RECV(buf2, count, MPI_REAL, 0, tag, comm, status, ierr)
END IF

Progress

If two processes initiate two matching sends and receives, at least one of the operations (the send or the receive) will complete, regardless of other actions in the system. The send operation will complete unless its matching receive is satisfied by another message and itself completes. Likewise, the receive operation will complete unless the message it matches is consumed by another matching receive posted at the same destination.

The following example shows two matching pairs that are intertwined in this manner. Here's what happens:

  1. Both processes invoke their first calls.

  2. Process 0's first send uses buffered mode, which means it must complete even if there's no matching receive. Since the first receive posted by process 1 doesn't match it, the message is copied into buffer space.

  3. Next, process 0 posts its second send operation, which matches process 1's first receive, and both operations complete.

  4. Process 1 then posts its second receive, which matches the buffered message, and both operations complete.

CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank.EQ.0) THEN
    CALL MPI_BSEND(buf1, count, MPI_REAL, 1, tag1, comm, ierr)
    CALL MPI_SSEND(buf2, count, MPI_REAL, 1, tag2, comm, ierr)
ELSE    ! rank.EQ.1
    CALL MPI_RECV(buf1, count, MPI_REAL, 0, tag2, comm, status, ierr)
    CALL MPI_RECV(buf2, count, MPI_REAL, 0, tag1, comm, status, ierr)
END IF

Fairness

MPI does not guarantee fairness in the way communications are handled, so it's your responsibility to prevent starvation among the operations in your program.

So what might unfairness look like? One example is a send with a matching receive on another process that never completes because messages from a different process repeatedly overtake it and match the receive first.
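For instance, a receiver that uses MPI_ANY_SOURCE is not obliged to service its senders evenly. In the C sketch below (the task numbers and names are illustrative), MPI may legally match task 1's message on every iteration, so task 2's sends might never complete:

#include <mpi.h>

/* Server loop on task 0; tasks 1 and 2 both send repeatedly with
   matching tags.  MPI makes no fairness guarantee about which
   pending message matches each MPI_ANY_SOURCE receive. */
void server_loop(MPI_Comm comm)
{
    MPI_Status status;
    int request;

    for (;;) {
        MPI_Recv(&request, 1, MPI_INT, MPI_ANY_SOURCE, 0, comm, &status);
        /* ... handle the request from status.MPI_SOURCE ... */
    }
}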

Resource Limitations

If a lack of resources prevents an MPI call from executing, errors may result. Pending send and receive operations consume a portion of your system resources. MPI attempts to use a minimal amount of resource for each pending send and receive, but buffer space is required for storing messages sent in either standard or buffered mode when no matching receive is available.

When a buffered send operation cannot complete because of a lack of buffer space, the resulting error could cause your program to terminate abnormally. On the other hand, a standard send operation that cannot complete because of a lack of buffer space will merely block and wait for buffer space to become available or for the matching receive to be posted. In some situations, this behavior is preferable because it avoids the error condition associated with buffer overflow.
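When you do use buffered mode, you can make the amount of available buffer space explicit by attaching a buffer of your own. A minimal sketch in C (the count and element type are illustrative):

#include <mpi.h>
#include <stdlib.h>

/* Attach enough user buffer space for one buffered send of count
   elements.  MPI_Pack_size reports the packed data size, and
   MPI_BSEND_OVERHEAD covers the envelope bookkeeping. */
void attach_bsend_buffer(int count, MPI_Comm comm)
{
    int packed_size, buffer_size;
    void *buffer;

    MPI_Pack_size(count, MPI_FLOAT, comm, &packed_size);
    buffer_size = packed_size + MPI_BSEND_OVERHEAD;
    buffer = malloc(buffer_size);
    MPI_Buffer_attach(buffer, buffer_size);
}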

Sometimes a lack of buffer space can lead to deadlock. The program in the example below will succeed even if no buffer space for data is available.

CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank.EQ.0) THEN
    CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
    CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
ELSE    ! rank.EQ.1
    CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
    CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
END IF

In this next example, neither process will send until the other process has sent first. As a result, this program will always deadlock.

CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank.EQ.0) THEN
    CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
    CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
ELSE    ! rank.EQ.1
    CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
    CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
END IF

The example below shows how a message exchange relies on buffer space. The message sent by each process must be copied out before the send returns and the receive starts. For the program to complete, at least one of the two messages must be buffered. Consequently, this program can execute successfully only if the communication system can buffer at least count words of data.

CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank.EQ.0) THEN
    CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
    CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
ELSE    ! rank.EQ.1
    CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
    CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
END IF

When standard send operations are used, deadlock can occur in which both processes block because buffer space is not available. The same is true for synchronous send operations. For buffered sends, if the required amount of buffer space is not available, the program won't complete either; instead of deadlock, we'll have buffer overflow.
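One portable way to write such an exchange so that it never depends on buffer space is MPI_SENDRECV, which pairs the send and the receive in a single call. Here is a sketch in C of the exchange above (the function and buffer names are illustrative):

#include <mpi.h>

/* Each task sends to and receives from its peer in one call; the
   library schedules the transfer so that no buffering is required
   for the program to complete. */
void safe_exchange(float *sendbuf, float *recvbuf, int count,
                   int peer, int tag, MPI_Comm comm)
{
    MPI_Status status;

    MPI_Sendrecv(sendbuf, count, MPI_FLOAT, peer, tag,
                 recvbuf, count, MPI_FLOAT, peer, tag,
                 comm, &status);
}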

