The IBM Parallel Environment for AIX program product (PE) is an environment designed for the development and execution of parallel Fortran, C, or C++ programs. PE consists of components and tools for developing, executing, debugging, profiling, and tuning parallel programs.
The PE is a distributed memory message passing system. It runs on the RS/6000 platform using the AIX operating system (Version 4.2.1). Specifically, you can use the PE to execute parallel programs on:
The RS/6000 processors of your system are called processor nodes. A parallel program executes as a number of individual, but related, parallel tasks on a number of your system's processor nodes. The group of parallel tasks is called a partition. The processor nodes are connected on the same LAN, so the parallel tasks of your partition can communicate to exchange data or synchronize execution. If you are using an SP system:
PE supports the two basic parallel programming models - SPMD and MPMD. In the SPMD (Single Program Multiple Data) model, the programs running the parallel tasks of your partition are identical. The tasks, however, work on different sets of data. In the MPMD (Multiple Program Multiple Data) model, each node may be running a different program. A typical example of this is the master/worker MPMD program. In a master/worker program, one task - the master - coordinates the execution of all the others - the workers.
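The master/worker pattern described above can be sketched in C with MPI. This is a minimal illustration, not PE-specific code: the program name, the "work" each worker performs, and the use of task 0 as the master are all illustrative choices. It would typically be compiled with one of the POE compiler scripts (for example, mpcc).

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, ntasks;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this task's id in the partition */
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks); /* number of parallel tasks */

    if (rank == 0) {
        /* Master: collect one result from each worker task. */
        int i, value;
        for (i = 1; i < ntasks; i++) {
            MPI_Recv(&value, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &status);
            printf("Master received %d from task %d\n", value, i);
        }
    } else {
        /* Worker: compute on its own data and report to the master. */
        int result = rank * rank;   /* stand-in for real work */
        MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Note that this is a single program run by every task (the SPMD model); the master/worker split comes from branching on the task id. An MPMD version would instead load distinct master and worker executables onto different nodes.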
Note: While the remainder of this introduction describes each of the PE components and tools in relation to a specific phase of an application's life cycle, this does not imply that they are limited to one phase. They are ordered this way for descriptive purposes only; you will find many of the tools useful across an application's entire life cycle.
The application developer begins by creating a parallel program's source code. The application developer might create this program from scratch or could modify an existing serial program. In either case, the developer places calls to Message Passing Interface (MPI) or Low-level Application Programming Interface (LAPI) routines so that it can run as a number of parallel tasks. This is known as parallelizing the application. The MPI is similar to the Message Passing Library (MPL) from an earlier version of Parallel Environment. MPI provides message passing capabilities for the current version of PE. There are two libraries for MPI:
All tasks of a program must use either signal-handling or threaded calls, but not a combination of the two.
MPL programs are still supported for non-threaded applications.
Note: Throughout this book, the term message passing is used when referring to anything that applies to both MPI and MPL; for example: message passing program, message passing routine, message passing call.
The message passing calls enable the parallel tasks of your partition to communicate data and coordinate their execution. The message passing routines in turn call communication subsystem library routines which handle communication among the processor nodes. There are two separate implementations of the communication subsystem library - the Internet Protocol (IP) Communication Subsystem and the User Space (US) Communication Subsystem. While the message passing application interface remains the same, the communication subsystem libraries use different protocols for communication among processor nodes. The IP communication subsystem uses Internet Protocol, while the US communication subsystem is designed for the SP system's high performance switch feature. The communication subsystem library implementations are dynamically linked when you invoke the program. For more information on the message passing subroutine calls, refer to IBM Parallel Environment for AIX: MPI Programming and Subroutine Reference, IBM Parallel Environment for AIX: MPL Programming and Subroutine Reference, and IBM Parallel Environment for AIX: Hitchhiker's Guide.
In addition to message passing communication, the Parallel Environment supports a separate communication protocol known as the Low-level Application Programming Interface (LAPI). LAPI differs from MPI in that it is based on an "active message style" mechanism that provides a one-sided communications model. That is, one process initiates an operation and the completion of that operation does not require any other process to take a complementary action.
LAPI only runs with the US Communication Subsystem. For this reason, it is designed to run on the SP system's high performance communication adapter only. The RS/6000 workstation cluster does not support LAPI.
Although LAPI is used for data communication in conjunction with PE, it is actually part of the communication subsystem for IBM's Parallel System Support Programs (PSSP). For more information on LAPI, see IBM Parallel System Support Programs for AIX: Administration Guide, and IBM Parallel System Support Programs for AIX: Command and Technical Reference
After writing the parallel program, the application developer then begins a cycle of modification and testing. The application developer now compiles and runs the program from the home node using the Parallel Operating Environment (POE). The home node is any workstation on the LAN. POE is an execution environment designed to hide, or at least smooth, the differences between serial and parallel execution.
To assist with node allocation for job management, the role of IBM LoadLeveler has been expanded to work with POE for interactive jobs. LoadLeveler now provides resource management function both on and off the SP system. You can run parallel programs on a cluster of processors running LoadLeveler, or on a mixed system of LoadLeveler processors that supplement an SP system. LoadLeveler not only provides SP node allocation for jobs using the US communication subsystem, but also provides management for non-SP nodes, or for SP nodes being used for jobs other than user space. POE also continues to use LoadLeveler for batch jobs. See the IBM LoadLeveler documentation for more information on this job management system.
In general, with POE, you invoke a parallel program from your home node and run its parallel tasks on a number of remote nodes. When you invoke a program on your home node, POE starts your Partition Manager, which allocates the nodes of your partition and initializes the local environment. Depending on your hardware and configuration, the Partition Manager uses a host list file, LoadLeveler, or the SP system Resource Manager to allocate nodes. A host list file contains an explicit list of node requests, while LoadLeveler or the Resource Manager allocates nodes implicitly, from one or more system pools, based on availability. On an SP system using the Resource Manager, you can also use a host list file to determine how an allocated node's resources - its SP switch adapter and CPU - are used. Your program task can either:
With regard to the expanded LoadLeveler function, POE now provides an option to enable you to specify whether your program will use MPI, LAPI, or both. Using this option, POE ensures that each API initializes properly and informs LoadLeveler which APIs are used so each node is set up completely.
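The node allocation and invocation model described above can be sketched from the command line. This is a hedged example: the node names and program name are hypothetical, and the -procs and -hostfile flags (which correspond to the MP_PROCS and MP_HOSTFILE environment variables) should be confirmed against the POE command reference for your release.

```shell
# A hypothetical host list file: one explicit node request per line.
cat > host.list <<'EOF'
node01.example.com
node02.example.com
node03.example.com
node04.example.com
EOF

# Invoke the compiled program from the home node as four parallel
# tasks, one on each listed node.
poe ./myprog -procs 4 -hostfile host.list
```

Standard input, output, and error of the remote tasks are connected back to the home node, which is why the usual redirection and piping techniques continue to work.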
For Single Program Multiple Data (SPMD) applications the Partition Manager executes the same program on all nodes. For Multiple Program Multiple Data (MPMD) applications, the Partition Manager prompts you for the name of the program to load on each node. The Partition Manager also connects standard I/O to each remote node so the parallel tasks can communicate with the home node. Although you are running tasks on remote nodes, POE allows you to continue using the standard UNIX** and AIX execution techniques with which you are already familiar. For example, you can redirect input and output, pipe the output of programs, or use shell tools. The POE includes:
The following tools are discussed in IBM Parallel Environment for AIX: Operation and Use, Volume 2, Tools Reference and allow you to debug, visualize, and tune parallel programs.
There are two parallel debugging facilities. The first - pdbx - is a line-oriented debugger based on the dbx debugger. The other - pedb - is a Motif**-based debugger.
Once the parallel program is debugged, you will want to tune it for optimal performance. To do this, you turn to the PE parallel profiling capability and the Visualization Tool to analyze the program.
The parallel profiling capability enables you to use the PE Xprofiler graphical user interface, as well as the AIX commands prof and gprof on parallel programs. Xprofiler is a tool that helps you analyze your parallel application's performance quickly and easily. It uses procedure profiling information to construct a graphical display of the functions within your application. Xprofiler provides quick access to the profiled data, which lets you identify the functions that are the most CPU-intensive. The graphical user interface also lets you manipulate the display in order to focus on the application's critical areas.
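A typical profiling session might look like the following sketch. The program name is illustrative, and the exact naming of the per-task gmon.out files can vary by configuration, so treat this as an outline rather than exact commands.

```shell
# Compile with profiling enabled (-pg) using a POE compiler script.
mpcc -pg -o myprog myprog.c

# Run under POE; each parallel task produces gprof-style profile data.
poe ./myprog -procs 4 -hostfile host.list

# View the collected profile data graphically with Xprofiler
# (it accepts multiple gmon.out files, one per task).
xprofiler myprog gmon.out.*
```

The same gmon.out data can also be examined with the standard AIX prof and gprof commands if a graphical display is not available.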
The Visualization Tool (VT) contains a set of displays which allow you to visualize performance characteristics of your program and system. Each display presents specific, often complex, information in an easily-interpretable form such as a bar chart or a strip graph. You can use VT's displays for trace visualization and online performance monitoring.
Note: Once the parallel program is tuned to your satisfaction, you might prefer to execute it using a job management system such as IBM LoadLeveler*. If you do use a job management system, consult its documentation for information on its use.
With PE 2.4, POE supports user programs developed with AIX 4.3. It also supports programs developed with AIX 4.2, intended for execution on AIX 4.3.
This release of PE provides a mechanism for temporarily saving the state of a parallel program at a specific point (checkpointing), and then later restarting it from the saved state. When a program is checkpointed, the checkpointing function captures the state of the application as well as all data, and saves it in a file. When the program is restarted, the restart function retrieves the application information from the file it saved, and the program then starts running again from the place at which it was saved.
In earlier releases of PE, POE relied on the SP Resource Manager for performing job management functions. These functions included keeping track of which nodes were available or allocated and loading the switch tables for programs performing User Space communications. LoadLeveler, which had only been used for batch job submissions in the past, is now replacing the Resource Manager as the job management system for PE. One notable effect of this change is that LoadLeveler now allows you to run up to four User Space tasks per node.
With PE 2.4, the MPI library now includes support for a subset of MPI I/O, described by Chapter 9 of the MPI-2 document, MPI-2: Extensions to the Message-Passing Interface, Version 2.0. MPI I/O provides a common programming interface, improving the portability of code that involves parallel I/O.
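As a minimal sketch of the MPI I/O interface, each task below writes its own rank into a shared file at a rank-determined offset. The file name is illustrative, and this assumes the supported subset includes the explicit-offset routines shown (MPI_File_open, MPI_File_write_at, MPI_File_close); consult the MPI Programming and Subroutine Reference for the exact subset in your release.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    MPI_File fh;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* All tasks open the same file collectively. */
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* Explicit-offset write: each task writes to its own slot,
       so no seeks and no overlap between tasks. */
    MPI_File_write_at(fh, (MPI_Offset)rank * sizeof(int),
                      &rank, 1, MPI_INT, &status);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```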
With regard to MPI/LAPI jobs, this release of PE supports a maximum of 2048 tasks for IP, and 1024 tasks for US, as opposed to the previous release, which supported a maximum of 512 tasks.
In this release, POE is adding support for the following compilers:
The pedb debugger now includes a message queue facility. Part of the pedb debugger interface, the message queue viewing feature can help you debug Message Passing Interface (MPI) applications by showing internal message request queue information. With this feature, you can view:
This release includes a variety of enhancements to Xprofiler, including:
This section is intended for customers migrating from earlier releases of PE to PE 2.4. It contains specific information on some differences between earlier releases that you need to consider prior to installing or using PE 2.4. To find out which release of PE you currently have installed, use lslpp.
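For example, the installed level of the base POE fileset can be checked as follows. The fileset name ppe.poe is the usual one for POE, but fileset names can vary by installation, so confirm against your system.

```shell
# Show the installed level and state of the POE fileset.
lslpp -l ppe.poe

# Or list every installed PE fileset.
lslpp -l "ppe.*"
```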
PE 2.4 commands and applications are compatible with AIX Version 4.3.2 or later only, not with earlier versions of AIX.
Applications from previous versions of Parallel Environment are binary compatible with PE 2.4, with the following exceptions:
Host list files from previous releases that contained multiple pool or usage specifications will be affected as follows when using LoadLeveler:
LAPI programs must set the MP_MSGAPI environment variable.
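A hedged sketch of how this might be set before invoking the job follows; the accepted values for MP_MSGAPI should be confirmed in the POE documentation for your release.

```shell
# Declare which message-passing API the job uses so POE can
# initialize it and inform LoadLeveler (value names are assumptions:
# commonly MPI, LAPI, or a combined setting for both).
export MP_MSGAPI=LAPI
poe ./my_lapi_prog -procs 4
```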
All tasks within a partition or cluster must be running the same version of PE. You cannot mix versions of PE.
Therefore, for all processors within a workstation cluster, the same release level of the PE software is required.
When you use partitioning, you may have different levels of PE software installed on different partitions; however, within a partition, all the nodes must be at the same level of PE software.
Note: See IBM Parallel Environment for AIX: Installation for more information about software compatibility within a workstation cluster or partition, and for administrative and usage information about running different versions of POE in a partitioned environment.
Users who previously set LIBPATH to include /usr/lib should no longer do so; doing so can prevent a POE application from loading all of the POE libraries at execution time.
/usr/lib is included in the loader section search path of all POE applications at compile time, so there is no need to include it in LIBPATH.
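In practice this means that if LIBPATH is set at all, it should list only the additional directories actually needed. The directory below is a hypothetical example.

```shell
# Wrong: forcing /usr/lib can interfere with POE library resolution.
# export LIBPATH=/usr/lib:/home/user/mylibs

# Right: list only the extra directories; /usr/lib is already
# searched via the loader section built in at compile time.
export LIBPATH=/home/user/mylibs
```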
The -ip and -us flags for PE Version 1 mpcc, mpCC, and mpxlf compiler scripts are no longer used or supported. All application programs are dynamically linked using these scripts.
Instructions are provided on how to create statically executable versions of your applications in IBM Parallel Environment for AIX: Operation and Use, Volume 1, Using the Parallel Operating Environment. User-written scripts that utilize these options need to be rewritten.
VT trace files generated using Version 1 or Version 2, Release 2 will not be compatible with Version 2.4, and vice versa. Trace files must be regenerated.
However, refer to IBM Parallel Environment for AIX: Operation and Use, Volume 2, Tools Reference for information about the VT trace file format, if you want to write your own conversion program.
Previous versions of POE allowed jobs using the SP Resource Manager to be submitted from a non-SP node by setting the SP_NAME environment variable. For POE Version 2 Release 2 or later, you must also install the ssp.clients fileset. Refer to IBM Parallel Environment for AIX: Installation for more information.
By default when compiling parallel applications, PE 2.4 supports the IEEE POSIX 1003.1-1996 standard for POSIX threads (sometimes known as Draft 10), which is XPG5 compliant. Existing applications from previous releases of PE were built with an earlier version of POSIX threads (Draft 7).
Existing threaded applications are supported in binary compatibility mode, without needing to recompile. However, these will run with the older objects from the previous version's threads library.
All new applications are compiled with the new draft of POSIX threads as the default. However, the POE threaded compiler scripts (mpcc_r, mpCC_r, mpxlf_r, mpxlf90_r) also provide an optional flag (-d7) to allow applications to be compiled with the older version of the threads library. See the appropriate compiler command description for further details.
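The choice between the two threads library levels is made at compile time, as sketched below with hypothetical file names.

```shell
# Default: build against the POSIX 1003.1-1996 (Draft 10) threads library.
mpcc_r -o myprog myprog.c

# Compatibility: build against the older Draft 7 threads library.
mpcc_r -d7 -o myprog myprog.c
```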
POE compiles and runs all applications as 32 bit applications. 64 bit applications are not yet supported.