Parallel Programming Examples Using MPI

...or...

So you want to use a cluster!

The purpose of parallel programming is to take one large task and to break it up into a number of smaller tasks which can be performed independently. The problem is most complex tasks cannot be easily divided into tasks which are completely independent of each other.

Consider, for example, a meeting of scientists discussing plans to colonize the Moon. You might have one committee discussing habitat construction, another committee discussing hydroponics, a third committee discussing atmospheric recycling, and a fourth committee discussing transport to the moon. The transport committee needs to know how many people and what equipment to transfer. The atmospheric recycling committee probably needs to coordinate with the hydroponics committee. The habitat committee will need to know what resources can be reused and what must be continuously brought from Earth. The transport committee also would likely want to know that information.

Real problems are not just the sum of many small problems, and when you have to break up a large problem into small problems, you need to be able to communicate information between small problems.

The degree to which communication and coordination is required in a parallel program is known as the granularity of the program. Course grain parallelism requires little communication, and fine grain parallelism requires a great deal of communication.

The efficiency of a parallel code is the degree to which you can use all of the processing power available. Codes in which clients spend most of their time waiting to perform a calculation may be better done on a single CPU. Codes with very high network requirements may run better on a single CPU. If t(p) is a measure of the time required to solve a given problem on p processors, the efficiency of a parallel code is given by

e=t(p)/(p*t(1))

We are going to start with three examples which span the range of granularity and incorporate three common numerical techniques. These examples were designed for the MSI HPC Workshops to be run on the "cluster in a suitcase," a small test cluster made up of 1 server and 4 client nodes, all of which are Toshiba laptops running Linux 7.2 and OSCAR. Each client has X enabled so that a "tiled" display can be set up by arranging the laptops next to each other.

The first program is an example of a parameter space study of a Monte Carlo simulation of the game of darts. The model calculates average score for a player whose throw falls within a Gaussian pattern around some aim point. Since the dart board is two dimensional, the grid of aim points makes up a two dimensional parameter space to be studied. Each point in parameter space can be solved completely independently. The parameter space is evenly divided between processors, and each processor performs their own visualization.

The second example is cellular automata arranged in a grid, in this case the Game of Life. You have a two dimensional grid of cells, where each cell is either "alive" or "dead." Cells die out if they are too lonely (less than 2 neighbors) or too crowded (more than three neighbors). Cells are born if they are empty and have exactly three neighbors. The grid can be broken up into sub-grids, but those sub-grids must be able to communicate with each other at the boundaries. Each sub-grid does its own visualization in this example.

The third example is a finite difference integration of N-bodies gravitationally attracted to one another. In studies of cosmology and galactic dynamics, you are dealing with large numbers of objects and every object feels a force from every other object, leading to (n)*(n-1) forces in three dimensions. For large n, the time spent calculating those forces vastly outweighs the time spent doing anything else in the calculation, and efforts at parallelizing the code center only around the force calculation. With p processors, each calculator can calculate n/p forces, but needs to know all n positions. Also, since this is only one piece of the larger code, it is generally arranged so that one code is the coordinator, or server. The server does the standard calculation, and when it comes time to calculate forces, sends out a message to each client code with the data on each n positions. The client codes then calculate n/p forces, and send the information back to the server. The server performs all visualization in this example.

Note: The makefiles in the following examples may need to be changed in your system. Typically, you will want to specify the X11 libraries and possibly the math libraries (-lX11 -lm). You will need to give the include directory location and the library directory location for any include files or libraries not in your computers default path (for example, -I/usr/X11R6/include -L/usr/X11R6/lib -lX11 to include the X11 libraries located at /usr/X11R6).

Note: Currently, the Life and Parameter Space examples are set up to use the display on the client nodes. If your clients are not set up to display X, you can either run the examples with no visualization (not really helpful for seeing things, but you can do some time trials using the unix "time" command), or you can display all of the windows on the server machine by running "mpirun -x DISPLAY program_name" where program_name is the program you are running. This, however, will slow things down considerably and kind of defeat the point of having a cluster. I'm working on adding a few more display options. I'll try to get those built-in as soon as possible!

All files are stored as tarred, gzipped directories. Download the files, and extract them with "tar -xzf filename.tgz". (Be sure to substitute the actual filename for "filename.tgz.") "cd" to the directory, edit the Makefile if needed, and "make" the program for your machine. Run the program using mpirun.

Currently for all programs the command line options to the programs are optional, but you need each prior option to use a given option. In other words, for Life the command line options are size and display. If you want to specify to not use the display, you have to give a value for the size. However, you can just give the size if you want the program to decide by default whether or not to use the display.

CSERD

Parallel Programming Examples using MPI

Parallel Programming Examples Using MPI

So you want to use a cluster!