OpenMP & MPI

What to get

First, get an interactive job:
qsub -I -l nodes=2:ppn=32:xe -l walltime=03:00:00

Open another terminal window. In that window, copy this code into your scratch directory:
cp -r ~bplist/2015/hybrid ~/scratch

Notes

We know that OpenMP is an easy-to-use API for shared-memory multiprocessing (SMP), while MPI is a message-passing standard for communicating between processes on distributed-memory systems. Used together in a hybrid program, OpenMP threads share memory instead of passing messages within a node, which reduces communication cost, while MPI scales the application across nodes.
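As a minimal sketch of what a hybrid program looks like (the file name and output format are illustrative, not taken from the copied code), each MPI rank runs its own team of OpenMP threads:

/* hybrid-hello.c: illustrative sketch, not one of the copied files.
 * Each MPI rank runs its own team of OpenMP threads. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    /* MPI_THREAD_FUNNELED: threads exist, but only the main thread
     * makes MPI calls (the usual hybrid pattern). */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    printf("rank %d: thread %d of %d\n",
           rank, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}

On Blue Waters the cc compiler wrapper links MPI automatically; depending on the compiler module loaded, you may also need its OpenMP flag (for example, -fopenmp with the GNU compilers).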

Here are the aprun options that are relevant when running MPI with OpenMP:

  1. -n: total number of MPI tasks for the job
  2. -d: depth, the number of OpenMP threads per MPI task (call omp_set_num_threads() with the same value)
  3. -N: MPI tasks per compute node; this is an optional flag
  4. -S: the number of MPI tasks to allocate per NUMA node. There are 4 NUMA regions per node; keeping a task's threads within one region can improve performance.
[Figure: processors with non-uniform memory access, from https://computing.llnl.gov/tutorials/openMP/]

The function call omp_set_num_threads() is the safest way to set the number of OpenMP threads in a hybrid program.

For instance,
aprun -n 4 -d 16 ./file.o (4 MPI tasks with 16 threads each, using all 64 cores of the two nodes requested above), or
aprun -n 4 -S 1 ./file.o (4 MPI tasks placed one per NUMA region)
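Tying the launch options to the code, here is a minimal sketch; the 16 is only an assumption matching the -d 16 line above:

/* threads.c: illustrative sketch. Pin the OpenMP team size to the
 * aprun depth; the 16 matches "aprun -n 4 -d 16" above. */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    omp_set_num_threads(16);    /* same value as aprun's -d */

    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)
            printf("running with %d threads per task\n",
                   omp_get_num_threads());
    }
    return 0;
}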

For a more thorough reference on aprun, see the Blue Waters aprun page.

Example

As an example, we have a model of the spread of a rumor.

  1. cd ~/scratch/hybrid
  2. vi rumor-hybrid.c
  3. When ready to compile: make

Exercises

As an exercise, add MPI to this OpenMP program that calculates pi (if you need to peek at one solution, see pi-key.c; a rough sketch also follows the steps below):

  1. cd ~/scratch/hybrid
  2. vi pi-openmp.c
  3. When ready to compile: cc -o pi-openmp.o pi-openmp.c
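If you want a sense of the shape before peeking, here is one hedged sketch (pi-key.c is the actual solution and may differ; the interval count and the strided decomposition are choices made here, not taken from the course code): each rank takes a strided share of the midpoint-rule intervals, OpenMP splits that share across threads, and MPI_Reduce combines the partial sums.

/* pi-hybrid.c: one possible shape of the solution, not pi-key.c.
 * Midpoint rule for pi = integral of 4/(1+x^2) on [0,1]. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const long n = 100000000;        /* number of intervals (arbitrary) */
    const double h = 1.0 / (double)n;
    int rank, size;
    double local = 0.0, pi = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Ranks take intervals rank, rank+size, ...; OpenMP then splits
     * each rank's share across its threads. */
    #pragma omp parallel for reduction(+:local)
    for (long i = rank; i < n; i += size)
    {
        double x = h * ((double)i + 0.5);
        local += 4.0 / (1.0 + x * x);
    }
    local *= h;

    /* Combine the per-rank partial sums on rank 0. */
    MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi is approximately %.15f\n", pi);

    MPI_Finalize();
    return 0;
}

Set the thread count to match aprun's -d, as noted above, before running something like this.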

Another exercise is to add OpenMP to this MPI Sieve of Eratosthenes program (if you need to peek at one solution, see sieve-key.c; a sketch follows the steps below):

  1. cd ~/scratch/hybrid
  2. vi sieve-mpi.c
  3. When ready to compile: cc -o sieve-mpi.o sieve-mpi.c
  4. If you're curious about this program, run it! For instance, aprun -n 4 -d 16 ./sieve-mpi.o -n i, where i is the upper bound on the prime search (this second -n is the program's own flag, not aprun's)
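For the curious, a hedged sketch of one way the finished exercise might look (sieve-key.c is the real key and likely differs; this sketch takes the bound as a plain argument rather than the program's -n flag, and assumes rank 0's block covers every prime up to sqrt(n)): each rank owns a block of 2..n, rank 0 broadcasts the next prime, and OpenMP parallelizes the marking loop.

/* sieve-hybrid.c: hedged sketch of the exercise, not sieve-key.c.
 * Block-decomposed MPI sieve with OpenMP on the marking loop.
 * Assumes rank 0's block covers all primes up to sqrt(n), i.e.
 * (n-1)/size >= sqrt(n). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    long n = (argc > 1) ? atol(argv[1]) : 1000000;  /* upper bound */
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* This rank owns the values low..high out of 2..n. */
    long low   = 2 + (long)rank * (n - 1) / size;
    long high  = 1 + (long)(rank + 1) * (n - 1) / size;
    long block = high - low + 1;
    char *marked = calloc(block, 1);

    long k = 2;
    while (k * k <= n) {
        /* First multiple of k in [low, high]. */
        long first = (k * k > low) ? k * k
                   : (low % k ? low + (k - low % k) : low);

        /* The marking loop is where almost all the time goes;
         * this is the natural spot for OpenMP. */
        #pragma omp parallel for
        for (long i = first; i <= high; i += k)
            marked[i - low] = 1;

        if (rank == 0)                /* rank 0 holds 2..sqrt(n) */
            while (marked[++k - low])
                ;                     /* advance to next unmarked value */
        MPI_Bcast(&k, 1, MPI_LONG, 0, MPI_COMM_WORLD);
    }

    long local = 0, total = 0;
    #pragma omp parallel for reduction(+:local)
    for (long i = 0; i < block; i++)
        if (!marked[i]) local++;
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%ld primes up to %ld\n", total, n);

    free(marked);
    MPI_Finalize();
    return 0;
}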

Resources

  1. hellorank.c is a simple program for understanding the different aprun options for an OpenMP + MPI hybrid. It is from Blue Waters (a.k.a., Galen...). A guess at its idea follows this list.
    cc -o hellorank.o hellorank.c to compile; aprun -n4 -S1 -d8 ./hellorank.o is one way to run it.
  2. The Blue Waters page explaining how to use OpenMP + MPI in hybrid.
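Before opening hellorank.c, if you want the general idea, here is a guess (the real file almost certainly differs, and sched_getcpu() is Linux/glibc-specific): print which CPU each (rank, thread) pair lands on, then rerun with different -n/-S/-d combinations and watch the placement change.

/* hellorank-sketch.c: a guess at hellorank.c's idea, not the file
 * itself. Print which CPU each (rank, thread) pair lands on. */
#define _GNU_SOURCE
#include <sched.h>    /* sched_getcpu(), Linux/glibc-specific */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    printf("rank %d, thread %d on cpu %d\n",
           rank, omp_get_thread_num(), sched_getcpu());

    MPI_Finalize();
    return 0;
}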