By Henry Neeman
The OU Supercomputing Center for Education and Research (OSCER)
This is the second module in a trilogy that begins with
"HPC on a Single Thread" and concludes with
"Techniques and Technologies". Together, these three modules made up much of the core material for the
2010 Blue Waters Undergraduate Petascale Institute. The materials are intended to be readily adapted and adopted by undergraduate
faculty as the core content for an undergraduate course on scientific parallel
computing. They were, in turn, adapted from the
"Supercomputing in Plain English" materials originally developed at OSCER for campus and regional education and outreach.
Links to the module resources follow the content description below.
* Shared Memory Multithreading
This submodule is an introduction to using multiple, independent flows of execution. Topics include: parallelism basics (definition, threads vs. processes, Amdahl's Law, speedup, scalability, granularity, parallel overhead); recap of the jigsaw puzzle analogy from the Overview submodule; the fork/join model; OpenMP (compiler directives, hello world, parallel do/for, chunks, private vs. shared data, static vs. dynamic vs. guided scheduling, synchronization, barriers, critical sections, race conditions explained via the analogy of "The Pen Games," reductions, how to parallelize a serial code).
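As a small illustration of the Amdahl's Law and speedup topics listed above, the following Python sketch evaluates the usual formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of runtime that can be parallelized and n is the number of threads. The function name `amdahl_speedup` and the sample value p = 0.9 are assumptions chosen for illustration; they are not taken from the module's own exercises, which use OpenMP.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup when only `parallel_fraction` of the runtime scales.

    Hypothetical helper for illustration; ignores parallel overhead.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of worker count;
    # only the parallel part is divided among the workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # With 90% of the work parallelizable, speedup stays below 10
    # no matter how many workers are added.
    for workers in (1, 2, 8, 64, 1024):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
```

The model deliberately leaves out parallel overhead, so measured speedups would typically fall below these values.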
* Distributed Multiprocessing
By the time the participants reach this submodule, they have a fairly good grasp of how to think about parallelism, but no experience with distributed parallelism or multiprocessing. Topics include: an analogy for understanding distributed parallelism (desert islands), which covers distributed operation, communication, message passing, independence, privacy, latency vs. bandwidth; recap of parallelism issues; parallel strategies (client-server, task parallelism, data parallelism, pipelining); MPI (structure of MPI calls, MPI program structure, Single Program/Multiple Data strategy, hello world, MPI runs, compiling for MPI, rank, determinism vs. indeterminism, MPI data types, tags, communicators, broadcasting, reductions, non-blocking vs. blocking communication, communication hiding).
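The latency vs. bandwidth distinction listed above can be sketched with a simple cost model: the time to send one message is roughly a fixed latency plus the message size divided by the bandwidth. The Python sketch below uses that model; the function names and the sample figures (1 microsecond latency, 1 GB/s bandwidth) are assumptions for illustration only, not measurements of any real interconnect.

```python
def transfer_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to send one message under a latency + size/bandwidth model."""
    return latency_s + nbytes / bandwidth_bytes_per_s


def total_time(message_count, bytes_per_message, latency_s, bandwidth_bytes_per_s):
    """Estimated time to send `message_count` equal-sized messages one after another."""
    per_message = transfer_time(bytes_per_message, latency_s, bandwidth_bytes_per_s)
    return message_count * per_message


if __name__ == "__main__":
    latency = 1e-6      # assumed: 1 microsecond per message
    bandwidth = 1e9     # assumed: 1 GB/s

    # The same megabyte of data, sent two different ways.
    many_small = total_time(1000, 1000, latency, bandwidth)
    one_large = total_time(1, 1_000_000, latency, bandwidth)
    print("1000 small messages:", many_small, "seconds")
    print("1 large message:    ", one_large, "seconds")
```

Under these assumed figures, every small message pays the latency again, which is one reason message passing codes tend to favor fewer, larger messages.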
* Applications and Types of Parallelism
This submodule focuses on various kinds of parallelism, motivated by example application types. Topics include: Monte Carlo simulation to illustrate client-server (the concept of embarrassingly parallel or loosely coupled computing, Monte Carlo methods in layman's terms, high energy physics as a motivating example, parallelization of Monte Carlo); N-body methods to illustrate task parallelism (N-body problems, 1-, 2- and 3-body problems, big-O notation for non-computer scientists, spatial vs. temporal complexity, force calculations, parallelizing force calculations, data parallelism vs. task parallelism, reductions, collective communications); transport problems to illustrate data parallelism (Riemann sums, mesh discretizations, finite difference method, Navier-Stokes equation, ghost boundary zones, data decomposition, Cartesian geometries, use of send/receive buffers).
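The Riemann sum, data decomposition, and reduction topics above fit together in a pattern that can be sketched in a few lines of Python. In this sketch the rectangles of a midpoint Riemann sum are split into contiguous blocks, one block per worker, and the partial sums are then combined. The names `midpoint_partial_sum` and `decomposed_area` are hypothetical, and the loop over ranks runs serially here; in an MPI program each rank would compute its own partial sum and a reduction would combine them.

```python
def midpoint_partial_sum(f, a, b, n, first, last):
    """Sum the areas of rectangles first..last-1 out of n rectangles on [a, b]."""
    width = (b - a) / n
    total = 0.0
    for i in range(first, last):
        midpoint = a + (i + 0.5) * width
        total += f(midpoint) * width
    return total


def decomposed_area(f, a, b, n, workers):
    """Midpoint Riemann sum computed block by block, then reduced."""
    partials = []
    for rank in range(workers):
        # Block decomposition: each rank owns a contiguous range of rectangles.
        first = rank * n // workers
        last = (rank + 1) * n // workers
        partials.append(midpoint_partial_sum(f, a, b, n, first, last))
    # Reduction step: combine the per-worker partial sums into one result.
    return sum(partials)


if __name__ == "__main__":
    # Area under x^2 on [0, 1]; the exact value is 1/3.
    print(decomposed_area(lambda x: x * x, 0.0, 1.0, 1000, 4))
```

Because each block touches only its own rectangles, the partial sums are independent, and the only communication needed is the final reduction.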
* Multicore Madness
The purpose of this submodule is to frighten the participants, because multicore (and, soon, many-core) are highly disruptive technologies that will require substantial redesign of many existing software applications and will make the design of new software more difficult. Topics include: implications of Moore's Law; recap of the storage hierarchy, including a practical example of the disparity between CPU speed and RAM bandwidth; recap of tiling; multicore/many-core basics (definition, RAM challenges, interconnect challenges); weather forecasting example (Cartesian mesh, finite difference, ghost boundaries); software strategies for weather forecasting (tiling won't work because of inadequate calculations per byte, strategies for improving cache reuse, multiple subdomains per process, expanded ghost stencil to improve both cache reuse and communication hiding, higher order numerical schemes to increase the number of calculations per mesh zone per timestep, parallelization in Z to improve the size of each subdomain, cache size limitations).
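The ghost boundary idea mentioned above can be sketched on a one-dimensional mesh. In the Python sketch below, each subdomain is padded with copies of its neighbors' edge values (the ghost cells) so that a 3-point stencil can be applied to every cell the subdomain owns. The averaging stencil and the function names are assumptions chosen for brevity; they are not the weather forecasting scheme from the presentation, and the subdomains are processed serially rather than by separate processes.

```python
def stencil_step(u):
    """One 3-point averaging step; the two end values are held fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return new


def decomposed_step(u, nparts):
    """The same step, computed on subdomains padded with ghost cells.

    Assumes len(u) >= nparts so that every subdomain owns at least one cell.
    """
    n = len(u)
    result = list(u)
    for part in range(nparts):
        lo = part * n // nparts          # first index this subdomain owns
        hi = (part + 1) * n // nparts    # one past the last owned index
        # Extend by one cell on each side where a neighbor exists.
        ghost_lo = max(lo - 1, 0)
        ghost_hi = min(hi + 1, n)
        local = u[ghost_lo:ghost_hi]     # owned cells plus ghost cells
        local_new = stencil_step(local)
        # Copy back only the owned cells; ghost cells belong to neighbors.
        offset = lo - ghost_lo
        result[lo:hi] = local_new[offset:offset + (hi - lo)]
    return result


if __name__ == "__main__":
    mesh = [float(i * i) for i in range(10)]
    print(decomposed_step(mesh, 3) == stencil_step(mesh))
```

In a distributed code, filling the ghost cells is where communication happens; widening the ghost region is the "expanded ghost stencil" strategy listed above, which trades extra local computation for fewer exchanges.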
Presentation #5: Shared Memory Multithreading : Presentation in MS PowerPoint (.ppt) format.
Exercise #5: OpenMP : Exercise in MS Word (.doc) format.
Presentation #6: Distributed Memory Parallelism : Presentation in MS PowerPoint (.ppt) format.
Exercise #6: MPI Point to Point : Exercise in MS Word (.doc) format.
Presentation #7: Applications and Types of Parallelism : Presentation in MS PowerPoint (.ppt) format.
Exercise #7: MPI Collective Communications : Exercise in MS Word (.doc) format.
Presentation #8: Multicore Madness : Presentation in MS PowerPoint (.ppt) format.
Exercise #8: Hybrid MPI+OpenMP : Exercise in MS Word (.doc) format.
Source Code: OpenMP : Zip archive containing the source code for Exercise #5: OpenMP.
Source Code: MPI Point to Point : Zip archive containing the source code for Exercise #6: MPI Point to Point.
N Body :
Parallelization: Conway's Game of Life : Simulates the evolution of simple and complex forms of life based on simple rules.
Parallelization: Area Under a Curve : Calculus integration program that finds the area under a curve. Well suited to teaching the basics of OpenMP and MPI.
Introduction to OpenMP : Guided lesson to teach undergraduate and graduate students how to use OpenMP.