Results: 11-20 of 30
This module teaches: 1) Conway's Game of Life as an example of a cellular automaton, 2) how cellular automata are used in solutions to scientific problems, 3) how to implement parallel code for Conway's Game of Life (including versions that use shared memory via OpenMP, distributed memory via the Message Passing Interface (MPI), and hybrid via a combination of OpenMP and MPI), 4) how to measure the performance and scaling of a parallel application in multicore and manycore environments, and 5) how cellular automata fall into the Structured Grid "dwarf" (a class of algorithms that have similar communication and computation patterns). Upon completion of this module, students should be able to: 1) Understand the importance of cellular automata in modeling scientific problems, 2) Design a parallel algorithm and implement it using OpenMP and/or MPI, 3) Measure the scalability of a parallel code over multiple or many cores, and 4) Explain the communication and computation patterns of the Structured Grid dwarf. It is assumed that students will have prerequisite experience with C or Fortran 90, *nix systems, and modular arithmetic.

Epidemiology is the study of infectious disease. Infectious diseases are said to be "contagious" among people if they are transmittable from one person to another. Epidemiologists can use models to assist them in predicting the behavior of infectious diseases. This module will develop a simple agent-based infectious disease model, develop a parallel algorithm based on the model, provide a coded implementation for the algorithm, and explore the scaling of the coded implementation on high-performance cluster resources.

This module introduces the basics of cellular automaton simulation with an application to studying the effect of fencing artificial watering points on adult cane toad invasion in Australia.

This module provides a quick review of dynamic programming, but the student is assumed to have seen it before. The parallel programming environment is NVIDIA's CUDA environment for graphics cards (GPGPU, general-purpose graphics processing units). The CUDA environment simultaneously operates with a fast shared memory and a much slower global memory, and thus has aspects of both shared-memory parallel computing and distributed computing. Specifics for programming in CUDA are included where appropriate, but the reader is also referred to the NVIDIA CUDA C Programming Guide and the CUDA API Reference Manual.

This module is largely standalone. It is "Part II" only in the sense that it does not contain the overview of dynamic programming seen in Part I, and does not recapitulate the introduction to CUDA. We will continue to refer the reader to various NVIDIA references where appropriate, particularly the NVIDIA CUDA C Programming Guide and the CUDA API Reference Manual. Where we introduce new CUDA-specific ideas, we will linger a bit longer by way of introduction. The algorithms described here are completely independent of Part I, so a reader who already has some familiarity with CUDA and dynamic programming may begin with this module with little difficulty.

This module introduces the Party Problem, a problem in Ramsey theory (a subfield of mathematics), and compares the performance of a naive solution to the Party Problem across a sequential program, an OpenMP program, and a CUDA program.

This module teaches matrix multiplication in the context of enumerating paths in a graph and the basics of programming in CUDA. It emphasizes the power of using shared memory when programming on GPGPU architectures.

This module teaches the basic principles of semiclassical transport simulation based on the time-dependent Boltzmann transport equation (BTE) formalism. It covers performance considerations for parallel implementations of multidimensional transport simulation, and numerical methods for efficient and accurate solution of the BTE for both electronic and thermal transport, using simple finite-difference discretization and the stable upwind method.

This module explores the inner workings of the BLAST similarity search tool, considering the algorithm and the impact of various search conditions and settings on performance. Various approaches to parallelizing the computation and their performance impacts are considered. Benchmarking of the mpiBLAST parallel code is carried out at different scales.

This module presents some of the general ideas behind, and basic principles of, high-performance computing (HPC) as performed on a supercomputer. These concepts should remain valid even as the technical specifications of the latest machines continually change. Although this material is aimed at HPC supercomputers, if history is a guide, today's HPC hardware and software will reach desktop machines in less than a decade.
