a national resource for computational science education

HOME UPEP Shodor Blue Waters

Suffix trees: How to do Google search in bioinformatics?

By Ananth Kalyanaraman
Washington State University, Pullman, WA

This module will:

  • Introduce the suffix tree data structure and its many applications in string matching and bioinformatics
  • Describe how suffix trees are built on a serial computer
  • Discuss the challenges associated with building the tree in parallel
  • Explain one application in bioinformatics (pattern matching) that uses suffix tree
  • Develop a method to implement pattern matching on a distributed memory parallel computer
  • Describe how to analyze parallel performance and identify improvements

Upon completion of this module students should be able to:

  • Identify problems and applications which could benefit from the use of suffix trees
  • Design, implement and evaluate suffix tree-based algorithms in parallel
  • Devise experiments to test the scaling of a parallel code
  • Debug, tune and optimize a parallel code with very large compute cores and distributed memory
  • Identify suitable methods to represent tree data structures in memory and in disks

The documents can be downloaded below.


Module Document (.docx) : The curriculum module document in MS Word (.docx) format.

Module Document (.pdf) : The module document in PDF format.

Suffix Trees Slides : Slides to accompany the Suffix Trees curriculum module.

Suffix Trees Source Code : A zip file containing all the source for the Suffix Trees module.

Algorithm Description (.docx) : The algorithm description document in .docx format.

Algorithm Description (PDF) : The algorithm description document in PDF format.

Module Rubric (.docx) : The module rubric in .docx format.

Module Rubric (PDF) : The module rubric in PDF format.