|Position Title||Parallel computational biology - algorithm and software developer|
|Summary||As part of this internship, an undergraduate student will develop a scalable parallel implementation for constructing the suffix tree data structure for DNA and protein sequences.|
|Job Description||This internship project is for the implementation of a scalable, parallel, open-source library for suffix tree construction written in C/C++ and MPI. Two versions of the code will be developed – one for DNA sequences and another for protein sequences. |
Suffix trees are widely used string data structures well suited to pattern matching and sequence indexing of both genomic and protein databases. However, constructing this data structure in parallel remains a significant algorithm development challenge. The data structure has a wide range of applications in sequence analysis, from basic pattern matching to more complex transcriptomic clustering and genome assembly. Given the exponential growth of DNA and protein databases, this project has a high potential to benefit from petascale computing.
The student will first survey the literature in this area, which mostly covers serial algorithms, and will then design a parallel algorithm suited to the distributed-memory machine model of parallel computing. The implementation will be written in C/C++ and MPI and tested on large-scale supercomputers for performance assessment. As part of the project, the student will also collaborate with scientists from other disciplines. Additional mentorship will be provided by a PhD student. The student will also engage in scholarly activities by submitting the findings to at least one technical publication, and will work with the faculty supervisor to prepare a curriculum module that uses the suffix tree data structure.
|Location||High Performance Computational Biology Laboratory |
School of Electrical Engineering and Computer Science
Washington State University, PO Box 642752
Pullman, WA 99164-2752