Massively parallel genomic sequence search on the Blue Gene/P architecture

Heshan Lin*, Pavan Balaji, Ruth Poole, Carlos Sosa, Xiaosong Ma, Wu Chun Feng

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

34 Citations (Scopus)

Abstract

This paper presents our first experiences in mapping and optimizing genomic sequence search onto the massively parallel IBM Blue Gene/P (BG/P) platform. Specifically, we performed our work on mpiBLAST, a parallel sequence-search code that has been optimized on numerous supercomputing environments. In doing so, we identify several critical performance issues. Consequently, we propose and study different approaches for mapping sequence-search and parallel I/O tasks on such massively parallel architectures. We demonstrate that our optimizations can deliver nearly linear scaling (93% efficiency) on up to 32,768 cores of BG/P. In addition, we show that such scalability enables us to complete a large-scale bioinformatics problem - sequence searching a microbial genome database against itself to support the discovery of missing genes in genomes - in only a few hours on BG/P. Previously, this problem was viewed as computationally intractable in practice.

Original language: English
Title of host publication: 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008
Publication status: Published - 2008
Externally published: Yes
Event: 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008 - Austin, TX, United States
Duration: 15 Nov 2008 to 21 Nov 2008

Publication series

Name: 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008

Conference

Conference: 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008
Country/Territory: United States
City: Austin, TX
Period: 15/11/08 to 21/11/08
