Coordinating computation and I/O in massively parallel sequence search

  • Heshan Lin*
  • Xiaosong Ma
  • Wuchun Feng
  • Nagiza F. Samatova
  • *Corresponding author for this work

Research output: Contribution to journal (Article, peer-reviewed)

39 Citations (Scopus)

Abstract

With the explosive growth of genomic information, searching sequence databases has emerged as one of the most computation- and data-intensive scientific applications. Our previous studies suggested that parallel genomic sequence search exhibits highly irregular computation and I/O patterns. Effectively addressing these runtime irregularities is thus the key to designing scalable sequence-search tools on massively parallel computers. While computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have each been well studied independently, little attention has been paid to the interplay between the two. In this paper, we systematically investigate computation and I/O scheduling for data-intensive, irregular scientific applications in the context of genomic sequence search. Our study reveals that a lack of coordination between computation scheduling and I/O optimization can cause severe performance problems. We then propose an integrated scheduling approach that improves sequence-search throughput by coordinating dynamic load balancing of computation with high-performance noncontiguous I/O.
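The abstract does not describe how the integrated scheduler works, so the following is only a toy model of the tension it points to, not the authors' method. It assumes, hypothetically, that each task (e.g. a query) has an irregular compute cost and writes a fixed-size result at an offset determined by task order in a shared output file. The function names `dynamic_schedule`, `block_schedule`, and `io_extents` are illustrative inventions. The model compares greedy dynamic load balancing against a static contiguous split, measuring both makespan and how many noncontiguous file extents each worker must write.

```python
import heapq
from itertools import accumulate


def output_offsets(sizes):
    """Assumed layout: results are stored in task order, so each task's
    file offset is the prefix sum of the preceding output sizes."""
    return [0] + list(accumulate(sizes))[:-1]


def dynamic_schedule(costs, num_workers):
    """Greedy dynamic load balancing: each task, taken in order, goes to the
    worker that becomes free earliest. Returns (worker per task, makespan)."""
    heap = [(0.0, w) for w in range(num_workers)]  # (time worker is free, worker id)
    heapq.heapify(heap)
    assignment = []
    for cost in costs:
        free_at, worker = heapq.heappop(heap)
        assignment.append(worker)
        heapq.heappush(heap, (free_at + cost, worker))
    makespan = max(t for t, _ in heap)
    return assignment, makespan


def block_schedule(costs, num_workers):
    """Static scheduling: worker w gets one contiguous block of tasks.
    Returns (worker per task, makespan)."""
    n = len(costs)
    block = -(-n // num_workers)  # ceiling division
    assignment = [i // block for i in range(n)]
    loads = [0.0] * num_workers
    for task, cost in enumerate(costs):
        loads[assignment[task]] += cost
    return assignment, max(loads)


def io_extents(assignment, sizes, num_workers):
    """Per-worker list of (offset, length) file extents, merging extents
    that are adjacent in the file into a single contiguous write."""
    offsets = output_offsets(sizes)
    extents = [[] for _ in range(num_workers)]
    for task, worker in enumerate(assignment):
        off, length = offsets[task], sizes[task]
        ext = extents[worker]
        if ext and ext[-1][0] + ext[-1][1] == off:
            ext[-1] = (ext[-1][0], ext[-1][1] + length)  # coalesce with previous extent
        else:
            ext.append((off, length))  # starts a new noncontiguous extent
    return extents
```

With hypothetical costs `[4, 4, 4, 4, 1, 1, 1, 1]`, 10-byte outputs per task, and two workers, the greedy scheduler finishes at time 10 but interleaves tasks so that each worker holds four separate 10-byte extents (eight in total). The static block split yields a single 40-byte extent per worker but a makespan of 16. Balancing computation and keeping I/O contiguous pull in opposite directions in this model, which is the kind of interplay the abstract argues must be coordinated.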

Original language: English
Article number: 5473216
Pages (from-to): 529-543
Number of pages: 15
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 22
Issue number: 4
Publication status: Published - 2011
Externally published: Yes

Keywords

  • BLAST
  • Scheduling
  • bioinformatics
  • parallel I/O
  • parallel genomic sequence search
