PARLO: PArallel run-time layout optimization for scientific data explorations with heterogeneous access patterns

Zhenhuan Gong, David A. Boyuka, Xiaocheng Zou, Qing Liu, Norbert Podhorszki, Scott Klasky, Xiaosong Ma, Nagiza F. Samatova*

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

16 Citations (Scopus)

Abstract

The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induce heterogeneous access patterns that stress the performance of the underlying storage system. Previous work has applied data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework that achieves multi-level data layout optimization for scientific applications at run time, before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, lightweight layout optimization for scientific datasets. It offers simple XML-based configuration that lets users achieve flexible layout optimization without modifying or recompiling application code. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves 56% savings in processing time and a reduction in storage overhead of up to 50%. PARLO also has low run-time resource requirements and limits the performance impact on running applications to a reasonable level.
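The abstract's central idea, choosing a data layout that serves several access patterns weighted by user-specified priorities, can be illustrated with a toy cost model. The sketch below is not PARLO's algorithm and does not use the ADIOS API; the 16 x 16 grid, the 4 x 4 chunk size, the three candidate layouts (row-major, column-major, chunked), and the use of "number of contiguous runs" as a stand-in for I/O cost are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy setup, not taken from the paper: an N x N 2-D variable
# that may be split into C x C chunks.
N = 16
C = 4

Cell = Tuple[int, int]
Layout = Callable[[int, int], int]  # maps (row, col) to a linear file offset


def row_major(i: int, j: int) -> int:
    return i * N + j


def col_major(i: int, j: int) -> int:
    return j * N + i


def chunked(i: int, j: int) -> int:
    # Chunks are stored in row-major order; cells inside a chunk are row-major too.
    chunk_id = (i // C) * (N // C) + (j // C)
    return chunk_id * C * C + (i % C) * C + (j % C)


def count_runs(layout: Layout, cells: List[Cell]) -> int:
    """Number of contiguous offset runs needed to read `cells` (a proxy for seeks)."""
    offsets = sorted(layout(i, j) for i, j in cells)
    if not offsets:
        return 0
    runs = 1
    for prev, cur in zip(offsets, offsets[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


def choose_layout(layouts: Dict[str, Layout],
                  patterns: Dict[str, List[Cell]],
                  priorities: Dict[str, int]) -> Tuple[str, Dict[str, int]]:
    """Pick the layout with the lowest priority-weighted run count."""
    costs = {
        name: sum(priorities[p] * count_runs(layout, cells)
                  for p, cells in patterns.items())
        for name, layout in layouts.items()
    }
    return min(costs, key=costs.get), costs


LAYOUTS = {"row_major": row_major, "col_major": col_major, "chunked": chunked}
# Hypothetical heterogeneous access patterns: a full row, a full column,
# and a 4 x 4 spatial subvolume.
PATTERNS = {
    "row_slice": [(5, j) for j in range(N)],
    "column_slice": [(i, 7) for i in range(N)],
    "subblock": [(i, j) for i in range(4, 8) for j in range(4, 8)],
}

if __name__ == "__main__":
    best, costs = choose_layout(LAYOUTS, PATTERNS,
                                {"row_slice": 5, "column_slice": 2, "subblock": 3})
    print(best, costs)
```

With priorities 5/2/3 the weighted costs are 49 (row-major), 94 (column-major), and 55 (chunked), so row-major wins. Raising the subblock priority to 10 and setting the other two to 1 flips the choice to the chunked layout (cost 30 versus 57 for both linear layouts), which is the kind of priority-driven trade-off the abstract describes.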

Original language: English
Pages: 343-351
Number of pages: 9
Publication status: Published - 2013
Externally published: Yes
Event: 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013 - Delft, Netherlands
Duration: 13 May 2013 to 16 May 2013

Conference

Conference: 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013
Country/Territory: Netherlands
City: Delft
Period: 13/05/13 to 16/05/13
