PARROT: pattern-based correlation exploitation in big partitioned data series

  • Liang Zhang*
  • Noura Alghamdi
  • Huayi Zhang
  • Mohamed Y. Eltabakh
  • Elke A. Rundensteiner
  • *Corresponding author for this work

Research output: Contribution to journal › Article › Peer-review

2 Citations (Scopus)

Abstract

Approximate similarity search over data series is a basic building-block operation essential for almost all analytical tasks. To speed up this important operation, the prevalent approach is to construct indexes directly on the data series objects. This suffers from very high construction time and storage cost due to the inherent complexity of indexing these high-dimensional objects. We instead design a new approach that leverages correlations between the high-dimensional data series objects and the (often simple) partitioning attribute(s) in distributed data series repositories. Our proposed infrastructure, called PARROT, discovers, assesses, and exploits such correlations for similarity query optimization. PARROT addresses several critical challenges, including the high dimensionality of the data series objects, the softness (uncertainty) of correlations, correlation granularity, and the lack of a proper measure for assessing correlation strength in big data series. We present scalable solutions to each of these challenges, including pattern-level indexing, exception-handling strategies for soft correlations, and a new entropy-based measure for assessing the strength of correlations and judging their potential effectiveness. The PARROT query engine efficiently supports approximate kNN similarity queries by leveraging the PARROT index. The PARROT prototype is implemented on Apache Spark. Extensive experiments on real and synthetic datasets demonstrate that, compared to alternative state-of-the-art solutions, PARROT has substantially lower index construction cost, smaller storage overhead, and better performance and accuracy when processing similarity queries.
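The abstract does not define PARROT's entropy-based measure or its exception-handling strategy, so the following is only a minimal sketch of the general idea under our own assumptions. It assumes each data series has already been mapped to a discrete pattern ID (for example, by clustering) and scores correlation strength as one minus the normalized conditional entropy H(partition | pattern). The names `correlation_strength` and `candidate_partitions`, the `coverage` threshold, and the treatment of leftover series as "exceptions" are hypothetical illustrations, not the paper's definitions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch, NOT PARROT's actual measure: each element of
# `assignments` is a (pattern_id, partition_value) pair for one data series.

def entropy_bits(counts):
    """Shannon entropy (in bits) of a frequency table."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)

def correlation_strength(assignments, num_partitions):
    """1 - H(partition | pattern) / log2(num_partitions), a score in [0, 1].

    1.0: every pattern falls in a single partition (hard correlation).
    0.0: every pattern is spread uniformly over all partitions.
    """
    by_pattern = defaultdict(Counter)
    for pattern_id, partition in assignments:
        by_pattern[pattern_id][partition] += 1
    n = len(assignments)
    if n == 0 or num_partitions < 2:
        return 1.0  # degenerate case: nothing to disambiguate
    max_h = math.log2(num_partitions)
    # Conditional entropy = per-pattern entropy weighted by pattern frequency.
    cond_h = sum(sum(c.values()) / n * entropy_bits(c) for c in by_pattern.values())
    return 1.0 - cond_h / max_h

def candidate_partitions(assignments, pattern_id, coverage=0.9):
    """Most frequent partitions that together hold >= `coverage` of the series
    matching `pattern_id` (hypothetical pruning rule); a query on that pattern
    would scan only these, and the remaining series would need separate
    exception handling (assumed)."""
    counts = Counter(p for pid, p in assignments if pid == pattern_id)
    total = sum(counts.values())
    chosen, covered = [], 0
    for partition, c in counts.most_common():
        if covered >= coverage * total:
            break
        chosen.append(partition)
        covered += c
    return chosen

# Usage on a toy repository partitioned by year (made-up data).
assignments = [("A", "2020"), ("A", "2020"), ("A", "2020"), ("A", "2021"),
               ("B", "2021"), ("B", "2021")]
print(round(correlation_strength(assignments, 2), 3))      # 0.459
print(candidate_partitions(assignments, "A", coverage=0.7))  # ['2020']
```

A conditional-entropy formulation is convenient here because it is zero exactly when the partition value is fully determined by the pattern, which is the case where restricting a query to the pattern's partitions loses nothing; in this sketch the `coverage` knob trades the number of scanned partitions against the number of series that would fall outside them.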

Original language: English
Pages (from-to): 665-688
Number of pages: 24
Journal: VLDB Journal
Volume: 32
Issue number: 3
Publication status: Published - May 2023
Externally published: Yes

Keywords

  • Approximate similarity queries
  • Big data series
  • Correlation-aware indexing

