TARDIS: Distributed indexing framework for big time series data

Liang Zhang, Noura Alghamdi, Mohamed Y. Eltabakh, Elke A. Rundensteiner

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

22 Citations (Scopus)

Abstract

The massive amounts of time series data continuously generated and collected by applications call for large-scale distributed time series processing systems. Indexing plays a critical role in speeding up the time series similarity queries on which various analytics and applications rely. However, the state-of-the-art indexing techniques, which are iSAX-based structures, do not scale well: their small (binary) fan-out leads to a very deep index tree and an expensive search through many internal nodes. More seriously, the iSAX character-level cardinality adopted by these indices poorly preserves the proximity relationships among the time series objects, which leads to severe accuracy degradation for approximate similarity queries. In this paper, we propose the TARDIS distributed indexing framework to overcome these limitations. TARDIS introduces a novel iSAX index tree that is based on a new word-level variable cardinality. The proposed index ensures a compact structure, efficient search and comparison, and good preservation of the similarity relationships. TARDIS is suitable for indexing and querying billion-scale time series datasets. It is composed of one centralized global index and local distributed indices, one per data partition across the cluster. TARDIS uses both the global and local indices to efficiently support exact match and kNN approximate queries. The system is implemented using Apache Spark, and extensive experiments are conducted on benchmark and real-world datasets. Evaluation results demonstrate that for a dataset of over one billion time series (TB scale), the construction of a clustered index is about 83% faster than the existing techniques. Moreover, the average response time of exact match queries is decreased by 50%, and the accuracy of the kNN approximate queries is increased more than 10-fold (from 3% to 40%) compared to the existing techniques.
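As a rough illustration of the representation the abstract refers to, the sketch below computes a plain SAX word and then lowers its cardinality for every symbol at once. It is not the TARDIS index from the paper. The function names (`paa`, `sax_word`, `demote_word`) are hypothetical, and reading "word-level cardinality" as "all symbols of a word share one bit width" is an assumption drawn from the abstract's wording, not a confirmed detail of the paper.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def paa(series, segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    if len(series) % segments != 0:
        raise ValueError("series length must be a multiple of segments")
    step = len(series) // segments
    return [mean(series[i:i + step]) for i in range(0, len(series), step)]


def sax_word(series, segments, bits):
    """SAX word in which every symbol uses the same cardinality 2**bits."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalize; a constant series maps to all zeros
    z = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]
    card = 1 << bits
    # Breakpoints split the standard normal into `card` equiprobable regions
    breakpoints = [NormalDist().inv_cdf(k / card) for k in range(1, card)]
    return [bisect_right(breakpoints, v) for v in paa(z, segments)]


def demote_word(word, bits_from, bits_to):
    """Lower the cardinality of all symbols together by dropping low bits."""
    shift = bits_from - bits_to
    return [symbol >> shift for symbol in word]


series = [1, 2, 3, 4, 5, 6, 7, 8]
fine = sax_word(series, segments=4, bits=3)      # cardinality 8 per symbol
coarse = demote_word(fine, bits_from=3, bits_to=2)
print(fine, coarse)
```

Because the breakpoints at cardinality 4 are a subset of those at cardinality 8, dropping one bit from each symbol gives the same word as discretizing directly at the lower cardinality. That nesting is what lets an iSAX-style tree compare words stored at different resolutions.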

Original language: English
Title of host publication: Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019
Publisher: IEEE Computer Society
Pages: 1202-1213
Number of pages: 12
ISBN (Electronic): 9781538674741
Publication status: Published - Apr 2019
Externally published: Yes
Event: 35th IEEE International Conference on Data Engineering, ICDE 2019 - Macau, China
Duration: 8 Apr 2019 - 11 Apr 2019

Publication series

Name: Proceedings - International Conference on Data Engineering
Volume: 2019-April
ISSN (Print): 1084-4627

Conference

Conference: 35th IEEE International Conference on Data Engineering, ICDE 2019
Country/Territory: China
City: Macau
Period: 8/04/19 - 11/04/19

Keywords

  • Exact matching query
  • Indexing
  • iSAX-T
  • kNN approximate query
  • sigTree
  • TARDIS
  • Time series
  • Whole similarity matching
  • Word-level cardinality
