TY - GEN
T1 - TARDIS
T2 - 35th IEEE International Conference on Data Engineering, ICDE 2019
AU - Zhang, Liang
AU - Alghamdi, Noura
AU - Eltabakh, Mohamed Y.
AU - Rundensteiner, Elke A.
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/4
Y1 - 2019/4
N2 - The massive amounts of time series data continuously generated and collected by applications warrant the need for large scale distributed time series processing systems. Indexing plays a critical role in speeding up time series similarity queries on which various analytics and applications rely. However, the state-of-the-art indexing techniques, which are iSAX-based structures, do not scale well due to the small adopted fan-out (binary) that leads to a highly deep index tree, and the expensive search cost through many internal nodes. More seriously, the iSAX character-level cardinality adopted by these indices suffers from a poor maintenance of the proximity relationships among the time series objects, which leads to severe accuracy degradation for approximate similarity queries. In this paper, we propose the TARDIS distributed indexing framework to overcome the aforementioned limitations. TARDIS introduces a novel iSAX index tree that is based on a new word-level variable cardinality. The proposed index ensures compact structure, efficient search and comparison, and good preservation of the similarity relationships. TARDIS is suitable for indexing and querying billion-scale time series datasets. TARDIS is composed of one centralized global index and local distributed indices-one per each data partition across the cluster. TARDIS uses both the global and local indices to efficiently support exact match and kNN approximate queries. The system is implemented using Apache Spark, and extensive experiments are conducted on benchmark and real-world datasets. Evaluation results demonstrate that for over one billion time series dataset (TB scale), the construction of a clustered index is about 83% faster than the existing techniques. Moreover, the average response time of exact match queries is decreased by 50%, and the accuracy of the kNN approximate queries has increased more than 10 fold (from 3% to 40%) compared to the existing techniques.
AB - The massive amounts of time series data continuously generated and collected by applications warrant the need for large scale distributed time series processing systems. Indexing plays a critical role in speeding up time series similarity queries on which various analytics and applications rely. However, the state-of-the-art indexing techniques, which are iSAX-based structures, do not scale well due to the small adopted fan-out (binary) that leads to a highly deep index tree, and the expensive search cost through many internal nodes. More seriously, the iSAX character-level cardinality adopted by these indices suffers from a poor maintenance of the proximity relationships among the time series objects, which leads to severe accuracy degradation for approximate similarity queries. In this paper, we propose the TARDIS distributed indexing framework to overcome the aforementioned limitations. TARDIS introduces a novel iSAX index tree that is based on a new word-level variable cardinality. The proposed index ensures compact structure, efficient search and comparison, and good preservation of the similarity relationships. TARDIS is suitable for indexing and querying billion-scale time series datasets. TARDIS is composed of one centralized global index and local distributed indices-one per each data partition across the cluster. TARDIS uses both the global and local indices to efficiently support exact match and kNN approximate queries. The system is implemented using Apache Spark, and extensive experiments are conducted on benchmark and real-world datasets. Evaluation results demonstrate that for over one billion time series dataset (TB scale), the construction of a clustered index is about 83% faster than the existing techniques. Moreover, the average response time of exact match queries is decreased by 50%, and the accuracy of the kNN approximate queries has increased more than 10 fold (from 3% to 40%) compared to the existing techniques.
KW - Exact matching query
KW - Indexing
KW - Isax-t
KW - Knn approximate query
KW - Sigtree
KW - Tardis
KW - Time series
KW - Whole similarity matching
KW - Word-level cardinality
UR - https://www.scopus.com/pages/publications/85067921541
U2 - 10.1109/ICDE.2019.00110
DO - 10.1109/ICDE.2019.00110
M3 - Conference contribution
AN - SCOPUS:85067921541
T3 - Proceedings - International Conference on Data Engineering
SP - 1202
EP - 1213
BT - Proceedings - 2019 IEEE 35th International Conference on Data Engineering, ICDE 2019
PB - IEEE Computer Society
Y2 - 8 April 2019 through 11 April 2019
ER -