TY - JOUR
T1 - DRS: Auto-Scaling for Real-Time Stream Analytics
AU - Fu, Tom Z.J.
AU - Ding, Jianbing
AU - Ma, Richard T.B.
AU - Winslett, Marianne
AU - Yang, Yin
AU - Zhang, Zhenjie
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/12
Y1 - 2017/12
AB - In a stream data analytics system, input data arrive continuously and trigger the processing and updating of analytics results. We focus on applications with real-time constraints, in which any data unit must be completely processed within a given time duration. To handle fast data, it is common to place the stream data analytics system on top of a cloud infrastructure. Because stream properties, such as arrival rates, can fluctuate unpredictably, cloud resources must be dynamically provisioned and scheduled accordingly to ensure real-time responses. It is essential for existing systems and future developments to be able to scale resources dynamically according to the instantaneous workload, in order to avoid wasting resources or failing to deliver the correct analytics results on time. Motivated by this, we propose DRS, a dynamic resource scaling framework for cloud-based stream data analytics systems. DRS overcomes three fundamental challenges: (1) how to model the relationship between the provisioned resources and the application performance, (2) where to best place resources, and (3) how to measure the system load with minimal overhead. In particular, DRS includes an accurate performance model based on the theory of Jackson open queueing networks and is capable of handling arbitrary operator topologies, possibly with loops, splits, and joins. Extensive experiments with real data show that DRS is capable of detecting sub-optimal resource allocations and making quick and effective resource adjustments.
KW - Cloud computing
KW - queueing network model
KW - resource auto-scaling
KW - stream data analytics
UR - https://www.scopus.com/pages/publications/85029168732
U2 - 10.1109/TNET.2017.2741969
DO - 10.1109/TNET.2017.2741969
M3 - Article
AN - SCOPUS:85029168732
SN - 1063-6692
VL - 25
SP - 3338
EP - 3352
JO - IEEE/ACM Transactions on Networking
JF - IEEE/ACM Transactions on Networking
IS - 6
ER -