TY - GEN
T1 - Gemini
T2 - 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016
AU - Zhu, Xiaowei
AU - Chen, Wenguang
AU - Zheng, Weimin
AU - Ma, Xiaosong
N1 - Publisher Copyright:
© 2016 by The USENIX Association All Rights Reserved.
PY - 2016
Y1 - 2016
N2 - Traditionally distributed graph processing systems have largely focused on scalability through the optimizations of inter-node communication and load balance. However, they often deliver unsatisfactory overall processing efficiency compared with shared-memory graph computing frameworks. We analyze the behavior of several graph-parallel systems and find that the added overhead for achieving scalability becomes a major limiting factor for efficiency, especially with modern multi-core processors and high-speed interconnection networks. Based on our observations, we present Gemini, a distributed graph processing system that applies multiple optimizations targeting computation performance to build scalability on top of efficiency. Gemini adopts (1) a sparse-dense signal-slot abstraction to extend the hybrid push-pull computation model from shared-memory to distributed scenarios, (2) a chunk-based partitioning scheme enabling low-overhead scaling out designs and locality-preserving vertex accesses, (3) a dual representation scheme to compress accesses to vertex indices, (4) NUMA-aware sub-partitioning for efficient intra-node memory accesses, plus (5) locality-aware chunking and fine-grained work-stealing for improving both inter-node and intra-node load balance, respectively. Our evaluation on an 8-node high-performance cluster (using five widely used graph applications and five real-world graphs) shows that Gemini significantly outperforms all well-known existing distributed graph processing systems, delivering up to 39.8× (from 8.91×) improvement over the fastest among them.
AB - Traditionally distributed graph processing systems have largely focused on scalability through the optimizations of inter-node communication and load balance. However, they often deliver unsatisfactory overall processing efficiency compared with shared-memory graph computing frameworks. We analyze the behavior of several graph-parallel systems and find that the added overhead for achieving scalability becomes a major limiting factor for efficiency, especially with modern multi-core processors and high-speed interconnection networks. Based on our observations, we present Gemini, a distributed graph processing system that applies multiple optimizations targeting computation performance to build scalability on top of efficiency. Gemini adopts (1) a sparse-dense signal-slot abstraction to extend the hybrid push-pull computation model from shared-memory to distributed scenarios, (2) a chunk-based partitioning scheme enabling low-overhead scaling out designs and locality-preserving vertex accesses, (3) a dual representation scheme to compress accesses to vertex indices, (4) NUMA-aware sub-partitioning for efficient intra-node memory accesses, plus (5) locality-aware chunking and fine-grained work-stealing for improving both inter-node and intra-node load balance, respectively. Our evaluation on an 8-node high-performance cluster (using five widely used graph applications and five real-world graphs) shows that Gemini significantly outperforms all well-known existing distributed graph processing systems, delivering up to 39.8× (from 8.91×) improvement over the fastest among them.
UR - https://www.scopus.com/pages/publications/85057565909
M3 - Conference contribution
AN - SCOPUS:85057565909
T3 - Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016
SP - 301
EP - 316
BT - Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016
PB - USENIX Association
Y2 - 2 November 2016 through 4 November 2016
ER -