Accelerating GPU Betweenness Centrality
By Adam McLaughlin^a and David A. Bader
DOI: 10.1145/3230485
Abstract
Graphs that model social networks, numerical simulations,
and the structure of the Internet are enormous and cannot
be manually inspected. A popular metric used to analyze
these networks is Betweenness Centrality (BC), which has
applications in community detection, power grid contingency analysis, and the study of the human brain. However,
these analyses come with a high computational cost that
prevents the examination of large graphs of interest.
Recently, the use of Graphics Processing Units (GPUs)
has shown promise for the efficient processing of unstructured data sets. Prior GPU implementations of BC suffer
from large local data structures and inefficient graph traversals that limit scalability and performance. Here we
present a hybrid GPU implementation that provides good
performance on graphs of arbitrary structure rather than
just scale-free graphs, as in prior work. Our methods
achieve up to 13× speedup on high-diameter graphs and an
average of 2.71× speedup overall compared to the best existing GPU algorithm. We also observe near-linear speedup
when running BC on 192 GPUs.
1. INTRODUCTION
Network analysis is a fundamental tool for domains as
diverse as compilers [17], social networks [14], and computational
biology [5]. Real-world applications of these analyses involve
tremendously large networks that cannot be inspected manually. An example of a graph analytic that has received significant
attention in recent literature is BC. Betweenness centrality
has been used for finding the best location of stores within
cities [20], studying the spread of AIDS in sexual networks [13],
power grid contingency analysis [11], and community detection [23]. The variety of fields and applications in which this
method of analysis has been employed shows that graph
analytics require algorithmic techniques that make them
performance portable to as many network structures as
possible. Unfortunately, the fastest known algorithm for
calculating BC scores has O(mn) complexity for unweighted
graphs with n vertices and m edges, making the analysis of
large graphs challenging. Hence there is a need for robust,
high-performance graph analytics that can be applied to a
variety of network structures and sizes.
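The O(mn) bound referenced above is that of Brandes' algorithm, which computes BC by running a BFS from each source vertex and then accumulating path dependencies in reverse order. A minimal sequential Python sketch (the function name and dict-of-lists graph representation are our own choices, not from the paper) might look like this:

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes-style O(mn) BC for an unweighted graph.

    `adj` maps each vertex to a list of its neighbors."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma)
        # and recording each vertex's shortest-path predecessors.
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        preds = {v: [] for v in adj}
        order = []                      # vertices in BFS discovery order
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:         # first time w is seen
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

Parallel GPU implementations distribute the work within (or across) these per-source searches; the O(n) iterations of the outer loop are what makes large graphs expensive to analyze.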
Graphics Processing Units (GPUs) provide excellent per-
formance for regular, dense, and computationally demand-
ing subroutines such as matrix multiplication. However,
there has been recent success in accelerating irregular,
memory-bound graph algorithms on GPUs as well [6, 17, 19].
Prior implementations of betweenness centrality on the GPU have outperformed their CPU counterparts, particularly on scale-free networks; however, they do not scale to large graph instances, use asymptotically inefficient algorithms that degrade performance on high-diameter graphs, and are not general enough to be applied to the variety of domains that can leverage their results.

The original version of this paper, entitled "Scalable and High Performance Betweenness Centrality on the GPU," was published in the Proceedings of the 26th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '14), 572–583.
This article alleviates these problems by making the fol-
lowing contributions:
• We provide a work-efficient algorithm for betweenness
centrality on the GPU that works especially well for net-
works with a large diameter.
• For generality, we propose an algorithm that chooses
between leveraging either the memory bandwidth
of the GPU or the asymptotic efficiency of the work
being done based on the structure of the graph being
processed. We present an online approach that uses
a small amount of initial work from the algorithm to
suggest which method of parallelism would be best for
processing the remaining work.
• We implement our approach on a single GPU system,
showing an average speedup of 2.71× across a variety of
both real-world and synthetic graphs over the best previous GPU implementation. Additionally, our implementation attains near-linear speedup on a cluster of 192 GPUs.
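The online selection idea in the second bullet can be illustrated with a simplified sketch. The sampling procedure, threshold, and strategy names below are hypothetical stand-ins for exposition, not the authors' actual heuristic: the point is only that a few cheap initial traversals can estimate the effective diameter and thereby pick a parallelization strategy for the remaining work.

```python
import random
from collections import deque

def choose_parallelization(adj, samples=5, level_threshold=20):
    """Run a few sample BFS traversals and use the observed number of
    BFS levels as a proxy for graph diameter.  High-diameter graphs
    favor a work-efficient (frontier-based) strategy; low-diameter
    graphs (e.g. scale-free networks) favor a bandwidth-oriented
    edge-parallel strategy.  Threshold is an illustrative guess."""
    vertices = list(adj)
    n = min(samples, len(vertices))
    total_levels = 0
    for s in random.sample(vertices, n):
        dist = {s: 0}
        queue = deque([s])
        levels = 0
        while queue:
            v = queue.popleft()
            levels = max(levels, dist[v])
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total_levels += levels
    return "work-efficient" if total_levels / n > level_threshold else "edge-parallel"
```

Because each BFS is also the forward phase of one BC iteration, the sampled work is not wasted: its results contribute to the final scores while informing the choice of method for the remaining sources.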
2. BACKGROUND
2.1. Definitions
Let a graph G = (V, E) consist of a set V of n = |V| vertices and
a set E of m = |E| edges. A path from a vertex u to a vertex v is
any sequence of edges originating from u and terminating at
v. Such a path is a shortest path if its sequence contains a minimal number of edges. A Breadth-First Search (BFS) explores
vertices of a graph by starting at a "source" (or "root") vertex
and exploring its neighbors. The neighbors of these vertices
are then explored and this process repeats until there are
no remaining vertices to be explored. Each set of inspected
neighbors is referred to as a vertex frontier, and the set of outgoing edges from a vertex frontier is referred to as an
edge frontier. The diameter of a graph is the length of the longest
shortest path between any pair of vertices. A scale-free graph
has a degree distribution that follows a power law, where a
small number of vertices have a large number of outgoing
^a D.E. Shaw Research, New York, NY, USA.