between these pairs of vertices and has a high BC score. In
contrast, vertex 8 does not belong on a path between any
pair of the remaining vertices in the graph and thus has a BC
score of zero. Note that the scores reflected in Figure 1 treat
a path from vertex u to vertex v as equivalent to a path from
vertex v to vertex u since these paths are undirected. In other
words, to avoid double counting the number of (undirected)
shortest paths we divide the scores by two.
The magnitude of BC values also scales with the size of
the network. For a fair comparison of BC values between
vertices of two different graphs, a commonly used technique is to normalize the BC scores by their largest possible value4: (n − 1)(n − 2). Such a comparison could be useful
for comparing discrete slices of a network that changes
over time. 15
Naïve implementations of Betweenness Centrality solve
the all-pairs shortest-paths problem using the O(n3) Floyd-
Warshall algorithm and augment this result with path
counting. Brandes improved upon this approach with an
algorithm that runs in O(mn) time for unweighted graphs. 3
The key concept of Brandes’s approach is the dependency of
a vertex v with respect to a given source vertex s:
( 2)
The recursive relationship between the dependency of a
vertex and the dependency of its successors allows a more
asymptotically efficient calculation of the centrality metric.
Brandes’s algorithm splits the betweenness centrality calculation into two major steps:
1. Find the number of shortest paths between each pair
of vertices.
2. Sum the dependencies for each vertex.
We can redefine the calculation of BC scores in terms of
dependencies as follows:
( 3)
2. 3. GPU architecture and programming model
The relatively high memory bandwidth of GPUs compared
to that of conventional CPUs has resulted in many high-performance GPU graph algorithms. 15, 17, 19 Compared to
CPUs, GPUs tend to rely on latency hiding rather than caching and leverage a Single-Instruction, Multiple-Thread (SIMT)
programming model. The SIMT model allows for transistors to be allocated to additional processor cores rather than
structures for control flow management.
GPUs are comprised of a series of Streaming Multiprocessors
(SMs), each of which manages hundreds of threads. The
threads within each SM execute in groups of 32 threads (on
current NVIDIA architectures) called warps. Although the
execution paths of the threads within each warp may diverge,
peak performance is attained when all threads within a warp
execute the same instructions. Synchronization between
the warps of a particular SM is inexpensive but properly syn-
chronizing all of the SMs of the GPU requires the launch of a
edges and a large number of vertices have a small number of
outgoing edges. 2 Finally, a small world graph has a diameter
that is proportional to the logarithm of the number of vertices
in the graph. 25 In these networks every vertex can be reached
from every other vertex by traversing a small number of edges.
Representation of sparse graphs in memory. The most
intuitive way to store a graph in memory is as an adjacency
matrix. For unweighted graphs, element Aij of the matrix is
equal to 1 if an edge exists from i to j and is equal to 0 otherwise. The real-world graphs that we examine in this article,
however, are sparse, meaning that a vast majority of the elements are zeros in the adjacency matrix representation of
these data sets. Rather than using O(n2) space to store the
entire adjacency matrix, we use the Compressed Sparse Row
(CSR) format, as shown in Figure 1. This representation
consists of two arrays: row offsets (R) and column indices (C).
The column indices array is a concatenation of each vertex’s
adjacency list into an array of m elements. The row offsets
array in an n + 1 element array that points at where each vertex’s adjacency list beings and ends within the column indices array. For example, the adjacency list of a vertex u starts
at C[R[u] ] and ends at C[R[u+ 1]− 1] (inclusively).
2. 2. Brandes’s algorithm
Betweenness centrality was originally developed in the
social sciences for classifying people who were central to
networks and could thus influence others by withholding
information or altering it. 8 The metric attempts to distin-
guish the most influential vertices in a network by measur-
ing the ratio of shortest paths passing through a particular
vertex to the total number of shortest paths between all pairs
of vertices. Intuitively, this ratio determines how well a ver-
tex connects pairs of other vertices in the network. Formally,
the Betweenness centrality of a vertex v is defined as:
( 1)
where σst is the number of shortest paths between vertices
s and t and σst(v) is the number of those shortest paths that
pass through v.
Consider Figure 1. Vertex 3 is the only vertex that lies on
paths from its left (vertices 4 through 8) to its right (vertices
0 through 2). Hence vertex 3 lies on all of the shortest paths
BC[ 7] = 0
BC[ 6] = 7 BC[ 5] = 6
BC[ 3] = 15
BC[0] = 3
BC[ 8] = 0
R = [0, 3, 5, 8, 12, 16, 20, 24, 27, 28]
C = [ 1, 2, 3 ê0, 2 ê 0, 1, 3 ê 0, 2, 4, 5 ê 3, 5, 6, 7 ê 3, 4, 6, 7 ê 4, 5, 7, 8 ê4, 5, 6 ê 6]
74
3
2
1
0
56
8
BC[ 4] = 6 BC[ 2] = 3
BC[ 1] = 0
Figure 1. Example betweenness centrality scores and CSR
representation for a small graph.