Doi: 10.1145/1400181.1400204
Geometry, Flows, and
Graph-Partitioning Algorithms
By Sanjeev Arora, Satish Rao, and Umesh Vazirani
1. intRoDuction
“Graph partitioning” refers to a family of computational
problems in which the vertices of a graph have to be partitioned into two (or more) large pieces while minimizing
the number of the edges that cross the cut (see Figure 1).
The ability to do so is a useful primitive in “divide and conquer” algorithms for a variety of tasks such as laying out
very large circuits on laying out very large circuits on silicon
chips and distributing computation among processors. Increasingly, it is also used in applications of clustering ranging from computer vision, to data analysis, to learning.
These include finding groups of similar objects (
customers, products, cells, words, and documents) in large data
sets, and image segmentation, which is the first step in image analysis. Unfortunately, most graph-partitioning problems are NP-hard, which implies that we should not expect
efficient algorithms that find optimal solutions. Therefore
researchers have resorted to heuristic approaches, which
have been implemented in several popular freeware codes
and commercial packages.
The goal of this paper is to survey an interesting combination of techniques that have recently led to progress
on this problem. The original motivation for this work was
theoretical, to design algorithms with the best provable
approximation guarantees.* Surprisingly, these ideas have
led also to a new framework for designing very fast and
_
figure 1: a graph and a partition into two subsets S, S . in this
case, the two subsets have equal number of vertices; such a
partition is called a bisection. the number of edges crossing the
cut is 7. if the number of vertices on the two sides is within
a constant factor of each other (say, factor 2), then we call
the partition balanced. Balanced partitions are useful in many
applications.
S
S
practically viable algorithms for this problem. On the theoretical end, the ideas have also led to a breakthrough in a
long-standing open question on metric space embeddings
from the field of function analysis, and new algorithms for
semidefinite programming. These are all surveyed in this
paper.
We will actually describe two disparate-seeming approaches to graph partitioning which turn out, surprisingly, to be related. The first approach is geometric, and holds
the key to actually analyzing the quality of the cut found by
the algorithm. The second approach involves routing flows
in the graph, which we will illustrate using traffic flows in
a road network. This approach holds the key to designing
algorithms that run fast. The relationship between these
two approaches derives from the fact that they are (
roughly) dual views of each other, thus resulting in algorithms
which are both fast and also produce high-quality cuts.
Below, we first sketch the two approaches, and give more
details in Sections 2 through 4.
1. 1. sketch of the geometric approach
Let us start by describing the geometric approach. We
draw the graph in some geometric space (such as the unit
disk in two-dimensional Euclidean space ℜ2), such that the
average length of an edge is short—i.e., the distance between its endpoints is small—while the points are spread
out. More precisely, we require that the distance between
the average pair of vertices is a fixed constant, say 1, while
the distance between the average adjacent pair of vertices is
as small as possible. We will refer to this as an embedding of
the graph in the geometric space.
The motivation behind the embedding is that proximity in the geometric space roughly reflects connectivity in
the graph, and a good partition of the graph should correspond to separating a large area by cutting along a geometric curve. Indeed, given the properties of the embedding,
even a “random partition” of the space by a simple line or
curve should work well! The typical edge is unlikely to be
cut by a random partition since it is short while a typical
pair of vertices is likely to be separated since the distance
between them is large. This means that the expected number of vertices on each side of the cut is large while the expected number of edges crossing the cut is small.†
The actual space that our algorithm will “draw” the
graph in is not two-dimensional Euclidean space ℜ2, but
* We say that the approximation guarantee of the algorithm is C if given any
graph in which the best cut has k edges, the algorithm is guaranteed to find
a cut which has no more than C . k edges. Sometimes, we also say the algorithm is a C-approximation.