Doi: 10.1145/1400181.1400204
By Sanjeev Arora, Satish Rao, and Umesh Vazirani
“Graph partitioning” refers to a family of computational problems in which the vertices of a graph have to be partitioned into two (or more) large pieces while minimizing the number of the edges that cross the cut (see Figure 1). The ability to do so is a useful primitive in “divide and conquer” algorithms for a variety of tasks such as laying out very large circuits on laying out very large circuits on silicon chips and distributing computation among processors. Increasingly, it is also used in applications of clustering ranging from computer vision, to data analysis, to learning. These include finding groups of similar objects ( customers, products, cells, words, and documents) in large data sets, and image segmentation, which is the first step in image analysis. Unfortunately, most graph-partitioning problems are NP-hard, which implies that we should not expect efficient algorithms that find optimal solutions. Therefore researchers have resorted to heuristic approaches, which have been implemented in several popular freeware codes and commercial packages.
The goal of this paper is to survey an interesting combination of techniques that have recently led to progress on this problem. The original motivation for this work was theoretical, to design algorithms with the best provable approximation guarantees.* Surprisingly, these ideas have led also to a new framework for designing very fast and
_
figure 1: a graph and a partition into two subsets S, S . in this case, the two subsets have equal number of vertices; such a partition is called a bisection. the number of edges crossing the cut is 7. if the number of vertices on the two sides is within a constant factor of each other (say, factor 2), then we call the partition balanced. Balanced partitions are useful in many applications.
S
S
practically viable algorithms for this problem. On the theoretical end, the ideas have also led to a breakthrough in a long-standing open question on metric space embeddings from the field of function analysis, and new algorithms for semidefinite programming. These are all surveyed in this paper.
We will actually describe two disparate-seeming approaches to graph partitioning which turn out, surprisingly, to be related. The first approach is geometric, and holds the key to actually analyzing the quality of the cut found by the algorithm. The second approach involves routing flows in the graph, which we will illustrate using traffic flows in a road network. This approach holds the key to designing algorithms that run fast. The relationship between these two approaches derives from the fact that they are ( roughly) dual views of each other, thus resulting in algorithms which are both fast and also produce high-quality cuts.
Below, we first sketch the two approaches, and give more details in Sections 2 through 4.
Let us start by describing the geometric approach. We draw the graph in some geometric space (such as the unit disk in two-dimensional Euclidean space ℜ2), such that the average length of an edge is short—i.e., the distance between its endpoints is small—while the points are spread out. More precisely, we require that the distance between the average pair of vertices is a fixed constant, say 1, while the distance between the average adjacent pair of vertices is as small as possible. We will refer to this as an embedding of the graph in the geometric space.
The motivation behind the embedding is that proximity in the geometric space roughly reflects connectivity in the graph, and a good partition of the graph should correspond to separating a large area by cutting along a geometric curve. Indeed, given the properties of the embedding, even a “random partition” of the space by a simple line or curve should work well! The typical edge is unlikely to be cut by a random partition since it is short while a typical pair of vertices is likely to be separated since the distance between them is large. This means that the expected number of vertices on each side of the cut is large while the expected number of edges crossing the cut is small.†
The actual space that our algorithm will “draw” the graph in is not two-dimensional Euclidean space ℜ2, but
* We say that the approximation guarantee of the algorithm is C if given any graph in which the best cut has k edges, the algorithm is guaranteed to find a cut which has no more than C . k edges. Sometimes, we also say the algorithm is a C-approximation.
References:
Archives