figure 5: Procedure to produce a cut from the embedding of the
graph on the sphere. the points in the thin slice in the middle are
thrown out.
S
Unit sphere in �d
for finding a balanced partitioning of the vertices with few
crossing edges.† This is done as follows: simply pick a ran-
dom distance 0 ≤ r ≤ ∆, and select every vertex i such that u
i
is within r of some point in S, i.e. for some x ∈ S, |u − x| 2 ≤
i
r. To see that this works, note that the probability that any
edge e with length l(e) is in the cut is at most l(e)/∆. Thus,
the typical number of edges crossing such a cut is at most
1/∆ times the total length of the edges (which, recall, is at
most the size of the optimal cut). This yields an O ( log n )
-approximation algorithm since ∆ = Ω (1/ log n ).
We now turn our attention to actually finding the almost
antipodal sets S and T. Before we do so, it is instructive to
understand an important geometric property of a high-dimensional Euclidean space ℜd. Consider projecting the
surface of the unit sphere on to a line u through its center.
Clearly, the points on the sphere will project to the interval
[− 1, 1]. If we start from a uniformly random point u on the
sphere, what is the distribution of the projected point (u, u)
in [− 1, 1]? It turns out that the projected point has a Gaussian distribution, with expectation 0 and standard deviation
1/ d . This means that the expected squared distance of a
projected point from the center is 1/d, and a constant fraction of the surface of the sphere is projected at least 1/ d
away from the origin. Another way to say all this is that if
we consider slices of the sphere by parallel hyperplanes the
surface area of the slices vary like a Gaussian according to
distance of the slice from the center of the sphere.
Now returning to our procedure for identifying sets S and
T from the embedding of the graph, we start by projecting
the n points u , . . . , u on a randomly chosen line u → through
1n
the center. We discard all points whose projection has abso-
lute value less than 1/ d . The remaining points form two
sets P and N, according to whether their projection is posi-
tive or negative. By the Gaussian property above, the sets P
and N have expected size Ω(n). Ideally, we would like to stop
here and assert that every point in x∈ P is far from every
point in y ∈ N.
Indeed, it is true (and easily checked) that with high
probability over the choice of the line on which we are
† Note that we started off trying to find a partition into sets of equal size, but
this method will only yield a partition in which the two sets have sizes within
a factor 2 of each other.
doing the projection, |x − y| 2 ≥ 1/log n. However, we actually want a greater separation of ∆ = 1/ log n , so we enforce
this condition by sequentially deleting pairs {x, y} that violate it. The remaining points define two sets S ⊆ P and T ⊆
N which are ∆-separated. The difficult part is to show that
S and T have cardinality Ω(n). For the interested reader, we
have sketched the main ingredients of the proof of this fact
in Section 6. Figure 6 below summarizes the resulting algorithm for finding a good cut.
figure 6: finding a good cut.
Given the embedding u , . . . , u .
1n
→
• Pick a random line u through the origin.
• let P = {ui : (ui , u) ≥ 1/ d }
and N ={u i : (u i, u) ≤ 1/ d}
• discard pairs of points from x ∈ P and y ∈ N such that
2
x − y ≤ ∆ = 1 / logn.
call the resulting sets S and T.
• choose random 0 ≤ r ≤ ∆, and let X = {i : |u − x| 2 ≤ r} for some x ∈ S.
i
• output partition (X, V – X).
3. exPanDeR fLows
In Section 1. 2, we described a dual flow-based approach to
partitioning a graph. In this section, we describe how to extend that to the expander flow framework, which leads to
an efficient implementation of the O ( log n ) approximation algorithm. Before we can describe this new framework,
we must introduce a number of new concepts. Let us start
by recalling the San Francisco Bay Area traffic nightmare
scenario of Section 1. 2, where we chose to route a traffic
pattern described by a complete graph. We described an efficient algorithm for computing good routes for the traffic
to minimize the worst traffic jams. We also showed how to
use the resulting information to actually find a good partition for the county’s road network.
In this section, we will focus on the choice of traffic
pattern. We will see that choosing the traffic pattern in a
clever way will reveal a much better partition of the road
network. To begin with, notice that if the traffic pattern is
very local—for example, if each person visits a friend who
lives a few blocks away—then the precise location of the
congested streets has little information about the actual
bottlenecks or sparse cuts in the underlying road network.
So we start by formally understanding what kinds of traffic
graphs will reveal the kind of information we seek. To be
more concrete, by the traffic graph, we mean the weighted
graph in which nodes i and j have an edge of weight c if
ij
they have a flow rate c between them.
ij
We require the traffic graph to be “expanding,” that is,
the total traffic flow out of every subset of nodes is proportional to the number of nodes in the subset. In more