Figure 6. two possible spanning trees of the unweighted square grid,
shown with red edges.
trees for the grid: a quantitative example
The two spanning trees of the 8 × 8 grid shown in
In the tree on the left, for each vertical edge beyond
column , at least horizontal edges are needed
to travel between its endpoints; that means that its
stretch is at least . So the n/2 edges in the right half
of the square grid contribute a total stretch of n1.5.
In the tree on the right, all edges along the middle
row and column still have stretch . However,
the middle row and column only have edges
and so they contribute only O(n) to the total stretch.
Recall that all we need is a low total stretch, so a small
number of high-stretch edges is permitted. Having
accounted for the edges in the middle row and col-
umn, the argument can then be repeated on the four
smaller subgraphs of size n/4 formed by removing the
middle row and column. These pieces have trees that
are constructed similarly, leading to the recurrence
TotalStretch(n) = 4 × TotalStretch(n/4) + O(n).
Its solution is TotalStretch(n) = O(n log n).
A generalization of this type of “accounting”, that
keeps the number of high stretch edges small, forms
the basis of the current state-of-the-art algorithms.
1
Algorithms for the computation of LSSTs were first studied in an unrelated context,
2 where it was shown that any
graph contains a spanning tree with total stretch O(m1+ε); the
tree can be found in O(m log n) time. The total stretch was
lowered to O(m log2 n) in Elkin et al.,
6 and further to Õ(m log
n) in Abraham and Neiman,
1 giving the following theorem.
Theorem. Every graph has a spanning tree of total stretch
Õ(m log n). The tree can be found in Õ(m log n) time.
Boman and Hendrickson first introduced LSSTs as
stand alone preconditioners in 2001. This was a catalyst to
subsequent progress, which used LSSTs as a base for building
even more intricate preconditioners. In fact, LSSTs are indispensable components of all nearly-linear time SDD system
solvers. It is worth pointing out that while LSSTs were originally conceived as potentially good two-level preconditioners,
their full power in the context of multilevel solvers was not
realized until our work, which we describe in Section 4. 3.
4. 2. Sparsification
Spielman and Teng’s19 main contribution was a “tour de
force” algorithm for finding a preconditioner that’s the
LSST plus a small number of edges. It took many by surprise
as it yielded the first nearly-linear time SDD solver.
Describing the algorithm is out of the scope of this article, but it is worth noting its two enhancements over previous approaches. First, instead of just adding off-tree edges
from A back onto the tree, the algorithm re-weights them.
The tree edges may themselves be re-weighted in the final
preconditioner B. Second, the procedure for adding edges in
B is not deterministic but randomized, as it contains a process for sampling edges from A.
However the major conceptual and technical contribution
of Spielman and Teng that formed the cornerstone of their
solver was a sparsification algorithm. They showed that every
graph A has a 2-approximation B which has O(n logc n) edges for
some large constant c. The graph B is called the sparsifier and,
of course, it can be used as a preconditioner when A is dense.
After the first sparsification result, progress towards faster
SDD solvers took a detour through the study of spectral sparsification as a stand-alone problem. Works by Batson, Kolla,
Makarychev, Saberi, Spielman, Srivastava, and Teng led to
nearly-optimal spectral sparsifiers, albeit at the cost of much
higher running time. These results were motivated by the
work of Spielman and Srivastava,
18 who gave an extremely
simple algorithm for finding spectral sparsifiers with only
O(n log n) edges. Their algorithm, as well as the Spielman–
Teng spectral sparsification algorithm builds upon a framework established by Benczur and Karger for sampling and
re-weighting a graph.
The framework requires positive numbers te assigned
to each edge, corresponding to the relative probabilities
of sampling them. It calculates the sum of these numbers,
and proceeds for O(t log n) rounds. In each round one
new edge is added to the sparsifier B. The edge is picked randomly with replacement among the m edges of A, but not in
a “fair” way. An edge e is picked with relative probability te,
which equates to a probability of pe = te/t. Once an edge is
picked, it is added to B with weight scaled down by a factor
of O(te log n). Furthermore, if an edge is picked twice or more
during this process, each new copy is added as a parallel
edge, making B potentially a multi-graph.
Understanding re-weighting
While it may appear complicated, the re-weighting
choice is quite natural. The reasoning is that the
“expected value” of B should be A itself on an edge-to-edge basis. In other words, the average of many B’s
output by the algorithm should be A itself.