Spielman and Srivastava found the probabilities that give
sparsifiers with the fewest edges with the help of
some experimentation. Amazingly, the answer turned out to
be related to the effective resistance of the edge, specifically
t_e = w_e R_e. With hindsight, it is interesting to reflect on the
natural meaning of effective resistance. If there is a wire of
resistance r_e = 1/w_e between i and j, the effective resistance
R_e will in general be smaller than r_e, because most probably
there will be other network connections to accommodate
the flow; this is known as Rayleigh's monotonicity theorem.
The extreme case w_e R_e = 1 occurs only when there is no
route between i and j other than the wire joining them. In this
situation, the edge (i, j) is crucial for the network. On the
other hand, if w_e R_e is very small, there must be significant
alternative network connections between i and j. Therefore,
the product w_e R_e serves as a measure of the importance of a wire.
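Concretely, the effective resistance can be read off the pseudoinverse of the graph Laplacian via R_e = (χ_i − χ_j)ᵀ L⁺ (χ_i − χ_j). The following minimal numpy sketch (an illustration, not the authors' code) checks the two extreme cases above on toy graphs:

```python
import numpy as np

def laplacian(n, edges):
    """Build the weighted graph Laplacian from (i, j, weight) triples."""
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

def effective_resistance(L, i, j):
    """R_ij = (chi_i - chi_j)^T L^+ (chi_i - chi_j)."""
    Lp = np.linalg.pinv(L)          # Moore-Penrose pseudoinverse
    chi = np.zeros(L.shape[0])
    chi[i], chi[j] = 1.0, -1.0
    return chi @ Lp @ chi

# Triangle with unit weights: there is a second route between 0 and 1,
# so R_e < r_e = 1, as Rayleigh's monotonicity theorem predicts.
L = laplacian(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)])
print(effective_resistance(L, 0, 1))   # ~ 2/3: direct wire in parallel with a series pair

# A bridge edge: below, 2--3 is the only route, so w_e * R_e = 1.
L2 = laplacian(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)])
print(effective_resistance(L2, 2, 3))  # ~ 1.0
```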
Using tools from modern matrix theory,15 Spielman and
Srivastava proved that this algorithm does return a good
spectral sparsifier with high probability. Combining this with
the fact that the products w_e R_e sum to n − 1 over the edges
of a connected graph yields the overall number of
edges: O(n log n).
Despite being a major improvement in the theory of graph
sparsification, the algorithm of Spielman and Srivastava did
not immediately accelerate the SDD solver, since known methods for quickly
computing effective resistances themselves require the solution of linear systems. The guarantee of O(n log n) edges is also hard
to connect with the n + C · m/k edges needed by the design
condition. However, it is fair to say that their result cleared
the way to our contribution to the problem.
4.3. Which off-tree edges?
If we cannot efficiently compute effective resistances, can
we at least approximate them quickly, even if poorly? A closer
look at the matrix concentration bounds allows us to relax
this goal a bit further: the sampling algorithm described in
Section 4.2 can be shown to work with any choice of t_e, as
long as t_e ≥ w_e R_e. The observant reader may notice that the
expected number of times e is picked is O(t_e log n), so increasing t_e only results in more copies of e being picked, without
affecting the expected counts of all other edges.
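To make the oversampling point concrete, here is a Python sketch of such a sampling scheme; the function name, the constant C, and the interface are invented for illustration, and only the proportional-sampling-with-rescaling logic follows the text:

```python
import math
import random

def sample_sparsifier(edges, t, n, C=4.0, seed=0):
    """Sketch of sampling with oversampled estimates t[e] >= w_e * R_e.
    `edges` maps an edge to its weight; C and the interface are
    illustrative, not taken from the paper."""
    rng = random.Random(seed)
    total = sum(t.values())                      # total oversampling mass
    q = max(1, round(C * total * math.log(n)))   # number of draws: O(t log n)
    keys = list(edges)
    probs = [t[e] / total for e in keys]
    B = {}
    for _ in range(q):
        (e,) = rng.choices(keys, weights=probs)
        # Rescale each picked copy so that E[B] = A, edge by edge;
        # a larger t[e] means more copies, each with a smaller weight.
        B[e] = B.get(e, 0.0) + edges[e] / (q * (t[e] / total))
    return B

# Toy use on a 3-node path, where t_e = w_e * R_e = 1 for both edges:
B = sample_sparsifier({(0, 1): 1.0, (1, 2): 1.0},
                      {(0, 1): 1.0, (1, 2): 1.0}, n=3)
print(B)
```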
The intuition that the low-stretch spanning tree must
be part of the preconditioner leads us to taking tree-based
estimates for the effective resistances R_e. In particular,
for an off-tree edge e we let R′_e be the sum of the resistances
along the unique path between the endpoints of e in the
tree, as shown in Figure 7. By Rayleigh's monotonicity
theorem, we know that this estimate will be higher than
the actual R_e. This leads to the tree-based sampling probability for an off-tree edge e being proportional to t_e = w_e R′_e.
Furthermore, if we keep the entire tree in B, we can modify
the sampling algorithm presented in Section 4.2 to only
sample off-tree edges. Then the total number of off-tree
(multi) edges sampled in B is O(t log n), where t is the sum
of all the t_e. This brings us to the question: how
big is t?
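The tree-based estimate is cheap because it needs nothing more than a walk along the tree path. A small illustrative sketch (the function and data layout are hypothetical, with the numbers borrowed from Figure 7):

```python
from collections import deque

def tree_path_resistance(tree_adj, u, v):
    """Sum of resistances (1/weight) along the unique u-v path in a
    tree given as {node: [(neighbor, weight), ...]}."""
    parent = {u: (None, 0.0)}
    queue = deque([u])
    while queue:                      # BFS from u to record parent edges
        x = queue.popleft()
        for y, w in tree_adj[x]:
            if y not in parent:
                parent[y] = (x, w)
                queue.append(y)
    R, x = 0.0, v
    while x != u:                     # walk back from v, adding 1/w per edge
        p, w = parent[x]
        R += 1.0 / w
        x = p
    return R

# The numbers from Figure 7: a tree path with weights 4, 5, 2,
# and an off-tree edge of weight 2 joining its endpoints.
tree = {
    0: [(1, 4.0)],
    1: [(0, 4.0), (2, 5.0)],
    2: [(1, 5.0), (3, 2.0)],
    3: [(2, 2.0)],
}
R_prime = tree_path_resistance(tree, 0, 3)   # 1/4 + 1/5 + 1/2 = 0.95
stretch = 2.0 * R_prime                      # w_e * R'_e = 1.9
```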
This question leads us back to the discussion of
low-stretch spanning trees and the definition of stretch
for the general weighted case: if we view the length of an
edge e as the inverse of its weight, then its stretch equals
w_e R′_e.d Therefore, t is the total stretch of the off-tree
edges with respect to the tree. Then, using the low-stretch
spanning tree of the Theorem in Section 4.1, we can upper
bound t by O(m log n). Recall that the number of samples
will be O(t log n), and so it appears that we do not gain much
from the sampling process unless the graph A has a very
special tree.

Figure 7. The effective resistance R′_e of the blue off-tree edge in the
red tree is 1/4 + 1/5 + 1/2 = 0.95. Its stretch w_e R′_e is
(1/4 + 1/5 + 1/2)/(1/2) = 1.9.
Our key idea is to make a special graph Ã out of A. We do
so by scaling up, by a factor of k, the weights of the edges of a
low-stretch spanning tree in A. For an edge that is not part
of the tree, its weight does not change, while the tree path
connecting its endpoints is now heavier by a factor of k. So
its stretch decreases by a factor of k, and the total stretch of
the off-tree edges becomes t = O((m log n)/k). Now, consider what
happens if we sample the off-tree edges in Ã. The output
B will be a 2-approximation of Ã. On the other hand, the
graph Ã is a k-approximation of A, and so by transitivity B is
a 2k-approximation of A. Also, the number of non-tree edges
sampled will be O(t log n) = O((m log² n)/k). Adding in the
n − 1 tree edges gives a total of n + O((m log² n)/k) edges in B.
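The accounting above rests on the fact that scaling the tree weights by k scales every off-tree stretch, and hence t, down by exactly k. A tiny numeric sanity check, with illustrative numbers taken from Figure 7:

```python
# Scaling the tree weights by k multiplies each tree-path resistance
# by 1/k, while the off-tree edge weight w_e stays fixed, so the
# stretch w_e * R'_e drops by exactly k.
k = 16.0
path_weights = [4.0, 5.0, 2.0]   # weights on the tree path (illustrative)
w_e = 2.0                        # off-tree edge weight (unchanged)

stretch = w_e * sum(1.0 / w for w in path_weights)
stretch_scaled = w_e * sum(1.0 / (k * w) for w in path_weights)
ratio = stretch / stretch_scaled
print(ratio)   # ~ k
```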
Recall that the two-level design conditions stated in Section
3.6 require a k²-approximation with n + C · m/k edges in order
to obtain a running time of O(k · m). So by setting the scaling
factor to O(log⁴ n), we meet the conditions with k = O(log² n)
and arrive at our first result:
Theorem 9. SDD systems can be solved in Õ(m log² n log(1/ε)) time, where ε is a standard measure of the approximation error.
As it turned out, the low-stretch spanning tree is not only a
good base tree, but also tells us which off-tree edges should
go to the preconditioner. Our faster, O(m log n) time algorithm will come via an even better understanding of the
properties of the tree.
4.4. The final push: Low-stretch spine
Assume that we are given a graph A, have found its LSST T_A, and,
based on it, have computed the preconditioner B. Then the
O(m log² n) time solver algorithm dictates that we recursively do
the same with B. But do we really have to scrap T_A and find
another LSST T_B? After all, it may be the case that T_A is an LSST
of B, or close to being one.
d An alternate view is that the stretch of e is the weighted length of the tree
path between e's endpoints divided by e's own length.