What the O(m log^2 n) algorithm9 missed is the observation
that we can keep sampling based on the same tree, gradually
generating all levels of the multilevel hierarchy, until what is
left is the tree itself. This justifies thinking of a low-stretch
spanning tree as a graph spine, and is depicted in Figure 8.
When the sparsifier B is viewed as a graph, it is possible
for some of its edges to have high stretch. However, a more
careful reexamination of the sampling algorithm shows that
these edges are the result of an edge being sampled many
times. From this perspective, these heavy edges are in fact
many multi-edges, each with low stretch. Therefore, if we
process these multi-edges separately, the tree T_A will be a
low-stretch spanning tree in B, and the higher edge count
is still bounded by the number of rounds made by the sampling algorithm. This allows us to use T_A as a low-stretch spanning tree and sample the off-tree edges in B according to it. Note that with this modification, it's possible for
us to observe a temporary "slowdown" in the reduction of
the overall edge count; the preconditioner of B may have the
same number of off-tree edges as B itself. However, the total
number of multi-edges will decrease at a rate that meets the
design conditions. This reuse of the tree for generating sparsifiers is a crucial deviation from prior work.
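To make the mechanics concrete, here is a minimal Python sketch of stretch-proportional sampling with multi-edge bookkeeping. It is an illustration under assumed interfaces (precomputed per-edge stretch values, sampling with replacement), not the paper's exact procedure; the function and variable names are ours.

```python
import random
from collections import Counter

def sample_off_tree_edges(off_tree_edges, stretch, num_samples):
    """Sample off-tree edges with probability proportional to stretch.

    The tree itself is always kept; only the off-tree "cloud" is
    resampled. The returned Counter maps each edge to its multiplicity:
    an edge drawn t times is treated as t parallel multi-edges, each of
    low individual stretch.
    """
    weights = [stretch[e] for e in off_tree_edges]
    picks = random.choices(off_tree_edges, weights=weights, k=num_samples)
    return Counter(picks)

# Example: a heavy (high-stretch) edge tends to be drawn several times,
# i.e., it reappears as several low-stretch multi-edges.
edges = [(0, 2), (1, 3), (2, 4)]
stretch = {(0, 2): 4.0, (1, 3): 1.0, (2, 4): 2.0}
print(sample_off_tree_edges(edges, stretch, num_samples=5))
```

Because the same tree is reused at every level, only this sampling step repeats from level to level; the total multi-edge count shrinks each round until only the tree remains.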
But this doesn’t fully explain the faster solver algorithm.
To achieve it, we need an extra trick. Assume for a moment
that our graph A is what we call spine-heavy; that is, it has
a spanning tree of total stretch O(m/log n). Then, by an argument analogous to the one for a standard low-stretch spanning tree, we can show that B actually satisfies the two-level preconditioning requirement for an even lower value of k, namely a fixed constant. This, in combination with spine-based sampling, allows us to solve spine-heavy graphs in
linear time. A more global view of this algorithm, as shown in Figure 8, is that it progressively makes the tree heavier, while removing off-tree edges.

[Figure 8. Low-stretch spanning tree as a spine. The "cloud" of off-tree edges becomes progressively sparser.]
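A back-of-the-envelope restatement of why spine-heaviness buys a constant k (our own heuristic reading of the sampling bounds, with log log n factors suppressed): the number of off-tree samples needed to meet the two-level requirement with parameter k scales roughly as the total stretch times (log n)/k, so

```latex
\mathrm{st}_T(A) \;=\; O\!\left(\frac{m}{\log n}\right)
\quad\Longrightarrow\quad
\#\{\text{off-tree samples}\}
  \;=\; O\!\left(\mathrm{st}_T(A)\cdot\frac{\log n}{k}\right)
  \;=\; O\!\left(\frac{m}{k}\right),
```

and a sufficiently large constant k already shrinks the multi-edge count by a constant factor per level.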
Since our initial graph A cannot be expected to be spine-heavy, we make a spine-heavy graph Ã out of A by scaling up its LSST by an O(log^2 n) factor. Now Ã is an O(log^2 n)-approximation to A, and we can solve it in O(m) time. Using it as a preconditioner for A completes the O(m log n) time solver. So, we have arrived at our destination.
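A minimal sketch of the Ã construction, assuming a hypothetical representation of A as an edge-to-weight map with its LSST given as a set of edges (the interface and the constant c are illustrative):

```python
import math

def make_spine_heavy(edge_weights, tree_edges, n, c=1.0):
    """Scale the LSST edges of A up by a c * log^2(n) factor.

    The stretch of an off-tree edge is proportional to the resistance
    (sum of inverse weights) of its tree path, so boosting the tree
    weights by ~log^2 n divides the total stretch by the same factor,
    making the resulting graph spine-heavy.
    """
    boost = c * math.log(n) ** 2
    return {e: w * boost if e in tree_edges else w
            for e, w in edge_weights.items()}

# A_tilde is an O(log^2 n)-approximation of A, so preconditioned
# Chebyshev needs only ~sqrt(log^2 n) = O(log n) outer iterations,
# each using the O(m)-time spine-heavy solve of A_tilde.
edge_weights = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 0.5}
tree_edges = {(0, 1), (1, 2)}
A_tilde = make_spine_heavy(edge_weights, tree_edges, n=3)
```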
Theorem10. SDD systems can be solved in Õ(m log n log(1/ε)) time, where ε is a standard measure of the approximation error.
5. EXTENSIONS
5.1. Parallelization
Several algorithms in numerical linear algebra have parallel versions that are work-efficient. A parallel algorithm is
called work-efficient if it performs roughly the same work
as the best sequential algorithm for the same problem, while
allowing the use of parallel processing.
The first steps towards studying the parallelism potential
of SDD solvers were taken in Blelloch et al.,3 which presented a nearly (up to log factors) work-efficient algorithm, running in O(m^(1/3)) parallel time. Informally, this means that up to m^(2/3)
parallel processes can be used to accelerate the algorithm,
a non-trivial potential for parallelism.
5.2. Implementation
The most complicated component of our solver is the algorithm for computing an LSST. It is expected, however, that a conceptually simpler algorithm for this problem will be discovered, leading to a fast and "clean" implementation, and quite likely the removal of the log log n factors from the running time.
In a practical implementation, it would be a good idea to replace the recursive preconditioned Chebyshev iteration with a recursive preconditioned Conjugate Gradient (PCG) iteration. It is known that, in two-level methods, PCG is essentially able to automatically optimize the performance of the preconditioner. The same is expected to hold for some multilevel variant of PCG, but this is yet to be proven.
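For concreteness, here is a textbook PCG sketch in Python/NumPy (the names are ours, and the example preconditioner is simple Jacobi scaling). In the solver discussed above, the apply_preconditioner callback would itself be a recursive solve of the preconditioner graph, which is exactly the multilevel variant whose analysis remains open.

```python
import numpy as np

def pcg(A, b, apply_preconditioner, tol=1e-8, max_iter=1000):
    """Preconditioned Conjugate Gradient for A x = b, A symmetric PSD.

    apply_preconditioner(r) should return an (approximate) solution z
    of B z = r for the preconditioner B; in a multilevel solver this
    call would itself be a recursive PCG solve.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_preconditioner(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = apply_preconditioner(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p   # new search direction, B-orthogonalized
        rz = rz_new
    return x

# Tiny example with Jacobi (diagonal) preconditioning.
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
b = np.array([1.0, 0.0])
x = pcg(A, b, lambda r: r / np.diag(A))
```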
We expect that, eventually, the best implementations of
SDD solvers will combine ideas from this work and other
existing graph-based techniques,8 or entirely new ideas.
Such ideas will certainly be needed to achieve—if possible—
a “fully parallel”, O(log n) time, work-efficient SDD solver.
6. THE LAPLACIAN PARADIGM
Solvers for SDD systems are increasingly viewed as an algorithmic primitive: a fundamental subroutine that can be
used to design many other efficient algorithms. Indeed,
since the Spielman–Teng breakthrough, the availability
of fast SDD solvers has sparked what has been dubbed the
Laplacian paradigm: an entire class of new algorithms spanning various areas. Because it is impossible to do justice
to each one of these topics, we will present some unifying
themes and only point to some representative examples of
applications.
Perhaps the most direct example of using the solver as a
primitive is the computation of eigenvectors. It was shown