and hence
If we plug
early solutions typically suffered from the curse of dimen-
sionality, but the last decade has witnessed a flurry of new
algorithms that “break the curse” (see Indyk23 for a recent
survey).
into ( 8) and assume that e is smaller than some global e0, we
avoid convergence issues (Lemma 2). By that same lemma,
we now conclude that
A similar technique can be used to bound the left tail esti-
mate. We set k = ce − 2 log n for some large enough c and use a
union bound, possibly rescaling e, to conclude the 2 case of
the first part of the FJLT lemma.
running Time: The vector Dx requires O(d) steps, since D is
diagonal. Computing H(Dx) takes O(d log d) time using the FFT
for Walsh–Hadamard. Finally, computing P(H Dx) requires
O(|P|) time, where |P| is the number of nonzeros in P. This
number is distributed in B(nk, q). It is now immediate to verify
that
A Markov bound establishes the desired complexity of the
FJLT. This concludes our sketch of the proof of the FJLT
lemma. ®
3. APPLicAtions
3. 1. Approximate nearest neighbor searching
Given a metric space (U, dU) and a finite subset (database)
P ⊆ U, the problem of e-approximate nearest neighbor (e-ANN)
searching is to preprocess P so that, given a query x Î U,
a point p Î P satisfying
can be found efficiently. In other words, we are interested in
a point p further from x by a factor at most ( 1 + e) of the distance to its nearest neighbor.
3. 2. fast approximation of large matrices
Large matrices appear in virtually every corner of science.
Exact algorithms for decomposing or solving for large
matrices are often inhibitively expensive to perform. This
may change given improvements in matrix multiplication
technology, but it appears that we will have to rely on matrix
approximation strategies for a while, at least in the general
case. It turns out that FJLT and ideas inspired by it play an
important role in recent developments.
We elaborate on an example from a recent solution of
Sarlós36 to the problem of 2 regression (least square fit of an
overdetermined linear system). Prior to that work (and ours),
Drineas et al. 18 showed that, by downsampling (choosing
only a small subset and discarding the rest) from the set of
equations of the linear regression, an approximate solution
to the problem could be obtained by solving the downsam-pled problem, the size of which depends only on the dimension d of the original solution space. The difficulty with this
method is that the downsampling distribution depends on
norms of rows of the left-singular vector matrix of the original system. Computing this matrix is as hard as the original
regression problem and requires O(m2d) operations, with m
the number of equations. To make this solution more practical, Sarlós observed that multiplying the equation matrix
on the left by the m × m orthogonal matrix HD (as defined
above in the definition of FJLT) implicitly multiplies the left-singular vectors by HD as well. By an analysis similar to the
one above, the resulting left-singular matrix can be shown to
have almost uniform row norm. This allows use of Drineas
et al.’s ideas with uniform sampling of the equations. Put
together, these results imply the first o(m2d) running time
solution for worst-case approximate 2 regression.