frustration on the face of any musician who has to wrestle
with the delay from a digital synthesizer can be attributed to
the Uncertainty Principle.
Before we show how to use this principle, we must stop
and ask: what are the tools we have at our disposal? We may
write the matrix F as a product of matrices, or, algorithmically, apply a chain of linear mappings to an input vector.
With that in mind, an interesting family of matrices we can
apply to an input vector is the orthogonal family of d-by-d
matrices. Such matrices are isometries: Euclidean geometry suffers no distortion from their application.
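As a quick sanity check (a minimal numpy sketch of our own, not code from the article), a random orthogonal matrix obtained from a QR decomposition preserves Euclidean norms up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512

# A random orthogonal matrix: QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

x = rng.standard_normal(d)
# Isometry: ||Qx||_2 equals ||x||_2 up to floating-point error.
print(np.linalg.norm(Q @ x), np.linalg.norm(x))
```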
Accordingly, we precondition the random k-by-d mapping with a Fourier transform (via an efficient FFT algorithm)
in order to isometrically densify any sparse vector. To prevent
the inverse effect, i.e., the sparsification of dense vectors,
we add a little randomization to the Fourier transform (see
Section 2 for details). The reason this works is that sparse
vectors are rare within the space of all vectors. Think of them
as forming a tiny ball within a huge one: if you are inside the
tiny ball, a random transformation is likely to take you outside;
on the other hand, if you are outside to begin with, the transformation is highly unlikely to take you inside the tiny ball.
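To see the densification effect concretely, here is a small numpy sketch (our illustration, not code from the article): a random ±1 diagonal followed by the normalized Walsh–Hadamard transform spreads the mass of a maximally sparse unit vector evenly over all d coordinates while preserving its ℓ2 norm.

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(1)
d = 1024  # must be a power of two

H = hadamard(d) / np.sqrt(d)         # normalized Walsh-Hadamard matrix
D = rng.choice([-1.0, 1.0], size=d)  # random +-1 diagonal, stored as a vector

x = np.zeros(d)
x[3] = 1.0                           # a maximally sparse unit vector

u = H @ (D * x)
print(np.linalg.norm(u))             # ~1.0: the l2 norm is preserved
print(np.abs(u).max())               # ~d**-0.5: mass is spread over all coordinates
```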
The resulting FJLT shares the low-distortion characteristics of JL but with a lower running time complexity.
2. The Details of FJLT
In this section we show how to construct a matrix F drawn
from FJLT and then prove that it works, namely:
1. It provides a low-distortion guarantee. (In addition to
showing that it embeds vectors in low-dimensional $\ell_2$,
we will show it also embeds in $\ell_1$.)
2. Applying it to a vector is computationally efficient.
The first property is shared by the standard JL and its variants, while the second one is the main novelty of this work.
2.1. Constructing F
We first make some simplifying assumptions. We may assume with no loss of generality that d is a power of two, $d = 2^h > k$, and that $n \ge d = \Omega(\epsilon^{-1/2})$; otherwise the dimension of the reduced space is linear in the original dimension. Our random embedding $F \sim \mathrm{FJLT}(n, d, \epsilon, p)$ is a product of three real-valued matrices (Figure 3):
$$ F = PHD. $$
Figure 3. The FJLT: F = PHD, where P is a sparse k×d JL matrix, H is the d×d Walsh–Hadamard matrix, and D is a d×d diagonal matrix of random ±1 signs.
The matrices P and D are chosen randomly whereas H is
deterministic:
• P is a k-by-d matrix. Each element is an independent
mixture of 0 with an unbiased normal distribution of
variance 1/q, where
$$ q = \min\left\{ \Theta\!\left( \frac{\epsilon^{p-2} \log^p n}{d} \right),\ 1 \right\}. $$
In other words, $P_{ij} \sim N(0, 1/q)$ with probability q, and
$P_{ij} = 0$ with probability 1 − q.
• D is a d-by-d diagonal matrix, where each diagonal entry is drawn independently and uniformly from {−1, +1}.
• H is the d-by-d Walsh–Hadamard matrix:
$$ H_{ij} = d^{-1/2} (-1)^{\langle i-1,\, j-1 \rangle}, $$
where $\langle i, j \rangle$ is the dot-product (modulo 2) of the h-bit
vectors i, j expressed in binary.
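The construction translates directly into code. The following numpy sketch is ours, not the authors'; the hidden constants in k and q are set to 1 purely for concreteness (the paper only pins them down up to Θ(·)), and the dense matrices are built for exposition only:

```python
import numpy as np
from scipy.linalg import hadamard

def sample_fjlt(n, d, eps, p, rng):
    """Sample F = P @ H @ D as dense matrices (exposition only; d a power of two)."""
    k = int(np.ceil(eps ** -2 * np.log(n)))              # target dimension, constant taken as 1
    q = min(eps ** (p - 2) * np.log(n) ** p / d, 1.0)    # sparsity parameter, constant taken as 1

    # P: k-by-d; each entry is N(0, 1/q) with probability q, and 0 otherwise.
    mask = rng.random((k, d)) < q
    P = np.where(mask, rng.normal(0.0, np.sqrt(1.0 / q), (k, d)), 0.0)

    H = hadamard(d) / np.sqrt(d)                         # deterministic Walsh-Hadamard matrix
    D = np.diag(rng.choice([-1.0, 1.0], size=d))         # random +-1 diagonal matrix

    return P @ H @ D
```

In practice one never materializes H; the fast transform sketched after the running-time discussion below applies it implicitly.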
The Walsh–Hadamard matrix corresponds to the discrete Fourier transform over the additive group $\mathrm{GF}(2)^h$: its FFT is very simple to compute and requires only O(d log d) steps.
It follows that the mapping Fx of any point $x \in \mathbb{R}^d$ can be computed in time O(d log d + |P|), where |P| is the number of nonzero entries in P. The latter is $O(qd\epsilon^{-2} \log n)$ not only on average but also with high probability. Thus we can assume that the running time of $O(d \log d + qd\epsilon^{-2} \log n)$ is worst-case, and not just expected.
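A minimal sketch of this computation (our code, under the same illustrative assumptions as above): the in-place fast Walsh–Hadamard butterfly uses O(d log d) additions, after which multiplying by a sparse P costs O(|P|).

```python
import numpy as np

def fwht(u):
    """In-place fast Walsh-Hadamard transform; O(d log d) for d a power of two."""
    d = len(u)
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            for j in range(i, i + h):
                a, b = u[j], u[j + h]
                u[j], u[j + h] = a + b, a - b   # butterfly step
        h *= 2
    u /= np.sqrt(d)                             # normalize so H is an isometry
    return u

def apply_fjlt(P_sparse, d_signs, x):
    """Compute Fx = P(H(Dx)) in O(d log d + |P|) time.

    P_sparse is a scipy sparse k-by-d matrix; d_signs is the +-1 diagonal of D."""
    u = fwht(d_signs * x.astype(float))         # Dx, then the fast transform
    return P_sparse @ u
```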
The FJLT Lemma. Given a fixed set X of n points in $\mathbb{R}^d$, ε < 1, and p ∈ {1, 2}, draw a matrix F from FJLT. With probability at least 2/3, the following two events occur:

1. For any x ∈ X,
$$ (1 - \epsilon)\, \alpha_p \|x\|_2^p \le \|Fx\|_p^p \le (1 + \epsilon)\, \alpha_p \|x\|_2^p, $$
where $\alpha_1 = k\sqrt{2\pi^{-1}}$ and $\alpha_2 = k$.

2. The mapping $F: \mathbb{R}^d \to \mathbb{R}^k$ requires
$$ O\big(d \log d + \min\{ d\epsilon^{-2} \log n,\ \epsilon^{p-4} \log^{p+1} n \}\big) $$
operations.
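A quick empirical check of the first event, reusing the sample_fjlt sketch from Section 2.1 (ours; for p = 2 the lemma predicts $\|Fx\|_2^2 \approx k \|x\|_2^2$):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, eps, p = 1000, 1024, 0.5, 2

F = sample_fjlt(n, d, eps, p, rng)   # from the sketch in Section 2.1
k = F.shape[0]

x = rng.standard_normal(d)
ratio = np.linalg.norm(F @ x) ** 2 / (k * np.linalg.norm(x) ** 2)
print(ratio)                         # close to 1 when the lemma's first event holds
```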
Remark: By repeating the construction O(log(1/δ)) times we can ensure that the probability of failure drops to δ for any desired δ > 0. By failure we mean that either the first or the second part of the lemma does not hold.
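The arithmetic behind the remark is the standard independent-repetition bound (a worked one-liner, ours):
$$ \Pr[\text{all } t \text{ draws fail}] \le (1/3)^t \le \delta \quad \text{for } t = \lceil \log_3(1/\delta) \rceil = O(\log(1/\delta)), $$
and whether a given draw satisfies the first event can be checked directly on the n points of X.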
2.2. Showing that F works
We sketch a proof of the FJLT Lemma. Without loss of generality, we can assume that ε < ε₀ for some suitably small ε₀. Fix x ∈ X. The inequalities of the lemma are invariant under scaling, so we can assume that $\|x\|_2 = 1$. Consider the random variable u = HDx, denoted by $(u_1, \ldots, u_d)^T$. The first coordinate $u_1$ is of the form $\sum_{i=1}^{d} a_i x_i$, where each $a_i = \pm d^{-1/2}$ is chosen independently and uniformly. We can use a standard tail estimate technique to prove that, with probability at least, say,
0.95,
$$ \max_{x \in X} \|HDx\|_\infty = O\big(d^{-1/2} \sqrt{\log n}\big). \qquad (3) $$
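For the fixed x, the tail estimate itself is a routine Hoeffding computation (our reconstruction of the standard step): since the terms $a_i x_i$ are independent, zero-mean, and bounded in magnitude by $|x_i|/\sqrt{d}$,
$$ \Pr\big[|u_1| \ge t\big] \le 2\exp\left( -\frac{t^2}{2 \sum_i x_i^2 / d} \right) = 2e^{-t^2 d / 2}, $$
so taking $t = \Theta(d^{-1/2}\sqrt{\log n})$ makes the failure probability polynomially small in n, and a union bound over the d coordinates and the n points of X (recall n ≥ d) gives (3).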