absolute constant c. We then choose a random subspace of dimension k in R^d (we omit the mathematical details of what a random subspace is), and define F to be the operation of projecting a point in R^d onto the subspace. We remind the reader that such an operation is linear, and is hence equivalently representable by a matrix. In other words, we've just defined a random matrix. Denote it by F. For any fixed unit vector x ∈ R^d, the squared length ||Fx||^2 turns out to be tightly concentrated around its mean, and the probability of a significant deviation is at most n^(-c'), for a constant c' growing with c, which allows us to apply a union bound over all pairs of distances in X.
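To make this concrete, the following sketch (our own Python code, not part of the original construction) realizes one standard way of drawing such a matrix: orthonormalize a Gaussian matrix with a QR factorization, so that the rows of F span a uniformly random k-dimensional subspace of R^d.

```python
import numpy as np

def random_subspace_projection(k, d, rng):
    """Return a k x d matrix F whose rows form an orthonormal basis of a
    uniformly random k-dimensional subspace of R^d; F @ x gives the
    coordinates of the projection of x in that basis."""
    G = rng.standard_normal((d, k))   # Gaussian matrix; its column span is a random subspace
    Q, _ = np.linalg.qr(G)            # orthonormal basis of that subspace (d x k)
    return Q.T                        # k x d projection matrix

rng = np.random.default_rng(0)
d, k = 1000, 60                       # in the lemma, k = c * eps^-2 * log n
F = random_subspace_projection(k, d, rng)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                # unit vector
# ||F x||^2 concentrates around k/d, so (d/k) * ||F x||^2 is close to ||x||^2 = 1.
print((d / k) * np.sum((F @ x) ** 2))
```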
Following Johnson and Lindenstrauss,25 various researchers suggested simplifications of the original JL design and of its proof (Frankl and Maehara,20 Dasgupta and Gupta,17 Indyk and Motwani24). These simplifications slightly change
the distribution from which F is drawn and result in a better constant c and simpler proofs. These results, however, do not depart from the original JL construction from a computational point of view, because the time needed to apply F to a vector is still of order dk.
A bold and ingenious attempt to reduce this cost was made by Achlioptas.1 He noticed that the only property of F needed for the transformation to work is that (F_i · x)^2 be tightly concentrated around the mean 1/d for all unit vectors x ∈ R^d, where F_i is the ith row of F. The distribution he proposed is very simple: choose each element of F independently from the following distribution:

+√(3/d) with probability 1/6,   0 with probability 2/3,   −√(3/d) with probability 1/6.
figure 2. A triangle cannot be embedded onto a line while simultaneously preserving distances between all pairs of vertices.
The nice property of this distribution is that it is relatively
sparse: on average, a fraction 2/3 of the entries of F are 0.
Assuming we want to apply F to many points in R^d in a real-time setting, we can keep a linked list of all the nonzeros of
F during preprocessing and reap the rewards in the form of
a threefold speedup in running time.
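The following sketch (our own code, assuming the ±√(3/d) distribution described above) illustrates this preprocessing idea: only the nonzero entries of each row are stored, so applying F to a vector costs roughly d/3 multiplications per row instead of d.

```python
import numpy as np

def sample_sparse_rows(k, d, rng):
    """Sample a k x d Achlioptas-style matrix, keeping for each row only the
    list of (column, value) pairs of its nonzero entries.  Entries are
    +/- sqrt(3/d) with probability 1/6 each and 0 with probability 2/3,
    so E[(F_i . x)^2] = 1/d for any unit vector x."""
    v = np.sqrt(3.0 / d)
    rows = []
    for _ in range(k):
        entries = rng.choice([0.0, v, -v], size=d, p=[2 / 3, 1 / 6, 1 / 6])
        nz = np.flatnonzero(entries)
        rows.append(list(zip(nz.tolist(), entries[nz].tolist())))
    return rows

def apply_sparse(rows, x):
    """Compute F x using only the stored nonzeros (about d/3 per row)."""
    return np.array([sum(val * x[j] for j, val in row) for row in rows])

rng = np.random.default_rng(0)
d, k = 1000, 60
rows = sample_sparse_rows(k, d, rng)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
y = apply_sparse(rows, x)
print((d / k) * np.sum(y ** 2))       # close to ||x||^2 = 1 for a dense x
```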
Is Achlioptas’s result optimal, or is it possible to get a
super-constant speedup? This question is the point of departure for this work. One idea to obtain a speedup, aside from
sparsifying F, would be to reduce the target dimension k, and
multiply by a smaller matrix F. Does this have a chance of
working? A lower bound of Alon5 provides a negative answer
to this question, and dashes any hope of reducing the number of rows of F by more than a factor of O(log(1/ε)). The
remaining question is hence whether the matrix can be
made sparser than Achlioptas’s construction. This idea has
been explored by Bingham and Mannila.11 They considered
sparse projection heuristics, namely, fixing most of the
entries of F as zeroes. They noticed that in practice such
matrices F seem to give a considerable speedup with little
compromise in distortion for data found in certain applications. Unfortunately, it can be shown that sparsifying F
by more than a constant factor (as implicitly suggested in
Bingham and Mannila’s work) will not work for all inputs.
Indeed, a sparse matrix will typically distort a sparse vector.
The intuition for this is given by an extreme case: If both F
and the vector x are very sparse, the product Fx may be null,
not necessarily because of cancellations, but more simply
because each product F_ij x_j is itself zero.
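A toy computation makes the extreme case concrete (the parameters below are ours, chosen only for illustration): if F is sparsified far beyond the 2/3 level while keeping E[F_ij^2] = 1/d, then applying it to the sparse unit vector e_1 usually returns the zero vector, because only the first column of F enters the product and that column is, with high probability, entirely zero.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 1000, 60
q = 1e-3                                  # fraction of nonzero entries, far below 1/3
v = np.sqrt(1.0 / (d * q))                # +/- v with probability q/2 keeps E[F_ij^2] = 1/d
F = rng.choice([0.0, v, -v], size=(k, d), p=[1 - q, q / 2, q / 2])
e1 = np.zeros(d)
e1[0] = 1.0                               # very sparse input vector
# Only column 0 of F enters the product; with probability (1 - q)^k (about 0.94 here)
# that column is all zeros, so F @ e1 is the zero vector and the norm of e1 is lost.
print((d / k) * np.sum((F @ e1) ** 2))
```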
1.2. The Random Densification Technique
In order to prevent the problem of simultaneous sparsity
of F and x, we use a central concept from harmonic analysis known as the Heisenberg principle—so named because
it is the key idea behind the Uncertainty Principle: a signal
and its spectrum cannot both be concentrated. The look of