large numbers. Each sample corresponds to a dimension.
The beauty of the scheme is that we can now use it to handle many distances at once.
It is easy to see that randomness is necessary if we hope
to make meaningful use of the reduced data: for any fixed
matrix, we could be given as input a set of vectors lying in
its kernel, thus losing all information.
The size of the distortion and the failure probability are
user-specified parameters that determine the target (low)
dimension. How many dimensions are sufficient? A careful
quantitative calculation reveals that, if all we care about
is distances between pairs of vectors and angles between
them—in other words, the Euclidean geometry of the data—
then a random linear mapping to a space of dimension
logarithmic in the size of the data is sufficient. This statement,
which we formalize in Section 1.1, follows from Johnson
and Lindenstrauss's seminal work.25 The consequence
is quite powerful: If our database contains n vectors in
d dimensions, then we can replace it with one in which the data
has only O(log n) dimensions! Although the original paper
was not written in computational language, deriving naïve
pseudocode for an algorithm implementing its idea is almost
immediate. This algorithm, which we refer to as JL for brevity,
has been studied in theoretical computer science in many
different contexts. The main theme in this line of work is
improving the efficiency of algorithms for high-dimensional
geometric problems such as clustering,37 nearest neighbor
searching,24,27 and large-scale linear algebraic computation.18,28,35,36,38
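To make this concrete, here is a minimal sketch of what such naïve pseudocode might look like in Python: multiply the data by a dense random Gaussian matrix. The 1/√k scaling and the use of Gaussian entries are one standard choice among several that work; the function and parameter names are ours.

```python
import numpy as np

def jl_project(X, k, seed=0):
    """Naive JL: multiply the n-by-d data matrix X by a dense random
    k-by-d Gaussian matrix, reducing each row from d to k dimensions.
    Costs O(dk) arithmetic operations per data vector."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # N(0, 1) entries, scaled so squared norms are preserved in expectation.
    R = rng.normal(size=(k, d)) / np.sqrt(k)
    return X @ R.T

# Example: 1,000 points in 10,000 dimensions reduced to 500 dimensions;
# with high probability, all pairwise distances are preserved up to
# a small multiplicative distortion.
X = np.random.default_rng(1).normal(size=(1000, 10_000))
Y = jl_project(X, k=500)
```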
For many readers it may be obvious that these algorithms
are directly related to widely used technologies such as web
search. For others this may come as a surprise: Where does
a Euclidean space hide in a web full of textual documents?
It turns out that it is very useful to represent text as vectors
in a high-dimensional Euclidean space.34 The dimension in
the latter example can be as high as the number of words
in the language!
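For readers unfamiliar with this idea, here is a minimal sketch of one such representation, the classic bag-of-words encoding: one coordinate per word of the vocabulary, counting occurrences. The tiny vocabulary and function name are ours; real systems typically use weighted variants.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Map a document to a vector with one coordinate per vocabulary
    word; coordinate i counts occurrences of word i in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

vocab = ["fast", "johnson", "lindenstrauss", "transform", "web"]
print(bag_of_words("Fast Johnson Lindenstrauss transform", vocab))
# [1, 1, 1, 1, 0]
```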
This last example illustrates what a metric embedding is:
a mapping of objects to points in a metric space. Computer
scientists care about such embeddings because it is often
easier to design algorithms for metric spaces. The simpler
the metric space, the friendlier it is for algorithm design.
Dimensionality is just one of many measures of simplicity.
We digress from JL to mention a few important results in
computer science illustrating why embedding the input in
a simple metric space is useful. We refer the reader to
Linial et al.29 for one of the pioneering works in the field.
• The Traveling Salesman Problem (TSP), in which
one wishes to plan a full tour of a set of cities with
given costs of traveling between any two cities, is
an archetype of computational hardness that
becomes easier if the cities are embedded in a metric
space,14 and especially in a low-dimensional Euclidean
one.7,30
• Problems such as combinatorial optimization on
graphs become easier if the nodes of the graph can be
embedded in ℓ_1 space. (The space ℓ_p^d is defined to be the
set ℝ^d endowed with a norm, where the norm of a vector
x (written ||x||_p) is given by (∑_{i=1}^d |x_i|^p)^{1/p}. The metric is
given by the distance between pairs of vectors x and y,
taken to be ||x − y||_p; see the code sketch after this list.)
• Embedding into tree metrics (where the distance
between two nodes is defined by the length of the path
joining them) is useful for solving network design optimization problems.
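For concreteness, here is a minimal sketch of the ℓ_p norm and distance defined in the parenthetical above; the function names are ours.

```python
def lp_norm(x, p):
    """||x||_p = (sum_i |x_i|^p)^(1/p), defined for p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def lp_distance(x, y, p):
    """Distance between x and y in l_p space: ||x - y||_p."""
    return lp_norm([xi - yi for xi, yi in zip(x, y)], p)

# Example: the same pair of points under two different norms.
x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(lp_distance(x, y, 1))  # 7.0  (l_1, "Manhattan" distance)
print(lp_distance(x, y, 2))  # 5.0  (l_2, Euclidean distance)
```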
The JL algorithm linearly embeds an input which is
already in a high-dimensional Euclidean space ℓ_2^d into a
lower-dimensional ℓ_p^k space for any p ≥ 1, and admits a naïve
implementation with O(dk) running time per data vector; in
other words, the complexity is proportional to the number
of random matrix elements.
Our modification of JL is denoted FJLT, for Fast
Johnson–Lindenstrauss Transform. Although JL often works
well, it is the computational bottleneck of many applications,
such as approximate nearest neighbor searching.24,27
In such cases, substituting FJLT yields an immediate
improvement. Another benefit is that implementing FJLT
remains extremely simple; a preview sketch appears below.
In Section 3 we show how FJLT helps in some of the
applications mentioned above. Until then, we concentrate
on the story of FJLT itself, which is interesting in its own right.
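As a preview of that story, here is a simplified sketch of the kind of transform FJLT is: randomize signs, mix all coordinates with a fast Walsh–Hadamard transform, then apply a sparse random projection. This is our own simplified reading, not the paper's exact construction; the parameter names, normalizations, and the choice of the sparsity parameter q here are illustrative assumptions, and the precise construction and analysis are given later in the article.

```python
import numpy as np

def fjlt_sketch(x, k, q, seed=0):
    """A simplified FJLT-style map applied to a single vector x
    (a NumPy array whose length d is a power of two): random sign
    flips D, a fast Walsh-Hadamard transform H, then a sparse
    random projection P, computed as P(H(Dx))."""
    rng = np.random.default_rng(seed)
    d = len(x)
    # D: flip each coordinate's sign independently at random.
    z = x * rng.choice([-1.0, 1.0], size=d)
    # H: in-place fast Walsh-Hadamard transform, O(d log d) time.
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a = z[i:i + h].copy()
            b = z[i + h:i + 2 * h].copy()
            z[i:i + h] = a + b
            z[i + h:i + 2 * h] = a - b
        h *= 2
    z /= np.sqrt(d)  # normalize so that H preserves norms
    # P: k-by-d matrix whose entries are Gaussian with probability q
    # and zero otherwise, so only about a q fraction are nonzero.
    mask = rng.random((k, d)) < q
    P = np.where(mask, rng.normal(size=(k, d)), 0.0)
    # Scale so squared norms are preserved in expectation.
    return (P @ z) / np.sqrt(k * q)
```

The point of the H and D stages is that they are cheap and norm-preserving, which is what allows P to be sparse; a dense P alone would bring us back to the O(dk) cost of JL.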
1.1. A brief history of a quest for a faster JL
Before describing our result, we present the original JL
result in detail, as well as survey results related to its computational aspects. We begin with the central lemma
behind JL.25 The following are the main variables we will be
manipulating:
X—a set of vectors in Euclidean space (our input data set).
In what follows, we use the terms point and vector
interchangeably.
n—the size of the set X.
d—the dimension of the Euclidean space (typically
very big).
k—the dimension of the space we will reduce the points
in X to (ideally, much smaller than d).
ε—a small tolerance parameter, measuring the maximum
allowed distortion of the metric induced by the set X in
Euclidean space (the exact definition of distortion will be
given below).
In JL, we take k to be cε⁻² log n, for some large enough constant c.
Figure 1. Embedding a spherical metric onto a planar one is no easy
task; the latter is more favorable as input to printers.
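For reference, the statement behind this choice of k can be written as follows; this is our phrasing of the standard lemma, which Section 1.1 formalizes, and the constant c is absolute and left unspecified.

```latex
% One standard formalization of the Johnson--Lindenstrauss lemma
% (our phrasing). For every $0 < \varepsilon < 1$ and every set $X$
% of $n$ points in $\mathbb{R}^d$, if $k \ge c\,\varepsilon^{-2}\log n$,
% then a suitably scaled random linear map
% $f\colon \mathbb{R}^d \to \mathbb{R}^k$ satisfies, with high
% probability, for all $x, y \in X$:
\[
  (1-\varepsilon)\,\lVert x-y\rVert_2
    \;\le\; \lVert f(x)-f(y)\rVert_2
    \;\le\; (1+\varepsilon)\,\lVert x-y\rVert_2 .
\]
```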