not feasible, even if the metric itself
is efficient. Unfortunately, traditional
methods for fast search cannot guarantee low query-time performance for
arbitrary specialized metrics.a This section overviews our work designing hash
functions that enable approximate similarity search for both types of metrics
introduced above: a matching between
sets, and learned Mahalanobis kernels.
The main idea of our approach is to
construct a new family of hash functions that will satisfy the locality sensitivity requirement that is central to
existing randomized algorithms5, 16 for
approximate nearest neighbor search.
Locality sensitive hashing (LSH) has
been formulated in two related contexts—one in which the likelihood of
collision is guaranteed relative to a
threshold on the radius surrounding a
query point, 16 and another where collision probabilities are equated with a
similarity function score. 5 We use the
latter definition here.
A family of LSH functions F is a dis-
tribution of functions where for any
two objects xi and xj,
( 2)
where sim(xi, xj) Î [0, 1] is some simi-
larity function, and h(x) is a hash
function drawn from F that returns
a single bit. 5 Concatenating a series
of b hash functions drawn from F
yields b-dimensional hash keys. When
h(xi) = h(xj), xi and xj collide in the hash
table. Because the probability that two
inputs collide is equal to the similarity
between them, highly similar objects
are indexed together in the hash table
with high probability. On the other
hand, if two objects are very dissimi-
lar, they are unlikely to share a hash
key (see Figure 6). Given valid LSH
functions, the query time for retrieving
( 1 + )-near neighbors is bounded by
O(N1/( 1+)) for the Hamming distance
and a database of size N. 10 One can
therefore trade off the accuracy of the
search with the query time required.
To embed the pyramid match as an
inner product, we exploit the relationship between a dot product and the
min operator used in the PMK’s intersections. Taking the minimum of two
values is equivalent to computing the
dot product of a unary-style encoding
in which a value u is written as a list
of u ones, followed by a zero padding
large enough to allot space for the maximal value that will occur. So, since a
weighted intersection value is equal to
the intersection of weighted values, we
can compute the embedding by stacking up the histograms from a single
pyramid, and weighting the entries
associated with each pyramid level
appropriately. Our embedding enables
LSH for normalized partial match
similarity with local features, and we
have shown that it can achieve results
very close to a naive linear scan when
searching only a small fraction of an
image database (1%–2%) (see Grauman
and Darrell14 for more details).
semi-supervised hashing. To provide
suitable hash functions for learned
Mahalanobis metrics, we propose
altering the distribution from which
the randomized hyperplanes are
drawn. Rather than drawing the vector r uniformly at random, we want
to bias the selection so that similarity
constraints provided for the metric
learning process are also respected
by the hash functions. In other words,
we still want similar examples to collide, but now that similarity cannot be
purely based on the image measurements xi and xj; it must also reflect the
constraints that yield the improved
(learned) metric (see Figure 7a). We
refer to this as “semi-supervised”
hashing, since the hash functions will
be influenced by any available partial
annotations, much as the learned metrics were in the previous section.
In Jain et al., 19 we present a straightforward solution for the case of relatively low-dimensional input vector
spaces, and further derive a solution to
accommodate very high-dimensional
data for which explicit input space
computations are infeasible. The former contribution makes fast indexing
accessible for numerous existing metric
learning methods, while the latter is of
particular interest for commonly used
image representations, such as bags-of-words or multiresolution histograms.
( 3)
An LSH function that exploits this
relationship is given by Charikar. 5 The
hash function hr accepts a vector x Î
d, and outputs a bit depending on
the sign of its product with r:
Since