[Figure: (a) Semi-supervised hash functions: intuition, illustrating the collision case h(xi) = h(xj) and the non-collision case h(xi) ≠ h(xj). (b) Results with semi-supervised hashing: "Caltech-101: classification error with hashing" and "Photo Tourism: recall vs. NN," plotting k-nearest-neighbors error rate against the number of nearest neighbors (k) for ML+PMK linear scan, PMK linear scan, ML linear scan, and L2 linear scan.]
(a) A generic hash function would choose the orientation of the hyperplane r uniformly at random, causing collisions only for examples that have small angles between their features (xi and xj). In contrast, the distribution of our randomized semi-supervised hash functions is such that examples like those constrained to be similar are more likely to collide (left), while pairs like those constrained to be dissimilar are less likely to collide (right). Here the hourglass shapes denote the regions from which our randomized hash functions will most likely be drawn. (b) Semi-supervised hash functions encode the learned metric and allow guaranteed sublinear-time queries that are similar in accuracy to a naive linear scan. Plots are reprinted from Jain et al.19 with permission, ©2008 IEEE.
Given the matrix $A$ for a metric learned as above, such that $A = G^T G$, we generate the following randomized hash functions $h_{r,A}$:

$$
h_{r,A}(x) = \begin{cases} 1, & \text{if } r^T G x \ge 0 \\ 0, & \text{otherwise,} \end{cases}
$$

where the vector $r$ is chosen at random from a $d$-dimensional Gaussian distribution with zero mean and unit variance.
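As a concrete illustration, here is a minimal sketch of one such hash function in Python with NumPy; the Cholesky route to obtaining G and the helper name make_hash_function are our assumptions, not the authors' implementation:

```python
import numpy as np

def make_hash_function(A, rng=None):
    """Draw one randomized hash function h_{r,A} for a learned metric A = G^T G."""
    rng = np.random.default_rng() if rng is None else rng
    d = A.shape[0]
    # One way to factor A: Cholesky gives A = L L^T, so G = L^T satisfies A = G^T G.
    G = np.linalg.cholesky(A).T
    # r ~ N(0, I_d): a random hyperplane, drawn once per hash function.
    r = rng.standard_normal(d)
    def h(x):
        # h_{r,A}(x) = 1 if r^T G x >= 0, else 0.
        return int(r @ (G @ x) >= 0)
    return h
```

In practice one would factor A once and draw many independent r vectors, concatenating the resulting bits into a hash key; the sketch draws a single bit for clarity.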
By parameterizing the hash functions by both $r$ and $G$, we enforce the following probability of collision:

$$
\Pr\left[h_{r,A}(x_i) = h_{r,A}(x_j)\right] = 1 - \frac{1}{\pi} \cos^{-1}\!\left( \frac{x_i^T A\, x_j}{\sqrt{(x_i^T A\, x_i)\,(x_j^T A\, x_j)}} \right),
$$

which sustains the LSH requirement for a learned Mahalanobis metric. Essentially, we have shifted the random hyperplane $r$ according to $A$, and by factoring it by $G$ we allow the random hash function itself to "carry" the information about the learned metric. The denominator in the cosine term normalizes the kernel values.
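A quick Monte Carlo check of this collision probability can make the guarantee tangible. The sketch below uses an assumed setup (a random positive-definite A standing in for a learned metric) rather than anything from the original experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: a random positive-definite A stands in for a learned metric.
d = 5
B = rng.standard_normal((d, d))
A = B @ B.T + d * np.eye(d)           # positive definite by construction
G = np.linalg.cholesky(A).T           # A = G^T G

xi = rng.standard_normal(d)
xj = rng.standard_normal(d)

# Closed-form collision probability under the learned metric.
sim = (xi @ A @ xj) / np.sqrt((xi @ A @ xi) * (xj @ A @ xj))
p_theory = 1.0 - np.arccos(np.clip(sim, -1.0, 1.0)) / np.pi

# Empirical estimate: draw many random hyperplanes r and count collisions.
n = 200_000
R = rng.standard_normal((n, d))       # each row is one r
collide = ((R @ (G @ xi)) >= 0) == ((R @ (G @ xj)) >= 0)
print(f"theory: {p_theory:.4f}   empirical: {collide.mean():.4f}")
```

The clipping of sim to [-1, 1] guards against floating-point drift before taking the arccosine; the two printed numbers should agree to a few decimal places.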