here, two 1-d feature sets are used to
form two histogram pyramids. each row
corresponds to a pyramid level. in (a), set Y
is on the left, and set Z is on the right; points
are distributed along the vertical axis.
light lines are bin boundaries, bold dashed
lines indicate a new pair matched at this
level, and bold solid lines indicate a match
already formed at a finer resolution level.
in (b) multiresolution histograms are
shown; (c) shows their intersections.
the pyramid match function P uses
these intersection counts to measure how
many new matches occurred at each level.
here, Ii = I(Hi(Y), Hi(Z) ) = 2, 4, 5 across
levels, so the number of new matches
counted are 2, 2, 1. the weighted sum
over these counts gives the pyramid
match similarity.
y z
H0(y)
H0(z)
min(H0(y),H0(z))
I0= 2
y z
H1(y)
H1(z)
min(H1(y),H1(z))
I1= 4
y z
H2(y)
H2(z)
I2= 5
the figure is reprinted from Grauman and
darrell12 with permission, ©2005 ieee.
(a) Point sets
(b) Histogram pyramids
(c) Intersections
severely limits the practicality of
large input sizes. In contrast, the pyramid match approximation requires
only O(mL) time, where L = log D, and
L << m. In practice, this translates to
speedups of several orders of magnitude relative to the optimal match
for sets with thousands of features.
We use a multidimensional, multiresolution histogram pyramid to partition the feature space into increasingly
larger regions. At the finest resolution
level in the pyramid, the partitions
(bins) are very small; at successive levels
they continue to grow in size until a single bin encompasses the entire feature
space. At some level along this gradation
in bin sizes, any two particular points
from two given point sets will begin to
share a bin in the pyramid, and when
they do, they are considered matched.
The key is that the pyramid allows us
to extract a matching score without
computing distances between any of
the points in the input sets—the size of
the bin that two points share indicates
the farthest distance they could be from
one another. We show that a weighted
intersection of two pyramids defines an
implicit partial correspondence based
on the smallest histogram cell where a
matched pair of points first appears.
Let a histogram pyramid for input
example X Î S be defined as: Ψ(X) =
[H0(X), …, HL− 1(X)], where L specifies the
number of pyramid levels, and Hi(X) is a
histogram vector over points in X. The
bins continually increase in size from
the finest-level histogram H0 until the
coarsest-level histogram HL− 1. For low-
dimensional feature spaces, the bound-
aries of the bins are computed simply
with a uniform partitioning along all
feature dimensions, with the length of
each bin side doubling at each level. For
high-dimensional feature spaces (for
example, d > 15), we use hierarchical
clustering to concentrate the bin par-
titions where feature points tend to
cluster for typical point sets. 13 In either
case, we maintain a sparse representa-
tion per point set that maps each point
to its bin indices. Even though there is
an exponential growth in the number
of possible histogram bins relative to
the feature dimension (for uniform
bins) or histogram levels (for nonuni-
form bins), any given set of features can
occupy only a small number of them.
An image with m features results in a
pyramid description with no more than
mL nonzero entries to store.
Thus, for each pyramid level, we
want to count the number of “new”
matches—the number of feature pairs
that were not in correspondence at
any finer resolution level. For example, in Figure 3, there are two points
matched at the finest scale, two new
matches at the medium scale, and
one at the coarsest scale.
To calculate the match count, we use
histogram intersection, which measures the “overlap” between the mass in
two histograms: I(P, Q) = Srj=1min(Pj, Qj),
where P and Q are histograms with r
bins, and Pj denotes the count of the
j-th bin. The intersection value effectively counts the number of points in
two sets that match at a given quantization level. To calculate the number
of newly matched pairs induced at
level i, we only need to compute the
difference between successive levels’
intersections. By using the change in
intersection values at each level, we
count matches without ever explicitly
searching for similar points or computing inter-feature distances.
The pyramid match similarity score
P between two input sets y and Z is
then defined as the weighted sum of
the number of new matches per level: