do not expect the single nearest exemplar to answer the query, but rather
that the pool of nearby content will give
the user and/or downstream processes
access to relevant candidates.
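As a rough illustration, the sketch below returns a pool of k candidates instead of only the single nearest item. It assumes each image is already a fixed-length feature vector compared with Euclidean distance. It uses a brute-force linear scan purely for clarity, where a scalable search method would use a sublinear index instead. The function name `candidate_pool` and its parameter `k` are hypothetical and do not come from the article.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def candidate_pool(query: Sequence[float],
                   database: List[Sequence[float]],
                   k: int = 10) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k database items nearest `query`.

    Hypothetical helper: a brute-force Euclidean scan standing in for
    whatever (approximate) index would produce the candidate set.
    """
    # Lazily score every database item by its distance to the query.
    scored = ((i, math.dist(query, x)) for i, x in enumerate(database))
    # nsmallest returns the k lowest-distance pairs, sorted nearest first.
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])
```

For example, with `db = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.5, 0.0)]`, calling `candidate_pool((0.0, 0.0), db, k=2)` returns `[(0, 0.0), (3, 0.5)]`. The user or a downstream re-ranking stage then chooses among these candidates.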
Conclusion
As the world’s store of digital images
continues to grow exponentially, and
as novel data-rich approaches to computer vision begin to emerge, fast techniques capable of accurately searching
very large image collections are critical.
The algorithms we have developed aim
to provide robust yet scalable image
search, and our results demonstrate their
practical impact. Although motivated by vision problems, these methods are fairly general
and may be applicable in other domains
where rich features and massive data
collections abound, such as computational biology or text processing.
Looking forward, an important challenge in this research area is to develop
representations that scale in
terms of their distinctiveness: as the
space of images becomes even more densely
populated, relative differences between neighbors become subtler. At the same time, flexibility remains
key to handling intra-category variation.
While our search methods can guarantee query-time performance, it is not yet
possible to guarantee a given level of discriminative power for the chosen features. In
addition, a practical issue in evaluating algorithms in this space is the difficulty of quantifying accuracy for truly
massive databases: the data itself is easy
to come by, but without ground-truth
annotations it is unclear how to evaluate performance rigorously.
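When ground-truth annotations do exist, one common way to score a retrieval system is precision at k. This is the fraction of the top k results that share the query's category label. The sketch below assumes every image carries a single category label. The names `precision_at_k` and `mean_precision_at_k` are illustrative and do not come from the article. The metric also shows why evaluation breaks down at massive scale: without labels, there is nothing to compare against.

```python
from typing import Sequence, Tuple


def precision_at_k(query_label: str,
                   retrieved_labels: Sequence[str],
                   k: int) -> float:
    """Fraction of the top-k retrieved items whose ground-truth
    category matches the query's category (single label per image assumed)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = retrieved_labels[:k]
    hits = sum(1 for label in top if label == query_label)
    # Divide by k, not len(top), so a result list shorter than k is penalized.
    return hits / k


def mean_precision_at_k(queries: Sequence[Tuple[str, Sequence[str]]],
                        k: int) -> float:
    """Average precision_at_k over (query_label, retrieved_labels) pairs."""
    return sum(precision_at_k(q, r, k) for q, r in queries) / len(queries)
```

For instance, `precision_at_k("dog", ["dog", "cat", "dog", "dog", "car"], 4)` counts three matching labels among the first four results and returns `0.75`.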
An interesting aspect of the image
search problem is the subjectivity of
a real user's perception of retrieval
quality. We can objectively quantify
accuracy in terms of the categories
contained in a retrieved image, which
helps to validate progress systematically.
Moreover, example-based search often
serves as one stage in a larger pipeline
with further processing downstream.
Nonetheless, when end users are in the
loop, the perception of quality may vary.
On the evaluation side, this uncertainty
could be addressed by collecting user
appraisals of similarity, as is more
common in natural language processing.
In terms of the algorithms themselves,
however, one can also exploit classic
feedback and query-refinement devices
to tailor retrieval toward the current
user. For example, we could construct
learned image metrics with constraints
that target the preferences of a given
user or group of users.
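One classic query-refinement device, borrowed from text retrieval, is Rocchio relevance feedback. The user marks some returned images as relevant or not relevant. The query vector is then moved toward the mean of the relevant ones and away from the mean of the nonrelevant ones. The sketch below applies that textbook update to image feature vectors. It is a simpler stand-in and not the learned, user-specific metric the paragraph proposes. The weights `alpha`, `beta`, and `gamma`, and their defaults of 1.0, 0.75, and 0.15, are assumptions that would need tuning.

```python
from typing import List, Sequence

Vector = List[float]


def _mean(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean; the zero vector if no vectors were given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def rocchio_update(query: Sequence[float],
                   relevant: Sequence[Sequence[float]],
                   nonrelevant: Sequence[Sequence[float]],
                   alpha: float = 1.0,
                   beta: float = 0.75,
                   gamma: float = 0.15) -> Vector:
    """Rocchio refinement (weights are assumed defaults):
    q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)."""
    dim = len(query)
    pos = _mean(relevant, dim)
    neg = _mean(nonrelevant, dim)
    # Pull the query toward images the user accepted, away from ones rejected.
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]
```

The refined vector can be reissued to the same search index, so the feedback loop leaves the underlying fast search machinery unchanged. A per-user learned metric would instead change the distance function itself.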
Acknowledgments
I am fortunate to have worked with
a number of terrific collaborators
throughout the various stages of
the projects described in this article,
in particular Trevor Darrell,
Prateek Jain, and Brian Kulis. This
research was supported in part by
NSF CAREER IIS-0747356, Microsoft
Research, and the Henry Luce
Foundation. I would like to thank
Kathryn McKinley and Yong Jae Lee
for feedback on previous drafts, as
well as the anonymous reviewers for
their helpful comments. Thanks to
the following Flickr users for sharing their photos under the Creative
Commons license: belgian-chocolate, c.j.b., edwinn.11, piston9,
staminaplus100, rickyrhodes, Rick
Smit, Krikit, Vanessa Pike-Russell,
Will Ellis, Yvonne in Willowick Ohio,
robertpaulyoung, lin padgham,
tkcrash123, jennifrog, Zemzina,
Irene2005, and CmdrGravy.
Kristen Grauman (grauman@cs.utexas.edu) is an
assistant professor in the Department of Computer
Science at the University of Texas at Austin.