Image Search at
the Speed of Thought
Santosh Mathan
Honeywell | Santosh.Mathan@honeywell.com
Editors’ Note: The ability to sense
cognitive activity (including
information-processing load,
attention levels, and perceptual
judgments) using EEG sensors
provides a basis for developing systems that adapt to users and exploit
latent human capabilities. Santosh
Mathan describes a neural interface
being developed by Honeywell for
searching through large image sets
efficiently.
The problem of finding information in large volumes of
imagery is a challenging one,
with few good solutions. While
most search engines allow users
to find information in collections of text quite efficiently,
there is a lack of similar solutions when it comes to searching
for imagery. The problem is that computers aren’t able to interpret
imagery very well. They can’t
deal with novelty and variability,
or exploit contextual information and prior knowledge, to the
extent that humans can.
Unfortunately, most manual
image analysis tools currently
in use are inefficient—tapping
into slow and deliberate cognitive processes. Most image
search and analysis tools do
not exploit the reliable split-second perceptual judgments
that people make all the time—
think of returning a tennis serve
or reacting to an obstacle on
the highway while driving. The
question we have been asking is
whether we can tap into these
fleeting perceptual judgments
to find visual information within large image sets
efficiently.
Split-Second
Perceptual Judgments
Our efforts have relied on a
combination of the rapid serial
visual presentation (RSVP) technique and the
event-related potential (ERP)
signal detected using electroencephalography (EEG) sensors.
We have largely focused on
broad area image analysis, a
domain where users have to
extract critical information
from large collections of high-resolution satellite imagery. In
our approach, broad area
images, spanning tens of thousands of pixels in width and
height, are decomposed into a
grid of image chips a few hundred pixels wide and tall. These
chips are presented to users in
high-speed bursts, anywhere
from 10 to 15 chips per second.
A set of head-worn EEG sensors
records neural responses to each
chip presented to the user.
Images that elicit an ERP signal
are classified as targets.
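To give a feel for the numbers involved, the chip decomposition described above can be sketched in a few lines of Python. This is only an illustration, not the system's actual code: the function names, the non-overlapping tiling, and the 20,000-pixel image with 500-pixel chips are all assumptions chosen to match the rough sizes mentioned in the text.

```python
from typing import Iterator, Tuple

# (left, top, right, bottom) in pixels; a hypothetical representation of one chip.
Chip = Tuple[int, int, int, int]


def chip_grid(width: int, height: int, chip_size: int) -> Iterator[Chip]:
    """Yield bounding boxes that tile a width x height image, row by row.

    Assumes non-overlapping chips; chips on the right and bottom edges
    are clipped to the image bounds.
    """
    for top in range(0, height, chip_size):
        for left in range(0, width, chip_size):
            yield (
                left,
                top,
                min(left + chip_size, width),
                min(top + chip_size, height),
            )


def presentation_seconds(n_chips: int, chips_per_second: float) -> float:
    """Time needed to show n_chips once at a fixed presentation rate."""
    return n_chips / chips_per_second


# Assumed example sizes: a 20,000 x 20,000 pixel image cut into 500-pixel chips.
chips = list(chip_grid(20_000, 20_000, 500))
print(len(chips), "chips")
print(presentation_seconds(len(chips), 10), "seconds at 10 chips per second")
print(presentation_seconds(len(chips), 15), "seconds at 15 chips per second")
```

Under these assumed sizes the grid holds 1,600 chips, so a single pass takes 160 seconds at 10 chips per second and a little under 107 seconds at 15.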
The ERP signal is thought