DOI: 10.1145/1941487.1941512
Technical Perspective
Images Everywhere Looking for Models
By Guillermo Sapiro
About 5,000 images per minute are uploaded to the photo-sharing site http://www.flickr.com/; over 7,000,000 a day. Similar numbers are uploaded to other social sites. Often these images are acquired by amateur photographers under non-ideal conditions and with low-end digital cameras such as those available in mobile phones. Such images often look noisy or blurry, with the wrong colors or contrast. Even images acquired by high-end devices, such as MRI or microscopy, suffer from these effects due to the intrinsic physics of the device and the structure of the material being photographed. A key challenge in image science, then, is how to go from a low-quality image to a high-quality one that is sharp, has good contrast, and is clean of artifacts. This is an intrinsically ill-posed inverse problem according to Hadamard's definition: a solution may not exist, may not be unique, or may not depend continuously on the data. So, what do we do?
We have to include additional assumptions, a process often called regularization. These assumptions come with different names depending on one's particular area of research or interest, and are often called priors or models. Deriving appropriate regularization terms, priors, or models has occupied the research community since the early days of digital image processing, and we have witnessed fantastic and very inspiring models such as linear and nonlinear diffusion, wavelets, and total variation. Different image models can be appropriate for different types of images; for example, MRI and natural images should have different models. Indeed, some models might be useful for some inverse problems and not for others.
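To make the role of regularization concrete, a generic variational formulation (a sketch of the common setup, not the specific form of any one model named above) estimates the clean image x from the degraded observation y as

    \hat{x} = \arg\min_x \; \| A x - y \|_2^2 + \lambda \, R(x),

where A models the degradation (for example, blur or subsampling), R is the regularizer encoding the prior (for total variation, R(x) = \|\nabla x\|_1), and \lambda balances fidelity to the data against fidelity to the model.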
In their landmark paper, Buades, Coll, and Morel discuss a number of image models under a unified framework. Let us concentrate on the self-similarity model, which leads to the important non-local means algorithm proposed by the authors for image denoising and its extensions to other image inverse problems. The basic underlying concept is that local image information repeats itself across the non-local image. Noise, on the other hand, is expected in numerous scenarios to be random. Therefore, by collecting those similar local regions from all across the image, the noise can be eliminated by simple estimators based on having multiple observations of the same underlying signal under different noise conditions. This simple and powerful idea of self-similarity, which brings a unique perspective of simultaneous local and non-local processing, dates at least to Shannon's model for English text ("Prediction and Entropy of Printed English," Bell Syst. Tech. J. 30, 1951, 50–64), and was used in image processing for synthesis tasks. But it was not until the elegant 2005 paper by Buades et al. that the community had its Eureka moment and clearly realized it could be exploited for reconstruction challenges as well.
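As an illustration of the idea, the following is a minimal Python/NumPy sketch of a non-local means denoiser; the parameter names and the restriction of the search to a window (for tractability) are choices of this sketch, not prescriptions of the original paper:

    import numpy as np

    def nonlocal_means(image, patch=3, search=10, h=0.1):
        # Replace each pixel by a weighted average of pixels whose
        # surrounding patches look similar; weights decay with patch distance.
        pad = patch // 2
        padded = np.pad(image, pad, mode="reflect")
        out = np.zeros_like(image, dtype=float)
        rows, cols = image.shape
        for i in range(rows):
            for j in range(cols):
                ref = padded[i:i + patch, j:j + patch]
                num, den = 0.0, 0.0
                # limit the non-local search to a window for tractability
                for k in range(max(0, i - search), min(rows, i + search + 1)):
                    for l in range(max(0, j - search), min(cols, j + search + 1)):
                        cand = padded[k:k + patch, l:l + patch]
                        d2 = np.sum((ref - cand) ** 2)  # patch (dis)similarity
                        w = np.exp(-d2 / (h * h))       # similar patches weigh more
                        den += w
                        num += w * image[k, l]
                out[i, j] = num / den
        return out

On a noisy grayscale image with intensities in [0, 1], the weighted average reinforces the repeated structure while the random noise, having no such repetition, averages out.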
Fundamental questions about this self-similarity model and its optimality are naturally raised. The image processing community is busy addressing these questions.
There is another critical aspect clearly illustrated by this seminal work: the idea of addressing image inverse problems with overlapping local image regions, or overlapping image patches. In many scenarios this became the working unit, replacing the standard single point or pixel (such regions are sometimes now called super-pixels). While some researchers have adopted models that are different from the self-similarity one, it is safe to say that today, six years after the original paper was published, the state-of-the-art techniques for image reconstruction, as well as for image classification, are all based on working with these super-pixels or patches. This has become a fundamental building block of virtually all image models.
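To illustrate the patch as a working unit, here is a small sketch (Python/NumPy, with hypothetical names) that collects all overlapping patches of an image into a matrix, the typical first step of patch-based reconstruction and classification pipelines:

    import numpy as np

    def extract_patches(image, size=8, stride=1):
        # One flattened size-by-size patch per row; neighboring patches overlap.
        rows, cols = image.shape
        patches = [
            image[i:i + size, j:j + size].ravel()
            for i in range(0, rows - size + 1, stride)
            for j in range(0, cols - size + 1, stride)
        ]
        return np.stack(patches)  # shape: (num_patches, size * size)

Patch-based methods typically process each row of this matrix and then average the overlapping patches back into the image.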
The authors' work also starts hinting at the idea that we can learn the model from the data itself, or at least adapt it to the image, instead of relying on predefined mathematical structures. This relates to dictionary learning, where the image is modeled as being represented via a learned dictionary. The self-similarity model assumes the dictionary is the image itself, or actually its local patches. All these models indicate that images, and in particular image patches, do not actually live in the ambient high-dimensional space, but in some much lower-dimensional stratification embedded in it.
For over 40 years, the image processing community has been on the lookout for image models. The most fundamental of them have left important footprints in the community. Many of the questions are still open today, from the eternal battle between generative and discriminative models to the need to derive computationally feasible and fundamentally useful models. All this work goes to the root of our desire to know "What is an image?"
Guillermo Sapiro (guille@umn.edu) is a professor in the Department of Electrical and Computer Engineering at the University of Minnesota.