is difficult for conventional image completion methods to
offer an analogous selection of results.
Some of our image completions are shown in Figure 5.
The bottom result is interesting because the scaffolding
on the cathedral that was masked out has been replaced
with another image patch of the same cathedral. The database happened to contain an image of the same cathedral
from a similar view. It is not our goal to complete scenes and
objects with their true selves in the database, but with an increasingly large database such fortuitous events do occur.
In all of the successful cases, the completion is semantically valid but there might be slight low-level artifacts such
as resolution mismatch between the image and patch, blurring from Poisson blending, or fine-scale texture differences between the image and patch. For failure cases these low-level artifacts are often much more pronounced (Figure 6,
top). Another source of failure is a lack of good scene matches which happens more often for atypical scenes (Figure 6,
middle). Semantic violations (e.g., half-objects) account for
another set of failures. The latter is not surprising since the
algorithm has no object recognition capabilities and thus
no notion of object boundaries.
For uniformly textured backgrounds (Figure 7, top), existing image completion algorithms perform well. However,
our algorithm struggles since our scene matching is unlikely to find the exact same texture in another photograph.
Furthermore, image completion algorithms such as Criminisi et al.4 have explicit structure propagation which helps in
some scenes (Figure 7, bottom).
Our hole filling algorithm requires about 5 min to process
an input image. The scene matching, local context matching, and compositing would take about 50, 20, and 4 min,
respectively on a single central processing unit (CPU) but
we parallelize all of these across 15 CPUs. Our algorithm is
implemented in MATLAB and all of the timings are for Pentium 4 processors.
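The relationship between the single-CPU timings and the roughly 5 min total can be checked with a short calculation. The sketch below is in Python rather than the MATLAB of the actual implementation, and it assumes ideal linear speedup across the 15 CPUs, which the text does not claim; the function name is illustrative only.

```python
# Single-CPU stage timings in minutes, as reported in the text.
stage_minutes = {
    "scene matching": 50,
    "local context matching": 20,
    "compositing": 4,
}
num_cpus = 15


def ideal_parallel_minutes(stages, cpus):
    """Wall-clock estimate assuming the work divides evenly across CPUs.

    This is an idealization: it ignores communication overhead and
    load imbalance between CPUs.
    """
    return sum(stages.values()) / cpus


serial_total = sum(stage_minutes.values())
parallel_estimate = ideal_parallel_minutes(stage_minutes, num_cpus)
print(f"serial: {serial_total} min, ideal on {num_cpus} CPUs: {parallel_estimate:.1f} min")
```

Under this assumption the 74 min of serial work comes to just under 5 min, consistent with the reported figure.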
5.1. Quantitative evaluation
It is difficult to rigorously define success or failure for an image completion algorithm because so much of it depends
on human perception. While previous approaches demonstrate performance qualitatively by displaying a few results,
we believe that it is very important to also provide a quantitative measure of the algorithm’s success. Therefore, to evaluate our method, we performed a perceptual study to see how
well naive viewers could distinguish our results, as well as
those of a previous approach,4 from real photographs. The
study was performed on a set of 51 test images that were defined a priori and spanned different types of completions.
We were careful not to include any potentially recognizable
scenes or introduce bias that would favor a particular algorithm. We generated three versions of each image—the real
photograph from which the image completion test cases
were constructed, the result from Criminisi et al., and the
result from our algorithm.
Each of our 20 participants viewed a sequence of images
and classified them as real or manipulated. Of the 51 images
each participant examined, 17 were randomly chosen from
each source, such that no participant saw multiple versions
of the same image. The order of presentation was also randomized. The participants were told that some of the images would be real, but they were not told the ratio of real
versus manipulated images. We also told the participants
that we were timing their responses for each image but that
they should try to be accurate rather than fast. Overall the
participants classified 80% of the images correctly. No effort
was made to normalize for the differences in individual aptitude (which were small).
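The assignment of image versions to a participant can be sketched as follows. This is one possible way to satisfy the stated constraints (17 images per source, one version of each image, randomized order), not necessarily the procedure used in the study; the source labels and the function name are hypothetical.

```python
import random

# Hypothetical labels for the three versions of each test image.
SOURCES = ("real", "criminisi", "ours")


def assign_versions(num_images=51, sources=SOURCES, seed=None):
    """Return a shuffled list of (image_id, source) trials for one participant.

    Each image appears exactly once, and each source contributes an
    equal number of images (17 each for 51 images and 3 sources).
    """
    if num_images % len(sources) != 0:
        raise ValueError("num_images must be divisible by the number of sources")
    rng = random.Random(seed)
    per_source = num_images // len(sources)

    # Randomly partition the images among the sources.
    image_ids = list(range(num_images))
    rng.shuffle(image_ids)
    trials = []
    for k, source in enumerate(sources):
        chunk = image_ids[k * per_source:(k + 1) * per_source]
        trials.extend((image_id, source) for image_id in chunk)

    # Randomize the order of presentation.
    rng.shuffle(trials)
    return trials


trials = assign_versions(seed=0)
```

Partitioning the shuffled image list guarantees by construction that no participant sees two versions of the same image.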
With unlimited viewing time, the participants classified our
algorithm’s outputs as real 37% of the time compared with
10% for Criminisi et al.4 Note that participants identified
real images as such only 87% of the time. Participants scrutinized the images so carefully that they frequently convinced
themselves real images were fake.
It is interesting to examine the responses of participants
over time. In Figure 8 we measure the proportion of images
from each algorithm that have been marked as fake with an
increasing limit on the amount of time allowed. We claim
that if a participant who has been specifically tasked with
finding fake images cannot be sure that an image is fake
within 10 s, it is unlikely that an unsuspecting, casual observer would notice anything wrong with the image. After
10 s of examination, participants have marked our algorithm’s results as fake only 34% of the time (the other 66%
are either undecided or have marked the image as real already). For Criminisi et al., participants have marked 69%
of the images as fake by 10 s. For real photographs, only 3%
have been marked as fake. All pairwise differences are statistically significant (p < 0.001).
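The text does not state which significance test was used. As an illustration only, the sketch below applies a pooled two-proportion z-test to the 10 s rates. It rests on two assumptions that do not come from the paper: each source has 20 x 17 = 340 trials, and those trials are independent. The second is a simplification, since each participant contributes many responses. Counts are reconstructed by rounding the reported percentages.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumption: 20 participants x 17 images per source, treated as independent.
TRIALS_PER_SOURCE = 20 * 17

# Proportion marked fake within 10 s, as reported in the text.
fake_rate_at_10s = {"ours": 0.34, "criminisi": 0.69, "real": 0.03}

# Approximate counts reconstructed from the reported percentages.
counts = {name: round(rate * TRIALS_PER_SOURCE) for name, rate in fake_rate_at_10s.items()}

results = {}
for a, b in [("ours", "criminisi"), ("ours", "real"), ("criminisi", "real")]:
    results[(a, b)] = two_proportion_z_test(
        counts[a], TRIALS_PER_SOURCE, counts[b], TRIALS_PER_SOURCE
    )
    z, p = results[(a, b)]
    print(f"{a} vs {b}: z = {z:.2f}, p < 0.001: {p < 0.001}")
```

Under these assumptions all three pairwise comparisons fall well below the 0.001 threshold, in line with the reported significance. A test that accounts for repeated measures per participant would be the more careful choice.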
6. Discussion
This paper approaches image completion from an entirely
new direction—orthogonal and complementary to the existing work. While previous algorithms4, 6, 8, 25 suggest clever ways
to reuse visual data within the source image, we demonstrate
the benefits of utilizing semantically valid data from a large
collection of unlabeled images. Our approach successfully
fills in missing regions where prior methods, or even expert
users with the Clone brush, would have no chance of succeeding because there is simply no appropriate image data in the
source image to fill the hole. Likewise, expert users would
have trouble leveraging such a large image collection—it
would take 10 days just to view it with one second spent on
each image. Additionally, this is the first paper in the field of
image completion to undertake a full perceptual user study
and report success rates on a large test set. While the results
suggest substantial improvement over previous work, image
completion is extremely difficult and is far from solved. Given
the complementary strengths of our method and single-image
techniques, a hybrid approach is likely to be rewarding.
It takes a large amount of data for our method to succeed.
We saw dramatic improvement when moving from ten thousand to one million images. But one million images is still
a tiny fraction of the high-quality photographs available on
sites like Picasa or Flickr (which has approximately 2 billion
images). The number of photos on the entire Internet is surely orders of magnitude larger still. Therefore, our approach
would be an attractive Web-based application. A user would