region.
20 Lines are filled first using belief propagation, and
then texture synthesis is applied for the other regions, but
the overall run-time is on the order of minutes for a 0.5
megapixel image. Our system provides similar user annota-
tions, for lines and other region constraints, but treats all
regions in a unified iterative process at interactive rates.
image retargeting. Many methods of image retarget-
ing have applied warping or cropping, using some metric
of saliency to avoid deforming important image regions.
25
Seam carving2 uses a simple greedy approach to prioritize
seams in an image that can safely be removed in retarget-
ing. Although seam carving is fast, it does not preserve
structures well, and offers only limited control over the
results. Simakov et al.
19 proposed framing the problem of
image and video retargeting as a maximization of bidirec-
tional similarity between small patches in the original and
output images, and a similar objective function and opti-
mization algorithm was independently proposed by Wei
et al.
22 as a method to create texture summaries for faster
synthesis. Unfortunately, the approach of Simakov et al. is
extremely slow compared to seam carving. Our constrained
retargeting and image reshuffling applications employ the
same objective function and iterative algorithm as Simakov
et al., using our new nearest-neighbor algorithm to obtain
interactive speeds.
image “reshuffling” is the rearrangement of content in
an image, according to user input, without precise mattes.
Reshuffling was demonstrated simultaneously by Simakov
et al.
19 and Cho et al.,
8 who used larger image patches and
belief propagation in an MRF formulation. Reshuffling
requires the minimization of a global error function, as
objects may move significant distances, and greedy algo-
rithms will introduce large artifacts. In contrast to all pre-
vious work, our reshuffling method is fully interactive. As
this task might be particularly hard and badly constrained,
these algorithms do not always produce the expected result.
Therefore interactivity is essential, as it allows the user to
preserve some semantically important structures from
being reshuffled, and to quickly choose the best result
among alternatives.
object detection, digital forgeries, and collages. In addition to image editing, our algorithm can be applied to other
problems in image analysis and video. For object detection, we take an approach similar to deformable template
models.
11 Unlike previous approaches, we do not need to
restrict our matching to sparse interest points or detect
principle scales or orientations. We can instead use our
matching algorithm to match all patches across all scales
and orientations. For digital forgery detection, Popescu
and Farid17 previously demonstrated a method that can
find regions of an image duplicated by a clone tool, by
sorting image blocks after discarding JPEG compression
artifacts. Our method works similarly. Our method is not
robust to JPEG artifacts, but uses our more general matching algorithm, so it could potentially be generalized to
find different types of forgeries such as those produced by
our automatic hole filling. Finally, images can be stitched
into collages8, 19 using patch-based methods. We use this
approach for our video summarization application.
3. matchiNG aLGoRithm
In this section we describe our core matching algorithm,
which accelerates the problem of finding nearest neighbor
patches by 20–100 x over previous work. Our algorithm is a
randomized approximation algorithm: it does not always
return the exact nearest neighbor, but returns a good
approximate nearest neighbor quickly, and improves the
estimate with each iteration.
For brevity’s sake, in this paper we present a simplified
version of the algorithm that searches for only one nearest neighbor per-patch, across only two translation dimensions. There will be more discussion of extensions later.
3. 1. high level motivation
The high level intuition behind our algorithm is shown
in Figure 4. We have two images A and B with patches
visualized as colored rectangles. We wish to find for each
patch in A the most similar patch in B. We do this by taking advantage of spatial locality properties: when we have
a good match, we can propagate it to adjacent points on
the image, and if we have a reasonable match, we can try
to improve it by randomly searching for better matches
around the target position.
We define a NNF as a function f: A R2 of nearest neighbors, defined over all possible patch coordinates (locations
of patch centers) in image A, for some distance function of
two patches D. Given patch coordinate a in image A and its
corresponding nearest neighbor b in image B, f(a) is simply b.a We refer to the values of f as nearest neighbors, and they
are stored in an array whose dimensions are those of A.
As a reminder, our key insights are to search in the space
of possible coordinate offsets, to search over adjacent
patches cooperatively, and that even random coordinate
assignments are likely to be a good guess for many patches
over a large image.
The algorithm has three main components, illustrated
in Figure 5. Initially, the NNF is filled with either uniform
random assignments or some prior information. Next, an
iterative update process is applied to the NNF, in which
good patch nearest neighbors are propagated to adjacent
figure 4. illustration of the matching problem. for each patch in
image A, we find the most similar patch in image B. sometimes
patches that overlap have identical or coherent matches (shown
in the yellow and green matches), and sometimes the matching
coordinates are nearby (the red match).
a Our notation is in absolute coordinates, versus relative coordinates in
Barnes et al.
6