figure 7: a comparison of results for reduction and expansion of the car image (leftmost) using least cost seam of Equation 5 (left in each
pair) and least inserted cost seams of Equation 9 (right in each pair). We discuss image enlarging in section 4. 3.
Monotonicity: the seam must include one, and only one
pixel, in each row (or column for horizontal seams).
connectivity: the pixels of the seams must be connected.
The monotonicity constraint ensures it is a function,
while the connectivity constraint enforces continuity. Hence,
the challenge is to construct a graph that guarantees the
resulting cut will be a continuous function over the relevant
domain. However, standard graph-cut-based constructions
do not satisfy these constraints.
In our construction every node represents a pixel, and
connects to its neighboring pixels in a grid-like structure.
Virtual terminal nodes, S (source) and T (sink) are created
and connected with infinite weight arcs only to the pixels of the leftmost and rightmost columns of the image,
respectively (and only to the sides of the cube for video).
Moreover, the key difference is that we use backward
infinity arcs between the nodes. These arcs constrain the
resulting cut. To define a seam from a cut, we consistently
choose the pixels to the left of the cut arcs. The optimal
seam is defined by the minimum cut which is the cut that
has the minimum cost among all valid cuts. Figure 8 illustrates the construction of such graphs on images for the
two approaches of Equations 5 and 9. These graphs constructions guarantee that the seam defined by the graph-cut algorithm would be the optimal, monotonic, and
In the case of video we repeat the grid-like construction of the graph both in space (within a frame) and in
time (between frames in one direction). In addition, we
figure 8: two graph constructions that are used by the graph-cut
algorithm illustrated by four neighboring nodes. the actual image
graph is created by tiling these subgraphs across the image. the
left graph is equivalent to the dynamic-programming seam carving
approach of Equation 5 and right graph represents the new forward
energy of Equation 9.
+LU −LU +LU −LU
connect the source and sink nodes to the leftmost and
rightmost pixels in all the frames of the video (for the case
of vertical seams). The resulting cut of this 3D graph is a
2D manifold whose intersection with every video frame
produces a valid seam (Figure 2, right). Hence, a vertical
seam can be thought of as a discretely continuous function
S: Y × T → X from (row, time) to column. Note that similar
constructions can be used to change the height as well as
the length of the video in time. More details can be found
3. 5. multisize media
So far we have assumed that the user knows the target
size ahead of time, but this might not be possible in all
cases. Consider, for example, an image embedded in
a web page. The web designer does not know, ahead of
time, at what resolution the page will be displayed and
therefore cannot generate a single target image. In a different scenario, the user might want to try different target sizes and choose the one most suitable for his or her
needs. Seam carving is linear in the number of pixels
and resizing is therefore linear in the number of seams
to be removed, or inserted. Nevertheless, when video is
concerned carving one seam surface could take up to a
few seconds. Therefore, computing seams in real time is
a challenging task.
To address these issues we define a representation of
multisize media that encodes, for an image or video of size
(m × n), an entire range of retargeting sizes from 1 × 1 to
m × n and even further for expansion to N′ × M′, when
N′ > n, M′ > m (see Section 4. 3). Using a preprocessing stage
to compute and encode the seams, multisize images and
video allow real time retargeting to any target size even on
very low end processing units.
Looking at it from a different perspective, multisize
media can be seen as storing an explicit representation of
the time-evolution implicit process of seam removals and
insertions. Consider, for example, the case of changing the
width of the media. We define an index map v of size n × m
(×t if it is a video), that encodes, for each pixel, the index of
the seam that removed it, i.e., v (i, j) = t means that pixel (i, j)
was removed by the tth seam removal. To get an image (or
frame in the video) of width m′, we only need to gather, in
each row, all pixels with seam index greater than or equal to
m − m′. A similar map h can be defined to change the height
of the media (Figure 9).