tional analysis to facilitate 3D manipulation of objects in a photograph. However, their tool is specifically designed
for cuboids and only works for objects
made of boxes, rectangular plates, and
square pillars. The present technology, on the other hand, can handle a significantly larger variety of objects with
curved surfaces by introducing a sophisticated gestural interaction.
It should be noted, however, that
the present technology currently
supports only one class of primitive:
generalized cylinders. This representation is highly versatile and covers
many human-made objects; nonetheless, it is not sufficient for representing the complicated shapes found in
nature. Tools are needed to create a
more diverse set of shape primitives to
represent complicated shapes, which
would enable manipulation of arbitrary objects viewed in photographs.
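As a rough illustration of why generalized cylinders are versatile, the following sketch samples points on one under simplifying assumptions: a straight axis along z and a circular cross-section whose radius varies along the axis. The function name, its parameters, and the vase-like radius profile are hypothetical and are not taken from the paper.

```python
import math

def generalized_cylinder(radius_at, height, n_rings=5, n_segments=8):
    """Sample points on a generalized cylinder with a straight axis along z.

    radius_at: function mapping t in [0, 1] (position along the axis)
               to the cross-section radius at that position.
    Returns a list of rings; each ring is a list of (x, y, z) points.
    Assumes n_rings >= 2.
    """
    rings = []
    for i in range(n_rings):
        t = i / (n_rings - 1)          # normalized position along the axis
        r = radius_at(t)               # cross-section radius at this position
        ring = []
        for j in range(n_segments):
            a = 2.0 * math.pi * j / n_segments
            ring.append((r * math.cos(a), r * math.sin(a), t * height))
        rings.append(ring)
    return rings

# Hypothetical vase-like profile: narrow at both ends, widest in the middle.
vase = generalized_cylinder(lambda t: 0.5 + 0.5 * math.sin(math.pi * t), height=2.0)
```

A constant radius function gives an ordinary cylinder, while changing only `radius_at` yields bottle-like or vase-like shapes, which suggests why a single primitive of this kind can cover many human-made objects.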
One promising approach is to use a
large collection of known 3D geometries.¹
Nevertheless, the core concept
presented in this paper—combining
human perception and computational analysis using clever interactive design—is broad and applicable to the
development of future tools. Rapid developments in 3D image editing tools
inspired by this work can be foreseen,
and it will become increasingly easy
for even casual users to readily edit
photographs. Images will no longer
always depict unaltered reality. Is this
good or bad? The answer may not be
self-evident; however, such a future is
1. Kholgade, N., Simon, T., Efros, A., and Sheikh, Y. 3D
object manipulation in a single photograph using stock
3D models. ACM Trans. Graph. 33, 4 (July 2014),
2. Zheng, Y., Chen, X., Cheng, M.M., Zhou, K., Hu, S.M., and
Mitra, N.J. Interactive images: Cuboid proxies for
smart image manipulation. ACM Trans. Graph. 31, 4
(2012), 99:1–99:11.
Takeo Igarashi is a professor in the Computer Science
Department at the University of Tokyo, Japan.
Copyright held by author.
Numerous images appear each day on
smartphone screens, computer displays, and printed materials. News articles and advertisements attract the
attention of viewers by using appealing
images. People often take photographs
with their smartphones and immediately share them with hundreds of
viewers via social media. Moreover,
users include images in documents
and presentations to communicate
messages. These images are typically
edited to appear more aesthetically
pleasing or achieve other objectives.
Thus, there is an increasing need for
advanced image editing tools.
Image editing has significantly
advanced over the past several years
to address consumer demand. Color
adjustment tools now make it simple to
sharpen, soften, brighten, and darken
images. Furthermore, advanced tools can
automatically remove blur and noise.
Users can easily cut an object in an
image and paste it onto another background. Moreover, it is easy to
deform shapes as needed and seamlessly blend two images. It is even
possible to erase an object from a scene
by removing it and filling the
hole with automatically synthesized
background content.
Despite these advancements, most
image editing techniques are two-dimensional (2D). Three-dimensional
(3D) image editing has long been
desired; however, it is still difficult.
Even the most advanced current image
editing tools lack 3D editing capabilities. This is because it is necessary for
a computer to infer the 3D structure
of the scene for 3D editing, which remains a difficult, ill-posed problem.
Inference of a meaningful 3D structure
requires abundant knowledge about
the physical world, which remains
missing in current systems. Moreover,
3D editing is difficult for users because
the user must provide 3D control information
to a computer using 2D input
devices, such as a touchpad or mouse.
It is tedious and difficult for inexperienced
users to specify the 3D shapes
of an object in a scene and thereby
manipulate the object.
Nonetheless, the importance of 3D
editing is obvious. We exist in a 3D
world, and 3D image editing opens
countless possibilities. With the availability of 3D information, we can easily
view objects from different angles and
compose novel scenes by combining objects in three dimensions. Even
basic cutting and pasting of an object
is not easy with purely 2D editing tools
because the viewing angle changes if
the object is moved. A 3D editing capability would make the cut-and-paste
result much more convincing.
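To make the point about viewing angle concrete, here is a minimal sketch, assuming an idealized pinhole camera at the origin looking down the +z axis. The function name and the coordinates are illustrative assumptions, not taken from the paper. It computes the horizontal angle between the optical axis and the ray to an object, showing that sliding the object sideways changes the angle from which the camera sees it.

```python
import math

def viewing_angle_deg(x, z):
    """Horizontal angle (degrees) between the camera's optical axis (+z)
    and the ray from the camera at the origin to a point at (x, z)."""
    return math.degrees(math.atan2(x, z))

# Hypothetical object positions: same depth, different lateral offsets.
centered = viewing_angle_deg(0.0, 2.0)  # object straight ahead
shifted = viewing_angle_deg(2.0, 2.0)   # object moved sideways by its depth
print(centered, shifted)
```

A purely 2D paste reuses the pixels captured at the original angle, so an object moved well off-axis would still show the face it showed when centered. With a 3D model, the object can instead be re-rendered for its new angle.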
The authors of the following paper present an important step toward
achieving 3D editing. To address this
difficult problem, human perception
and computational analysis are both
required. Therefore, the authors devised an interaction technique, called
3-sweep, which consists of three
simple mouse strokes. With this interaction, the user provides guidance
for the computer to segment an object
in the scene and simultaneously infer
the 3D geometry of the object. The system then executes segmentation and
3D reconstruction by inferring details
using image analysis methods. Using
the 3D reconstruction results, the user
can rotate the object to view it from
multiple perspectives. In addition,
the user can cut and paste the object
into different scenes while preserving 3D consistency. Interested readers are strongly encouraged to watch
the authors’ impressive demonstration video ( https://www.youtube.com/
As with most technologies, this 3D
editing technology is not the only one of
its kind; previous efforts exist. A notable
one is the photo editing tool presented
by Zheng et al.² They similarly combine
intuitive user interaction and computa-
3D Image Editing
By Takeo Igarashi