Extracting 3D Objects from Photographs Using 3-Sweep
By Tao Chen, Zhe Zhu, Shi-Min Hu, Daniel Cohen-Or, and Ariel Shamir
DOI: 10.1145/3007175
Abstract
We introduce an interactive technique to extract and manipulate simple 3D shapes in a single photograph. Such extraction requires an understanding of the shape’s components,
their projections, and their relationships. These cognitive
tasks are simple for humans, but particularly difficult for
automatic algorithms. Thus, our approach combines the
cognitive abilities of humans with the computational accuracy of the machine to create a simple modeling tool. In
our interface, the human draws three strokes over the photograph to generate a 3D component that snaps to the outline of the shape. Each stroke defines one dimension of the
component. Such human assistance implicitly segments a
complex object into its components, and positions them
in space. The computer reshapes the component to fit the
image of the object in the photograph as well as to satisfy
various inferred geometric constraints between components imposed by a global 3D structure. We show that this
intelligent interactive modeling tool provides the means
to create editable 3D parts quickly. Once the 3D object has
been extracted, it can be quickly edited and placed back into
photos or 3D scenes, permitting object-driven photo editing
tasks which are impossible to perform in image-space.
1. INTRODUCTION
Extracting three-dimensional objects from a single photo
is still a long way from reality given the current state of
technology, since it involves numerous complex tasks: the
target object must be separated from its background, and
its 3D pose, shape, and structure should be recognized from
its projection. These tasks are difficult, even ill-posed,
since they require some degree of semantic understanding of the object. To alleviate this difficulty, complex 3D
models can be partitioned into simpler parts that can be
extracted from the photo. However, assembling parts into
an object also requires further semantic understanding
and is difficult to perform automatically. Moreover, once a 3D shape has been decomposed into parts, the relationships
between these parts must also be understood and maintained in the final composition.
In this paper, we present an interactive technique to
extract 3D man-made objects from a single photograph,
leveraging the strengths of both humans and computers.
Human perceptual abilities are used to partition, recognize,
and position shape parts, using a very simple interface based
on triplets of strokes, while the computer performs tasks
which are computationally intensive or require accuracy.
The final object model produced by our method includes
its geometry and structure, as well as some of its semantics.
This allows the extracted model to be readily available for
intelligent editing, which maintains the shape’s semantics
(see Figure 1).
The original version of this paper is entitled “3-Sweep:
Extracting Editable Objects from a Single Photo” and
was published in ACM Transactions on Graphics, Volume
32, Issue 6 (Proceedings of ACM SIGGRAPH Asia 2013),
November 2013, Article No. 195.
Our approach is based on the observation that many
man-made objects can be decomposed into simpler parts
that can be represented by generalized cylinders, cuboids,
or similar primitives. A generalized cylinder is a cylindrical
primitive shape where the central axis is a curve instead of
a line, the shape’s profile can be any 2D closed curve and
not just a circle, and this shape can also change along the
curve. In this work, we use just circular and cuboid profiles. The key contribution of our method is an interactive tool that guides and assists the user in the creation
of a 3D editable object by defining its primitive parts. The
tool is based on a rather simple modeling gesture we call
3-Sweep. This gesture allows the user to explicitly define
the three dimensions of a geometric primitive using three
sweeps. The first two sweeps define the first and second
dimensions of a 2D profile, and the third, usually longer,
sweep defines the main curved axis of the primitive (see Figure 2).
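To make the definition of a generalized cylinder concrete, the following minimal Python sketch samples the surface of one with a circular profile whose radius changes along a curved axis. It is an illustration only, not the implementation used in this work: the function name `generalized_cylinder`, its parameters, and the per-point frame construction are all assumptions introduced here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    if length == 0.0:
        raise ValueError("axis contains coincident neighboring points")
    return tuple(x / length for x in v)


def generalized_cylinder(axis, radii, segments=16):
    """Sample rings of a circular-profile generalized cylinder.

    axis     -- list of 3D points along the central curve (hypothetical input)
    radii    -- radius of the circular profile at each axis point
    segments -- number of samples per ring
    Returns one ring (a list of 3D points) per axis point.
    """
    if len(axis) < 2 or len(axis) != len(radii):
        raise ValueError("need at least two axis points and one radius per point")
    rings = []
    last = len(axis) - 1
    for i, (center, radius) in enumerate(zip(axis, radii)):
        # Tangent of the axis curve, estimated from the neighboring points.
        tangent = _normalize(_sub(axis[min(i + 1, last)], axis[max(i - 1, 0)]))
        # Pick a helper vector that is not parallel to the tangent, then
        # build two unit vectors (u, v) spanning the plane of the profile.
        helper = (0.0, 0.0, 1.0) if abs(tangent[2]) < 0.9 else (1.0, 0.0, 0.0)
        u = _normalize(_cross(tangent, helper))
        v = _cross(tangent, u)
        ring = []
        for k in range(segments):
            angle = 2.0 * math.pi * k / segments
            ring.append(tuple(
                c + radius * (math.cos(angle) * ui + math.sin(angle) * vi)
                for c, ui, vi in zip(center, u, v)))
        rings.append(ring)
    return rings


# Example: a gently bending axis whose profile narrows and then widens.
example_axis = [(0.0, 0.0, 0.0), (0.0, 1.0, 1.0), (0.0, 2.0, 3.0)]
example_rings = generalized_cylinder(example_axis, [1.0, 0.5, 2.0], segments=8)
```

Each ring here is built independently of its neighbors, so the profile frames may twist along a strongly curved axis; a more careful implementation would likely propagate the frame from one ring to the next. Non-circular profiles would replace the cosine and sine terms with samples of another closed 2D curve.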
Figure 1. 3-Sweep Object Extraction. (a) Input image. (b) Extracted
edges. (c) 3-Sweep modeling of one component of the object. (d) The
full extracted 3D model. (e) Changing the object viewpoint. (f) Editing
the model by rotating each arm in a different direction, and pasting
onto a new background. The base of the object is transferred by
alpha matting and compositing.