in camera coordinates. The x and y values of the end points
are determined by the projection and their depth values can
be found as a function of zi, the z value of Ri, by using three
orthogonality constraint equations.
Next, the positions of the anchor points Ci, j in world coordinates are defined using the local orthogonal axes, giving
the structure of part i. Since the local axes depend only on
the depth value zi of the point Ri, we can parameterize the
positions of Ci, j as a function of zi, Ci, j = Fi, j(zi), so that the position
and orientation of the whole part become a function of a
single unknown zi. Fi, j has the form Fi, j(zi) = b/(a(zi + v)) for
each coordinate component, where a depends only on the x
and y coordinates of the endpoints of the local axes, and b,
v are determined by the perspective parameters. These differ
for each axis endpoint and for each coordinate component
(see Ref. 6 for the full definition).
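As an illustration of this parameterization, the sketch below evaluates one anchor-point coordinate as a function of the single depth unknown. The numeric constants are hypothetical placeholders; in the real system a comes from the x and y coordinates of the local axis endpoints, and b, v from the perspective parameters.

```python
# Sketch (not the authors' code): each coordinate component of an anchor
# point C[i][j] as a function of the single unknown depth z_i, following
# the stated form F(z) = b / (a * (z + v)).  The constants a, b, v below
# are made up for illustration.

def make_anchor_coordinate(a, b, v):
    """Return F(z) = b / (a * (z + v)) for one coordinate component."""
    def F(z):
        return b / (a * (z + v))
    return F

# One anchor point is three such functions, one per coordinate component:
Fx = make_anchor_coordinate(a=0.8, b=2.0, v=5.0)   # hypothetical constants
Fy = make_anchor_coordinate(a=1.1, b=1.5, v=5.0)
Fz = make_anchor_coordinate(a=1.0, b=4.0, v=5.0)

def anchor(z):
    """Position of this anchor point as a function of the part's depth z."""
    return (Fx(z), Fy(z), Fz(z))
```

Because all anchors of a part share the same unknown z, moving z slides the whole part along the viewing direction while its projection stays fixed.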
Defining geo-semantic constraints. We use the anchor
points to define the geo-semantic relationships between
parts. Specifically, we support six types of constraints: parallelism, orthogonality, and collinearity of primitive axes,
overlapping endpoints of axes, coplanar axis endpoints,
and coplanar axes. During modeling, for each type, we test
whether a pair of components is close to satisfying one of
the above geo-semantic constraints. If so, we add this constraint to the definition of the object (see Figure 4). For
example, for two cylinders with indices m and n, if the angle
between the vectors (Cm, 1 − Cm, 2) and (Cn, 1 − Cn, 2) is less than 15°,
we add a parallelism constraint (Cm, 1 − Cm, 2) × (Cn, 1 − Cn, 2) = 0
to the system of constraints. Similarly, if any three of the four
anchors of two cylinders form a triangle containing an angle
larger than 170°, we add a collinear-axes constraint: (C1 − C2)
× (C1 − C3) = 0. Internal constraints such as orthogonality and
concentricity of a cuboid’s axes are also added to the definition, and some editing operations such as copying and pasting parts (see Section 6) can impose same-size constraints.
Finally, we also allow the user to manually enforce or revoke
any constraint for selected primitive parts.
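The two detection tests described above can be sketched as follows. The thresholds (15° and 170°) are those given in the text; the vector and angle helpers, and the anchor points passed in, are our own illustration.

```python
import math

# Sketch of the geo-semantic constraint detection tests (not the
# authors' code): near-parallelism of two axes and near-collinearity
# of three anchor points.

def sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))

def angle_deg(u, v):
    """Angle between two 3D vectors, in degrees."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    nu = math.sqrt(sum(ui * ui for ui in u))
    nv = math.sqrt(sum(vi * vi for vi in v))
    c = max(-1.0, min(1.0, dot / (nu * nv)))
    return math.degrees(math.acos(c))

def nearly_parallel(Cm1, Cm2, Cn1, Cn2, tol_deg=15.0):
    # Axes are near-parallel if the angle between their direction
    # vectors (or its supplement) is below the threshold.
    a = angle_deg(sub(Cm1, Cm2), sub(Cn1, Cn2))
    return min(a, 180.0 - a) < tol_deg

def collinear_candidate(C1, C2, C3, tol_deg=170.0):
    # Three anchors suggest collinear axes if the triangle they form
    # contains an angle larger than ~170 degrees.
    angles = (angle_deg(sub(C2, C1), sub(C3, C1)),
              angle_deg(sub(C1, C2), sub(C3, C2)),
              angle_deg(sub(C1, C3), sub(C2, C3)))
    return max(angles) > tol_deg
```

When a test fires, the corresponding cross-product equation (e.g. (Cm, 1 − Cm, 2) × (Cn, 1 − Cn, 2) = 0) is added to the constraint system; the test itself is only a trigger.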
Establishing an objective function. Suppose we have
found p geo-semantic constraints Gk for a set of n components. Together with the objective function for fitting the
image outline, we define the following optimization system:
minimize over {Ci, j}:  Σi wi Ei(Ci, 1, . . ., Ci, mi),  (1)
subject to  Gk(C1, 1, . . ., Cn, mn) = 0,  k = 1, . . ., p,  (2)
where Ei is the fitting term measuring how well the projection of part i matches the image outline, and mi is the number of axes of the ith primitive part. We
add weights wi proportional to the radius of the base profile
of each part and the length of its axis. Larger parts have more
impact on the solution, since larger parts are typically modeled more accurately. Intuitively, the first equation tries to fit
the projection of each part's geometry (Ci, j) to the image outline
and the user’s gestures, while the second set of equations
imposes the geo-semantic constraints.
Two-step solution. Solving for Ci, j and zi together is a nonlinear, non-convex optimization problem with nonlinear constraints. Directly solving such a system without becoming trapped in a local minimum is very difficult. Hence, we decompose the solution into a two-step procedure. The first step tries to find a good initial position for all parts at once, by changing only their depths (governed by zi) to meet the geo-semantic constraints. In the second step, the full system is solved, allowing the shapes of the parts (Ci, j) to change as well.
In the first step, we modify the soft constraint in Equation (1) to a hard one, and replace Ci, j by Fi, j(zi) in all equations. This means that Equation (1) is trivially satisfied and we are left with just the constraints in Equation (2). In effect, we fix the projection and find the optimal zi meeting the geo-semantic constraints. This reduces the number of variables to n (zi, 1 ≤ i ≤ n) and changes Equation (2) into a potentially over-determined system, where each equation contains only two different variables. We find the least-squares solution by the conjugate gradient method, with all zi values initialized to 0.
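A toy version of this first step is sketched below. It is a simplification, not the paper's code: we pretend each two-variable constraint is already linear, rk(z) = z[i] − z[j] − c = 0 (the real constraints come from substituting Fi, j(zi) and are nonlinear), and solve the over-determined system in the least-squares sense with conjugate gradients on the normal equations, starting from z = 0 as in the text.

```python
# Toy first step (our own sketch): least-squares depths for constraints
# of the form z[i] - z[j] = c, solved by conjugate gradients on the
# normal equations A^T A z = A^T b, with all depths initialized to 0.

def solve_depths(n, constraints, iters=50):
    """constraints: list of (i, j, c) meaning z[i] - z[j] = c."""
    def AtA(z):  # apply A^T A without forming A explicitly
        out = [0.0] * n
        for i, j, _ in constraints:
            r = z[i] - z[j]
            out[i] += r
            out[j] -= r
        return out

    Atb = [0.0] * n
    for i, j, c in constraints:
        Atb[i] += c
        Atb[j] -= c

    z = [0.0] * n                      # all z_i initialized to 0
    r = [bi - yi for bi, yi in zip(Atb, AtA(z))]
    p = r[:]
    for _ in range(iters):
        rr = sum(x * x for x in r)
        if rr < 1e-18:                 # converged
            break
        Ap = AtA(p)
        alpha = rr / sum(pi * ai for pi, ai in zip(p, Ap))
        z = [zi + alpha * pi for zi, pi in zip(z, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        beta = sum(x * x for x in r) / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
    return z
```

Note that each equation touches only two variables, so A is very sparse; applying A^T A constraint by constraint, as above, is the natural way to exploit that.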
This first step provides a good initialization for finding the optimal solution for Ci, j, which should be close to Fi, j(zi), requiring the geo-semantic constraints to fix only small inconsistencies. Hence, in the second step, we carry out full optimization of Equation (1) with the set of constraints in Equation (2) by an augmented Lagrangian method. Both steps are fast, and the initialization of the first step lets us avoid local minima. This also permits optimization to be carried out at interactive speed (see examples at https://vimeo.com/148236679). Note that the nonlinearity of Fi, j(zi) arises from the assumption of a perspective projection. However, we can linearly approximate this projection since we assume the change in zi is small. This further increases the speed and stability of our solution.
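The linear approximation appealed to here is just the first-order expansion of F(z) = b/(a(z + v)) around a point z0, namely F(z0 + dz) ≈ F(z0) − b·dz/(a(z0 + v)²). The check below (with made-up constants) confirms numerically that the approximation is accurate when the change in z is small.

```python
# Numerical check (our own illustration, made-up constants a, b, v)
# that F(z) = b / (a * (z + v)) is well approximated by its first-order
# Taylor expansion around z0 for small changes dz:
#   F(z0 + dz) ~ F(z0) - b * dz / (a * (z0 + v)**2)

def F(z, a=1.0, b=2.0, v=5.0):
    return b / (a * (z + v))

def F_lin(z0, dz, a=1.0, b=2.0, v=5.0):
    return F(z0, a, b, v) - b * dz / (a * (z0 + v) ** 2)
```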
Curved shapes. To handle parts with a non-straight axis,
we first simplify the problem by assuming that the general
axis lies in a plane. We define a non-straight part as a blend
of two straight-axis sub-parts, placed at the two ends of the
part. The position of each of these sub-parts is determined
by a single depth value in the optimization above. The two sub-parts are then connected by a piecewise linear curve that forms the axis of the whole part; its axis points are determined by constraining the profile as it snaps during the user's sweep. More details can be found in Ref. 6.
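One simple way to realize such a blend (our own construction, under the stated planar-axis assumption, not the authors' exact scheme) is to sample a piecewise linear axis that leaves one sub-part along its straight direction and arrives at the other along its direction:

```python
# Sketch: blend two straight-axis sub-parts into one planar,
# piecewise linear axis.  p0, p1 are the sub-part end positions and
# d0, d1 their axis directions; the smoothstep weight h favors d0
# near t = 0 and d1 near t = 1.

def blend_axes(p0, d0, p1, d1, samples=8):
    pts = []
    for k in range(samples + 1):
        t = k / samples
        h = t * t * (3 - 2 * t)  # smoothstep blending weight
        x = tuple((1 - h) * (p0[i] + t * d0[i])
                  + h * (p1[i] - (1 - t) * d1[i])
                  for i in range(3))
        pts.append(x)
    return pts
```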
Texturing. Once the object has been modeled, we can map
the texture from the image onto the object, as shown in Figure 6.
By projecting a vertex of the mesh to the image plane, we can
get the 2D coordinates of the vertex in the image, which are
then used as texture coordinates to map the corresponding
part of the image onto the model. Since there is no information regarding the back of the object, we simply use a symmetry assumption and mirror the front texture to the back.
In each half of the model (front and back), we assign the same
texture coordinate for the two vertices that are mirrored
symmetrically about the central plane. On the two sides of the
object (left and right), there are vertices whose normal is
perpendicular, or almost perpendicular, to the image plane.
To deal with such vertices, we treat the texture associated
with them as holes, and fill them using the image completion
technique of Ref. 4.
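The projection-based texture-coordinate assignment and the front-to-back mirroring can be sketched as follows. The camera model here is a hypothetical pinhole with focal length f and image extent w × h, and the central plane is taken as z = z_center; the paper does not spell out these parameters, so they are assumptions of this sketch.

```python
# Sketch (hypothetical camera parameters): a mesh vertex in camera
# coordinates is perspective-projected to the image plane, and the
# result is reused as its texture coordinate.  A back-half vertex is
# first mirrored through the central plane z = z_center so that it
# shares the texture coordinate of its front-half twin.

def project_to_uv(x, y, z, f=1.0, w=2.0, h=2.0):
    """Perspective-project a camera-space point, normalized to [0, 1]."""
    u = (f * x / z + w / 2) / w
    v = (f * y / z + h / 2) / h
    return (u, v)

def uv_for_vertex(x, y, z, z_center, f=1.0, w=2.0, h=2.0):
    # Assumption: the camera looks down +z, so larger z means farther
    # away; vertices behind the central plane belong to the back half.
    if z > z_center:
        z = 2.0 * z_center - z  # mirror through the central plane
    return project_to_uv(x, y, z, f, w, h)
```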
6. EXPERIMENTAL RESULTS
We implemented the 3-Sweep interactive technique in C++
inside a simple modeling system. The system provides an