Our first concern is to provide images from which the
artists can easily infer shape but that are not so familiar
that they apply domain-specific (“idiomatic”) knowledge
when drawing. This consideration not only rules out overly
abstract or complicated 3D surfaces (i.e., shapes unlike
anything in common experience) but also rules out objects
with strong semantic features (e.g., human faces) and ones
commonly drawn in art classes (e.g., fruit). It also suggests
that multiple views of the shape be provided as prompts so
that ambiguities in one view are resolved by another. Finally,
prompt images should be photorealistic to avoid confusing artists who are unfamiliar with classic CG rendering artifacts, such as hard shadows and the lack of indirect illumination.
For productive analysis, the set of prompts should
include pixels with a wide variety of mathematical properties (e.g., high image gradients and surface critical points),
and these features should be separated spatially to allow
clear distinctions to be made. This consideration rules out
objects containing only large, planar facets (few interesting surface features), convex objects (no concave surface
features), and other surfaces with few inflections. Rather, it
suggests blobby objects with many curved surfaces.
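To make the first of these properties concrete, the short Python sketch below flags pixels of a rendered prompt with strong image gradients; it is illustrative only (the filename and the 5% cutoff are placeholder choices, not part of the study):

```python
import numpy as np
import imageio.v3 as iio

# Load a rendered prompt ("prompt.png" is a placeholder) and convert to
# grayscale in [0, 1].
img = iio.imread("prompt.png").astype(float) / 255.0
if img.ndim == 3:
    img = img[..., :3].mean(axis=-1)   # drop any alpha channel, average RGB

gy, gx = np.gradient(img)              # finite-difference image gradients
grad_mag = np.hypot(gx, gy)            # per-pixel gradient magnitude

# Flag, e.g., the top 5% strongest-gradient pixels as candidate features.
feature_mask = grad_mag >= np.quantile(grad_mag, 0.95)
```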
Finally, the objects must be relatively simple, without much fine-scale detail. Otherwise, the artists may be tempted
to abstract or simply omit important features.
Based on these criteria, we select 12 models of 4 object
types for our study: (a) four bones, (b) two tablecloths, (c) four
mechanical parts, and (d) two synthetic shapes (Figure 3). We
synthesize four prompt images for each model, one for each
combination of two different viewpoints and two lighting
conditions. The two viewpoints are always 30° apart (so that
large parts of each model can be seen from both viewpoints)
and are carefully chosen to distribute surface features across
the image. By providing prompts with different lighting and
different viewpoints for the same model, we can analyze
image-space properties in isolation from object-space ones.
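The resulting factorial design is small enough to enumerate directly. A minimal sketch (the view names are placeholders; the lighting names anticipate the environment maps described below):

```python
from itertools import product

models = range(12)                                   # the 12 study models
viewpoints = ["view_0", "view_30"]                   # two views, 30 degrees apart
lightings = ["eucalyptus_grove", "grace_cathedral"]  # two lighting conditions

# One prompt per (model, viewpoint, lighting) combination: 12 x 2 x 2 = 48.
prompts = list(product(models, viewpoints, lightings))
assert len(prompts) == 48
```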
We generate our images using YafRay,25 a free raytracing
package capable of global illumination using Monte Carlo
pathtracing. The models are rendered using a fully diffuse,
gray material, and thus take on the color of the lighting
environment. For lighting, we use the Eucalyptus Grove and
Grace Cathedral high-dynamic-range environment maps
captured by Debevec.6
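For intuition, shading a fully diffuse gray surface under distant environment light reduces to a cosine-weighted integral of the incoming radiance over the hemisphere, which Monte Carlo path tracing estimates by random sampling. The sketch below shows a single-bounce, unoccluded version of that estimate; it is illustrative only (env_radiance stands in for an HDR environment-map lookup) and is not YafRay's implementation:

```python
import numpy as np

def sample_cosine_hemisphere(n, rng):
    """Cosine-weighted random direction about the unit normal n."""
    u1, u2 = rng.random(), rng.random()
    r, phi = np.sqrt(u1), 2.0 * np.pi * u2
    local = np.array([r * np.cos(phi), r * np.sin(phi), np.sqrt(1.0 - u1)])
    # Build an orthonormal frame (t, b, n) around the normal.
    t = np.cross(n, [1.0, 0.0, 0.0])
    if np.linalg.norm(t) < 1e-6:            # normal was parallel to the x-axis
        t = np.cross(n, [0.0, 1.0, 0.0])
    t = t / np.linalg.norm(t)
    b = np.cross(n, t)
    return local[0] * t + local[1] * b + local[2] * n

def shade_diffuse(n, albedo, env_radiance, samples=64, seed=0):
    """Monte Carlo estimate of diffuse shading under distant environment
    light. With cosine-weighted sampling, the cosine term and the sampling
    pdf cancel, leaving the mean sampled radiance scaled by the albedo."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(samples):
        total += env_radiance(sample_cosine_hemisphere(n, rng))
    return albedo * total / samples

# Example: a gray material (albedo 0.5) under a toy "sky" that is brighter
# overhead than at the horizon (a stand-in for an HDR map lookup).
sky = lambda d: 1.0 + max(d[2], 0.0)
print(shade_diffuse(np.array([0.0, 0.0, 1.0]), 0.5, sky))
```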
2.3. Line drawing registration
The final and most difficult part of the study design is engineering a system that can register the line drawings made by artists to pixels of a prompt image with high accuracy.
Designing such a system is challenging because there is
a trade-off between allowing the artist to draw in a natural
manner (e.g., with pencil on a blank sheet of paper) versus
including constraints that facilitate accurate registration
between prompts and line drawings. On one hand, the drawing process must not bias the locations of the lines the artist makes, and thus having the artist compose a drawing directly over the image prompt is ruled out. On the
other hand, the process must provide enough registration
accuracy to distinguish between important mathematical
properties at nearby pixels in the prompt. This problem is particularly difficult since freehand drawings can be geometrically imprecise, and the intended location of every line is known only to the artist.

Figure 3. 3D models. The 12 models from our study, each shown in one of two views and one of two lighting conditions. Groups (a) and (b) are scans of real objects; (c) and (d) are synthetic.
Our design balances these trade-offs with a simple two-step process. The artist is given two sheets of 8.5″ × 11″ paper
for each line drawing (Figure 4). The prompt page (shown on
the left) contains multiple full color views of the prompt
shape, one of which is large (6.5″ × 4.75″) and is called the
main view. The drawing page (shown on the right) contains
two boxes, each the same size as the main view. The top box
is initially blank, while the bottom box contains a faint version of the main view.
The artist is asked to complete the drawing page by first
folding the page vertically in half so that only the blank
space at the top is visible (Figure 4, left). Using the prompt page for reference, the artist draws the prompt shape in the
blank space, just as if they were making a normal sketch.
When finished, the artist unfolds the drawing page and copies their freehand drawing onto the faint image on the bottom of the same page. During the copying step, the artist is
asked to change the shape of their lines to match the target
rendering but not to change the number or relative position
of the lines. In effect, the artist is asked to perform a nonlinear warp of their original drawing onto the target shape.
A typical result is shown on the right side of Figure 4.
We scan the drawing page with a flatbed scanner, locate
fiducials included in the corners of the page, and then use
the fiducials to register the traced lines with the 3D model
rendered from the main viewpoint. An adaptive thresholding method is used to convert the scanned gray-scale image
into a binary image so that all of the artist's lines, regardless of strength, are preserved. We then
use a thinning operator to narrow the lines in the binary
image down to the width of one pixel. The final result is a binary image of one-pixel-wide lines registered to the rendered prompt.
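This post-processing pipeline can be sketched with standard image-processing tools. The following is a minimal illustration, assuming OpenCV and scikit-image, with hard-coded fiducial coordinates standing in for an actual marker detector; it is not the study's exact implementation:

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize

# Load the scanned drawing page as grayscale ("scan.png" is a placeholder).
scan = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

# 1. Register: warp the four detected fiducials onto the known corners of
#    the drawing box. The pixel coordinates here are made-up stand-ins for
#    the output of a real fiducial detector.
detected = np.float32([[102, 95], [2410, 88], [2406, 3180], [99, 3186]])
known = np.float32([[0, 0], [2400, 0], [2400, 3100], [0, 3100]])
H = cv2.getPerspectiveTransform(detected, known)
registered = cv2.warpPerspective(scan, H, (2400, 3100))

# 2. Binarize with an adaptive threshold: each pixel is compared against a
#    local mean, so faint strokes survive alongside dark ones.
binary = cv2.adaptiveThreshold(registered, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV, 31, 10)

# 3. Thin every stroke to a one-pixel-wide skeleton.
lines = skeletonize(binary > 0)
```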