an interesting question is whether combinations of those
properties can be used to predict where artists will draw
more accurately than any of them alone. To investigate
this question, we have experimented with several regression models, including linear regression, radial basis functions, regression trees, and several others. As an example,
Figure 11 (left) shows a regression tree built with the M5P package in Weka24 to predict the set of line drawings for one view
of the twoboxcloth model shown in Figure 3. Figure 11a
shows the prediction of p resulting from this simple tree,
while Figure 11b provides a visualization of which pixels
sort into which leaves of the tree (pixels in the image are
colored to match the text of the leaf).
In this example, several properties are combined by the
decision tree to predict p, starting with ImgGradMag at the
root. The set of properties chosen is instructive, as it suggests that they provide the highest incremental value in
predicting p (at the start of tree building). Of course, many
properties are correlated, and the decision tree may be non-optimal, so an alternative tree may have produced similar or
better predictions. Nevertheless, it is interesting to see how
non-trivial combinations of local properties can be used
to make predictions—even though the tree was purposely
kept small in this example, it still is able to provide a plausible (albeit coarse) prediction for where artists draw lines
(Figure 11a). If we consider deeper trees or other regression
models, we are able to predict p from x more accurately.
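As a rough illustration of how such a tree combines properties, the following minimal Python sketch fits a small regression tree by greedy variance reduction. It is not the M5P algorithm used above (M5P places linear models at the leaves, whereas this sketch uses constant leaves), and the toy data, the second property column, and all function names are assumptions made for illustration.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, targets):
    """Return the (feature, threshold) pair that most reduces SSE, or None."""
    best, best_cost = None, sse(targets)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if cost < best_cost - 1e-12:
                best, best_cost = (f, t), cost
    return best

def build_tree(rows, targets, max_depth):
    """Grow a tree of nested dicts; a small max_depth keeps it coarse."""
    split = best_split(rows, targets) if max_depth > 0 else None
    if split is None:
        return {"leaf": mean(targets)}
    f, t = split
    left = [(r, y) for r, y in zip(rows, targets) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
        "right": build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean value of p."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]

# Toy per-pixel data (made up): column 0 plays the role of ImgGradMag,
# column 1 is a second, hypothetical local property.
X = [[0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2], [0.15, 0.5], [0.85, 0.5]]
p = [0.0, 0.0, 0.9, 0.7, 0.0, 0.8]
tree = build_tree(X, p, max_depth=2)
```

In this toy data the first column separates drawn from undrawn pixels, so it is chosen at the root, which mirrors the role of ImgGradMag in the example tree.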
Figure 10. Example local surface features. Top: the frequencies
of pixels near artists’ lines (blue), and away from artists’ lines
(green, dashed), as functions of local surface properties. Bottom:
pixels near artists’ lines as a fraction of the total. Pixels where
N · V ≈ 0 or where the Sobel response is high are very likely near artists’ lines.
Figure 11. Decision tree for predicting where artists will draw.
(Left) decision tree learned from prompts of bones, (a) predicted
probabilities of where artists will draw for this view (black is high
probability), (b) a visualization of which pixels fall into which leaves
of the tree. Note that this tree was purposely kept small for didactic
purposes, yielding coarse prediction.
3.4. Which local properties are most important?
In our data mining framework, it is not only possible to
predict where artists will draw but also to examine which
local features are most important when building such a
regression model. For example, Random Forests1
estimate the importance of every feature to its model by building a large number of decision trees trained on different
subsets of the data.1 For each feature m of each built tree,
the error observed in predictions for the “out of bag” data
(the part held out of training) is computed and compared
to the error that is observed when values of feature m are
permuted. The difference between these errors, averaged
and normalized, is reported as the “importance” of feature m. For this analysis, we make the assumption that
almost all occluding contours (N · V = 0) are drawn by artists (Figure 10), and so exclude any pixel within 1 mm of a
contour from the training set.
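The permutation step described above can be sketched in a few lines of Python. This is a simplified illustration, not the Breiman and Cutler implementation: it uses a single stand-in model and one held-out set instead of per-tree out-of-bag data, and it omits the normalization. The names `toy_model`, `held_out_rows`, and `permutation_importance` are assumptions made for the example.

```python
import random

def mean_squared_error(predict, rows, targets):
    """Average squared difference between predictions and targets."""
    return sum((predict(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, feature, rng, repeats=20):
    """Average increase in held-out error after shuffling one feature column."""
    baseline = mean_squared_error(predict, rows, targets)
    increases = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with the shuffled value in place of the original.
        permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
        increases.append(mean_squared_error(predict, permuted, targets) - baseline)
    return sum(increases) / repeats

def toy_model(row):
    """Hypothetical stand-in for a trained model; it uses feature 0 only."""
    return 0.9 if row[0] > 0.5 else 0.1

rng = random.Random(0)
held_out_rows = [[rng.random(), rng.random()] for _ in range(200)]
held_out_targets = [toy_model(r) for r in held_out_rows]

importance = [
    permutation_importance(toy_model, held_out_rows, held_out_targets, f, rng)
    for f in range(2)
]
```

Because the stand-in model ignores feature 1, shuffling that column leaves the error unchanged and its importance is zero, while shuffling feature 0 raises the error.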
Table 1 shows the relative feature importance as computed with the Random Forest implementation of Breiman
and Cutler in R for the remaining pixels of all drawings in our
study.20 The first four columns report the importance of features (rows) estimated when training only on models of
one type (bones, cloth, mechanical, or synthetic), while the
rightmost column reports the average over the whole data set.
The results indicate that image-space intensity gradient magnitude is the feature among the tested set that is
most useful in predicting the probability that an artist will
draw at a particular location in our study (e.g., the average prediction error is largest if values of the image-space
gradient magnitude are randomized). While image-space
discontinuities often appear at the same place as boundary
contours and occluding contours (N · V = 0), the locations
where those contours appear have been excluded from this
study. So, this result suggests that image-space intensity
gradients away from the contours are also highly correlated
with artist line locations. Of course, this is not surprising,
as ridges, valleys, and shadow boundaries are commonly
drawn by artists. However, it is a bit surprising how all the
simple image-space features (which do not require a 3D
model to compute) are so important relative to the other
more complex properties that have been the focus of recent
research in CG.
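For concreteness, the exclusion of pixels near occluding contours used in this analysis could be implemented along the following lines. This is a brute-force sketch under stated assumptions: the pixel grid, the contour location, and the `pixels_per_mm` scale are made up, and the source does not say how the 1 mm distance was actually measured.

```python
import math

def exclude_near_contours(pixels, contour_pixels, radius_px):
    """Keep only pixels farther than radius_px from every contour pixel."""
    return [
        p for p in pixels
        if all(math.dist(p, c) > radius_px for c in contour_pixels)
    ]

# Hypothetical 5 x 5 image with a vertical occluding contour along x = 0.
pixels = [(x, y) for x in range(5) for y in range(5)]
contour_pixels = [(0, y) for y in range(5)]

pixels_per_mm = 2.0  # assumed image scale, not taken from the source
kept = exclude_near_contours(pixels, contour_pixels, radius_px=1.0 * pixels_per_mm)
```

A distance transform would be the usual choice for full-size images; the quadratic loop here is only meant to make the criterion explicit.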
3.5. Which CG lines are most important?
We use Random Forests to compute the importance of the CG
line definitions studied in Section 3.2 for predicting where
artists draw lines. For this analysis, we compute a new feature vector for every pixel storing the strength for every CG
line definition. Note that strength is only defined at pixels
where the algorithm would draw a line (e.g., zeros of maximum curvature derivative for ridges). At all other pixels,
strength is always zero. We then recompute the Random
Forests with the new feature vectors.
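The construction of these per-pixel feature vectors can be sketched as follows. The line-definition names, the sparse dictionary layout, and the strength values are hypothetical; the only behavior taken from the text is that strength is stored where a definition would draw a line and is zero everywhere else.

```python
def strength_features(line_strengths, line_names, pixels):
    """Build one vector per pixel: the strength of each CG line definition,
    or 0.0 where that definition would not draw a line."""
    return {
        px: [line_strengths.get(name, {}).get(px, 0.0) for name in line_names]
        for px in pixels
    }

# Hypothetical sparse strengths: each definition maps only the pixels
# where its algorithm would draw a line to a strength value.
line_names = ["ridges", "suggestive_contours"]
line_strengths = {
    "ridges": {(0, 1): 0.7},
    "suggestive_contours": {(0, 1): 0.2, (1, 1): 0.5},
}
pixels = [(0, 0), (0, 1), (1, 1)]

features = strength_features(line_strengths, line_names, pixels)
```

The resulting vectors can then be passed to the same importance analysis as the local properties, with one column per line definition.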