niques they discuss directly into the
platform itself. But these same ideas
(for example, those around caching
or lazy evaluation) could be applied
on a per-application basis on top of a
more general-purpose serving system
as well. The paper describes ideas for
improving training speed, serving performance, and usability.
LASER uses a variety of techniques
for intelligent caching and materialization in order to provide real-time inference (these are similar to the view-maintenance strategies discussed in
§3.3.2 of the MauveDB paper). The
models described in LASER predict a
score for displaying a particular ad to
a particular user. As such, the model
includes linear terms that depend
only on the ad or user, as well as a quadratic term that depends on both the
user and the ad. LASER exploits this
model structure to partially prematerialize and cache results in ways that
maximize cache reuse and minimize
wasted computation and storage. The
quadratic term is expensive to compute in real time, but precomputing the full cross product matching users to ads (a technique described in the literature as full prematerialization) would be wasteful and expensive, especially in a setting such as online advertising, where user preferences can change quickly and ad campaigns frequently start and stop.
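In rough notation of our own (not the paper's exact parameterization, and ignoring the logistic link), this structure can be written as

s(u, a) = w_0 + w_u^T x_u + w_a^T x_a + x_u^T W x_a,

where x_u and x_a are the user and ad feature vectors, w_u and w_a are the user-only and ad-only weights, and W holds the quadratic interaction weights. Full prematerialization would evaluate the x_u^T W x_a term for every user-ad pair in advance; the partial strategy described next caches only the user-side pieces.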
Instead, the paper describes how
LASER leverages the specific structure
of its generalized linear models to prematerialize part of the cross product
to accelerate inference without incurring the waste of precomputing the
entire product. LASER also maintains
a partial results cache for each user
and ad campaign. This factorized
cache design is particularly well suited for advertising settings in which
many ad campaigns are run on each
user. Caching the user-specific terms
amortizes the computation cost
across the many ad predictions, resulting in an overall speedup for inference with minimal storage overhead.
The partial prematerialization and
caching strategies deployed in LASER
could be applied to a much broader
class of models (for example, neural
features or word embeddings).
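As a rough sketch of this factorized caching idea (our own Python illustration with hypothetical names, not LASER's actual implementation), the user-dependent partial results can be computed once and reused across every candidate ad:

import numpy as np

class FactorizedScorer:
    # Toy scorer for a model of the form
    #   s(u, a) = w_user . x_u + w_ad . x_a + x_u^T W x_a,
    # illustrating partial prematerialization of the user-dependent work.

    def __init__(self, w_user, w_ad, W):
        self.w_user = np.asarray(w_user)   # user-only weights
        self.w_ad = np.asarray(w_ad)       # ad-only weights
        self.W = np.asarray(W)             # user-ad interaction weights
        self.user_cache = {}               # user_id -> (user term, x_u^T W)

    def _user_partial(self, user_id, x_u):
        # Compute the user-only term and the partially materialized
        # interaction row once, then reuse them for every candidate ad.
        if user_id not in self.user_cache:
            x_u = np.asarray(x_u)
            user_term = float(self.w_user @ x_u)
            interaction_row = x_u @ self.W
            self.user_cache[user_id] = (user_term, interaction_row)
        return self.user_cache[user_id]

    def score(self, user_id, x_u, x_a):
        # Only the ad-dependent dot products are left at request time.
        user_term, interaction_row = self._user_partial(user_id, x_u)
        x_a = np.asarray(x_a)
        return user_term + float(self.w_ad @ x_a) + float(interaction_row @ x_a)

Scoring k candidate ads for one user then costs a single cached user-side computation plus k cheap dot products, rather than k full quadratic-form evaluations.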
LASER also uses two techniques
that trade off short-term prediction
accuracy for long-term benefits. First,
LASER does online exploration using
Thompson sampling to explore ads
with high variance in their expected
values because of small sample sizes.
Thompson sampling is one of a family of exploration techniques that systematically trade off exploiting current knowledge (for example, serving a known good ad) and exploring unknown parts of the decision space (serving a high-variance ad) to maximize long-term utility.
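As a simplified illustration of that trade-off (a textbook Beta-Bernoulli sampler with names of our own choosing, not LASER's actual estimator), each ad's click-through rate gets a posterior, one value is sampled per ad, and the highest sample is served; ads with little data have wide posteriors and are therefore explored more often:

import random

class ThompsonSampler:
    # Beta-Bernoulli Thompson sampling over a set of candidate ads.

    def __init__(self, ad_ids):
        # One (clicks, non-clicks) pair per ad, starting from a uniform prior.
        self.stats = {ad: [1, 1] for ad in ad_ids}

    def choose(self):
        # Draw one plausible click-through rate from each ad's posterior
        # and serve the ad whose sampled value is highest.
        samples = {ad: random.betavariate(a, b) for ad, (a, b) in self.stats.items()}
        return max(samples, key=samples.get)

    def update(self, ad, clicked):
        # Fold the observed outcome back into that ad's posterior.
        self.stats[ad][0 if clicked else 1] += 1

An ad shown thousands of times has a tightly concentrated posterior and is rarely sampled above its true rate, while a new ad's wide posterior occasionally wins the draw and gets explored.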
Second, the LASER team adopted a philosophy it calls “Better wrong than late.” If a term in the model takes too long to be computed (for example, because it is fetching data from a remote data store), the model will simply fill in the unbiased estimate for the value and return a prediction with degraded accuracy rather than blocking until the term can be computed. In the case of a user-facing application, any revenue gained by a slightly more accurate prediction is likely to be outweighed by the loss in engagement caused by a Web page taking too long to load.
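A minimal sketch of this policy (the helper names and timeout value here are ours, purely for illustration) is a bounded wait on the remote feature fetch that falls back to the feature's precomputed mean, the unbiased estimate, when the deadline passes:

from concurrent.futures import ThreadPoolExecutor, TimeoutError

executor = ThreadPoolExecutor(max_workers=8)

def feature_or_fallback(fetch_fn, fallback_mean, timeout_s=0.010):
    # Return the fetched feature value, or its historical mean if the
    # remote store does not answer within the latency budget.
    future = executor.submit(fetch_fn)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        # Better wrong than late: degrade accuracy instead of blocking.
        return fallback_mean

The resulting prediction is slightly less accurate, but it stays within the page's latency budget.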
There are two key takeaways from
the LASER paper: First, trained models often perform computation whose
structure can be analyzed and exploited to improve inference performance
or reduce cost; second, it is critical
to evaluate deployment decisions
for machine-learning models in the
context of how the predictions will
be used rather than blindly trying to
maximize performance on a validation dataset.
Applying Cost-Based Query
Optimization to Deep Learning
D. Kang, J. Emmons, F. Abuzaid, P. Bailis, and M. Zaharia
NoScope: Optimizing neural network queries over video at scale. In Proceedings of the VLDB Endowment 10, 11 (2017); https://dl.acm.org/citation.cfm?id=3137664.
This paper from Kang et al. at Stanford presents a set of techniques for
significantly reducing the cost of prediction serving for object detection in
video streams. The work is motivated
by current hardware trends—in particular, that the cost of video data acquisition is dropping as cameras get
cheaper, while state-of-the-art computer vision models require expensive
hardware accelerators such as GPUs