of parametric query optimization. The following example
describes a scenario in which MPQ is necessary.
Example 1. Assume that we need to process the same query
in regular time intervals. Query processing takes place in the
cloud and we would like to use Amazon EC2 Spot Instances.
Here, we care about two execution cost metrics which is
the execution time and the monetary execution fees. We
can trade between them by adapting the type and number
of the computational resources that we rent from Amazon.
The query processing cost also depends on parameters: the
pricing of Amazon Spot Instances. Due to unpredictable
fluctuations, we do not know their values in advance. As we
process the same query repeatedly, we can determine the set
of all potentially relevant query plans in a preprocessing
step. At run time, given concrete Spot prices and execution cost
bounds, we can efficiently select the best query plan out of
the precomputed set. This avoids expensive optimization at
run time. The preprocessing step requires MPQ since multiple
plan cost metrics and parameters need to be considered.
There are many other scenarios in which multiple processing cost metrics are of interest. Techniques for approximate query processing allow us to trade between execution
time and result precision.
1 Different query plans can realize different tradeoffs between energy consumption and
execution time for the same query.
18 If data is processed
by crowd workers then latency, execution fees, and result
precision are all relevant cost metrics.
12 If the queries we
want to process at run time correspond to query templates
that are known before run time then we can make query
optimization a preprocessing step. At preprocessing time,
plan cost estimates depend on parameters with unknown
values. Those parameters can represent query properties
which are not fully specified in the template or properties
of the query execution platform (e.g., the Spot Instance
prices) that will become known only at run time. MPQ is
applicable in such scenarios and avoids query optimization at run time.
The result of MPQ is the set of all potentially relevant
query plans for a given query or query template. It contains
all Pareto-optimal plans for each possible parameter value
combination. At run time, we can select the best query plan
out of that set based on the concrete parameter values and
based on user preferences. Users can specify their preferences in advance (e.g., by specifying cost bounds and priorities between different cost metrics1, 16) such that the optimal
plan according to those preferences can be selected automatically. As an alternative, we can use the precomputed
plan set to visualize all Pareto-optimal cost tradeoffs for
given parameter values. This allows users to select the preferred cost tradeoff directly.
17 Figure 1 illustrates the context
So far we have introduced a very generic problem model
for MPQ. In order to make the problem tractable, we restrict
ourselves to a specific class of cost functions in this paper:
we consider piecewise-linear plan cost functions. Many
approaches for parametric query optimization6, 8 consider
only piecewise-linear plan cost functions as well since such
functions can approximate arbitrary functions.
8 The dif-
ference between our model and the one used in paramet-
ric query optimization is that we associate each plan with
multiple piecewise-linear cost functions representing cost
according to different metrics.
1. 3. Algorithm
We present an algorithm that solves MPQ for piecewise-linear
plan cost functions. Our algorithm is based on dynamic programming. It recursively decomposes the input query for
which we need to determine the set of relevant query plans
into subqueries. In a bottom-up approach, it recursively calculates sets of relevant plans for a query out of optimal plan
sets for its subqueries: it combines plans that are relevant
for the subqueries to form new plans that are potentially
relevant for the decomposed query. Dynamic programming
is a classical approach for query optimization. The crucial
difference between our algorithm and prior algorithms is
the implementation of the pruning function, that is, in how
we compare alternative query plans and prune out suboptimal plans.
Conceptually, we associate each plan for a query or subquery with a region in the parameter space for which the
plan is Pareto-optimal. We call this region the Pareto region.
The goal during pruning is to compare alternative plans
generating the same result in order to discard suboptimal
plans. We compare plans pair-wise and determine for each
plan the parameter space region in which it is dominated
by another plan, that is, in which the other plan has comparable or better cost according to each plan cost metric. Then
we reduce the Pareto region of the first plan by the region in
which it is dominated. If the Pareto region of a plan becomes
empty then it is not Pareto-optimal for any parameter value
combination and can be discarded.
All Pareto regions that could ever occur during the execution of that algorithm can be represented using the following formalism. We represent Pareto regions as a union of
convex polytopes in the parameter space from which other
convex polytopes have been subtracted. We prove that this
representation is closed under all operations that the algorithm needs to perform on Pareto regions. Note that this
Figure 1. Multi-objective parametric query optimization precomputes
a set of relevant query plans. The optimal plan is selected from that
set according to parameter values and user preferences.