plans p1 and p2 and find that plan p1 has cost better than or equal to that of p2 according to all cost metrics within a parameter space region X, then we reduce the Pareto region of p2 by subtracting X. Pareto regions can only shrink during a pruning operation. Once the region of a plan becomes empty, the plan is irrelevant and can be safely discarded. We discard plans as soon as possible in order to avoid unnecessary comparisons.
More precisely, the pruning function iterates over all
plan pairs and executes the following steps for each pair.
First, it identifies the region in which one plan dominates
the other plan. Second, it updates the Pareto region of the
dominated plan by subtracting the region in which it is
dominated. Third, it checks whether the Pareto region of
the dominated plan becomes empty after the update. In
that case, the plan is discarded and does not participate in
further comparisons. Figure 2 illustrates how the Pareto
region of a plan is reduced after comparing it to another
plan. The example refers to a scenario in which two parameters and two cost metrics, one of them execution time, are considered.
Note that two plans can mutually dominate each other in
different parameter space regions. Having determined that
a first plan dominates a second plan for some parameter
space region, we must therefore still verify whether the second plan dominates the first plan as well.
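The pairwise pruning described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes a one-dimensional parameter space in which a Pareto region is a union of disjoint intervals and each cost metric is a linear function of the single parameter. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Plan:
    name: str
    costs: list          # per metric: (const, slope), i.e. cost(x) = const + slope * x
    pareto: list = field(default_factory=list)  # Pareto region: list of (lo, hi)

def dominance_interval(p1, p2, lo, hi, eps=1e-12):
    """Sub-interval of [lo, hi] where p1's cost <= p2's cost on ALL metrics."""
    for (c1, s1), (c2, s2) in zip(p1.costs, p2.costs):
        dc, ds = c1 - c2, s1 - s2      # p1 dominates where dc + ds*x <= 0
        if abs(ds) < eps:
            if dc > 0:
                return None            # p2 strictly cheaper on this metric everywhere
        elif ds > 0:
            hi = min(hi, -dc / ds)
        else:
            lo = max(lo, -dc / ds)
        if lo >= hi:
            return None
    return (lo, hi)

def subtract(region, cut):
    """Remove the interval `cut` from every interval of `region`."""
    (clo, chi), out = cut, []
    for lo, hi in region:
        if hi <= clo or lo >= chi:     # no overlap: keep interval unchanged
            out.append((lo, hi))
            continue
        if lo < clo:
            out.append((lo, clo))
        if chi < hi:
            out.append((chi, hi))
    return out

def prune(plans, space=(0.0, 1.0)):
    """Pairwise pruning: shrink Pareto regions, drop plans with empty regions."""
    for p in plans:
        p.pareto = [space]             # optimistic default: the whole parameter space
    alive = list(plans)
    for p1 in plans:                   # ordered pairs: both directions are checked
        for p2 in plans:
            if p1 is p2 or p1 not in alive or p2 not in alive:
                continue
            dom = dominance_interval(p1, p2, *space)
            if dom:
                p2.pareto = subtract(p2.pareto, dom)
                if not p2.pareto:
                    alive.remove(p2)   # empty Pareto region: plan is irrelevant
    return alive
```

Note that iterating over ordered pairs covers the mutual-domination case: after (p1, p2), the pair (p2, p1) is compared as well.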
3.3. Data structures
We describe the data structures by which we represent plan
cost functions and Pareto regions. Our plan cost model is
based on piecewise-linear functions. A piecewise-linear
function is linear in parameter space regions that form
convex polytopes. A linear function can be represented by
a constant and by weights capturing the slope of the function for each parameter. Hence a piecewise-linear function
can be represented by a set of convex polytopes where each
convex polytope is associated with a constant and weights.
We consider multiple plan cost metrics. Each query plan is
therefore associated with one piecewise-linear cost function
per plan cost metric.
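This representation can be sketched as follows. The sketch assumes that each convex polytope is given as a list of halfspace constraints; the helper names are illustrative and not part of the original system.

```python
# A piece of a piecewise-linear cost function is valid inside a convex
# polytope given by halfspaces (coeffs . x <= bound) and contributes
# cost = const + weights . x there.

def in_polytope(halfspaces, x):
    """halfspaces: list of (coeffs, bound), each meaning sum(c*x) <= bound."""
    return all(sum(c * xi for c, xi in zip(coeffs, x)) <= bound + 1e-9
               for coeffs, bound in halfspaces)

def evaluate(pieces, x):
    """pieces: list of (halfspaces, const, weights); returns cost at point x."""
    for halfspaces, const, weights in pieces:
        if in_polytope(halfspaces, x):
            return const + sum(w * xi for w, xi in zip(weights, x))
    raise ValueError("x outside the function's domain")
```

For example, with two parameters, a function could cost 1 + 2*x1 where x1 + x2 <= 1 and a constant 3 elsewhere; each query plan would carry one such list of pieces per cost metric.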
We consider the class of piecewise-linear cost functions
to represent plan cost. We chose this class of functions
since it allows us to approximate arbitrary cost functions.
In summary, our algorithm can be written as follows:
• Iterate over all subqueries s of the input query in ascending order of query size:
  – If subquery s is an atomic subquery, consider all possible plans for s.
  – Otherwise, iterate over all possibilities to decompose s into two subqueries s1 and s2. For each split, consider all plans that combine a relevant plan for s1 with a relevant plan for s2.
  – Prune all considered plans to obtain the set of relevant plans for s.
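The enumeration above can be sketched as a dynamic program. This is an illustrative skeleton, not the authors' implementation: the scan_plans, join, and prune callbacks are hypothetical, and subqueries are modeled as frozensets of table names.

```python
from itertools import combinations

def optimize(tables, scan_plans, join, prune):
    """tables: list of table names; scan_plans(t) -> plans for one table;
    join(p1, p2) -> combined plan; prune(plans) -> relevant plans only."""
    relevant = {}
    for size in range(1, len(tables) + 1):           # ascending query size
        for subset in combinations(tables, size):
            s = frozenset(subset)
            if size == 1:                            # atomic subquery
                candidates = list(scan_plans(next(iter(s))))
            else:                                    # all splits s = s1 U s2
                candidates = []
                for k in range(1, size):
                    for left in combinations(sorted(s), k):
                        s1 = frozenset(left)
                        s2 = s - s1
                        for p1 in relevant[s1]:
                            for p2 in relevant[s2]:
                                candidates.append(join(p1, p2))
                # note: each split is enumerated twice, as (s1,s2) and (s2,s1)
            relevant[s] = prune(candidates)
    return relevant[frozenset(tables)]
```

The real pruning function would compare plans by their piecewise-linear cost functions; here any callable that filters a plan list can be plugged in.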
Like many query optimization algorithms,8,14,16 our algorithm is based on dynamic programming. We can use dynamic programming since the principle of optimality holds for query optimization.14 Formulated in general terms, the
principle of optimality designates the problem property that
optimal solutions can be obtained by combining optimal
solutions to subproblems. In the context of query optimization, the principle of optimality means more specifically that
optimal query plans can be obtained by combining optimal
plans for subqueries. The principle of optimality has been
shown to hold for all common execution cost metrics in multi-objective query optimization.16 This means that a Pareto-optimal query plan can be combined from Pareto-optimal
plans for subqueries. A relevant plan is Pareto-optimal for
some points in the parameter space. It is therefore intuitive
that a relevant query plan can be combined from relevant
plans for the subqueries (we omit the formal proof). In other
words, the principle of optimality holds for MPQ as well. It is
the fundament of our MPQ algorithm.
3.2. Pruning
Many query optimization algorithms for classical query optimization,14 multi-objective query optimization, and parametric query optimization8 are based on dynamic programming. The primary difference among those algorithms is the realization of the pruning function. As we treat
a novel problem variant, we must design a novel pruning
function. In the following, we describe how our algorithm
prunes query plans, that is, how it compares plans for the
same query and identifies irrelevant plans.
Our pruning function is based on the key concept of the
Pareto region. Each query plan is associated with a Pareto
region. This is a parameter space region in which it realizes Pareto-optimal cost tradeoffs. A plan is irrelevant if its
Pareto region is empty. The goal of the pruning function is to
compare a set of plans answering the same query in order to
calculate their Pareto regions. The pruning function works
as follows. At pruning start, we assume by default that each
plan is Pareto-optimal in the entire parameter space. This
means that we assign the entire parameter space as Pareto
region to each query plan. During pruning, we compare all
query plans answering the same query pair-wise in order
to calculate their true Pareto regions. If we compare two
Figure 2. We subtract the area in which a plan is dominated from its Pareto region.