By Immanuel Trummer and Christoph Koch
We propose a generalization of the classical database query
optimization problem: multi-objective parametric query
(MPQ) optimization. MPQ compares alternative processing plans according to multiple execution cost metrics. It
also models missing pieces of information on which plan
costs depend as parameters. Both features are crucial to model query processing on modern data processing systems.
MPQ generalizes previously proposed query optimization
variants, such as multi-objective query optimization, parametric query optimization, and traditional query optimization. We show, however, that the MPQ problem has different
properties than prior variants and solving it requires novel
methods. We present an algorithm that solves the MPQ
problem and finds, for a given query, the set of all relevant
query plans. This set contains all plans that realize optimal
execution cost tradeoffs for any combination of parameter
values. Our algorithm is based on dynamic programming
and recursively constructs relevant query plans by combining relevant plans for query parts. We assume that all plan
execution cost functions are piecewise-linear in the parameters. We use linear programming to compare alternative
plans and to identify plans that are not relevant. We present
a complexity analysis of our algorithm and experimentally
evaluate its performance.
1.1. Context
The goal of database query optimization is to map a
query (describing the data to generate) to the optimal query
plan (describing how to generate the data). Query optimization is a long-standing research area in the database field
dating back to the 1970s [14]. The original query optimization
problem model has been motivated by the capabilities of
data processing systems at that time. However, there have
been fundamental advances in data processing techniques
and systems in the meantime. Hence the original problem
model is not sufficiently expressive to capture all relevant
aspects of modern data processing systems. In this paper,
we propose an extension of the classical query optimization problem model and a corresponding optimization algorithm.
In query optimization, alternative query plans are compared
according to their execution cost (e.g., execution time).
Query optimization variants can be classified according to
how they model the execution cost of a single query plan.
Traditional query optimization [14] models the cost of a query
plan as a scalar cost value c ∈ ℝ. This implies that query plans
are compared according to one single cost metric. It also
implies that all information required to produce cost estimates is available to the query optimizer. The goal in classical query optimization is to find a query plan with minimal execution cost.
The original version of this article was published in the
Proceedings of the VLDB Endowment, Volume 8.
Multi-objective query optimization [1, 7, 11, 16, 17] generalizes
the classical model and associates each query plan with
a cost vector c ∈ ℝⁿ instead of a scalar value. This makes it possible to
model scenarios where multiple execution cost metrics are
of interest. If data processing takes place in the cloud, then
we are interested not only in execution time but also in monetary execution fees. Different components of the plan cost
vector represent cost according to different cost metrics.
The goal is to find the set of Pareto-optimal query plans for
which no alternative plan offers better cost according to all metrics.
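The notion of Pareto-optimality can be illustrated with a minimal sketch. The plan names and cost values are hypothetical, and the dominance test uses the common convention (no worse on every metric, strictly better on at least one), which is an assumption that the text above does not spell out.

```python
from typing import Dict, Tuple

Cost = Tuple[float, ...]  # one entry per cost metric; lower is better


def dominates(a: Cost, b: Cost) -> bool:
    """True if a is no worse than b on every metric and better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_plans(plans: Dict[str, Cost]) -> Dict[str, Cost]:
    """Keep the plans whose cost vector is not dominated by any other plan."""
    return {
        name: cost
        for name, cost in plans.items()
        if not any(
            dominates(other_cost, cost)
            for other_name, other_cost in plans.items()
            if other_name != name
        )
    }


# Hypothetical plans with cost vectors (execution time in s, fee in dollars).
plans = {
    "single_node": (120.0, 0.10),
    "small_cluster": (40.0, 0.35),
    "large_cluster": (45.0, 0.90),  # slower and pricier than small_cluster
}
frontier = pareto_plans(plans)
print(sorted(frontier))
```

Here `large_cluster` is discarded because `small_cluster` is better on both metrics, while the two remaining plans represent different time/fee tradeoffs.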
Parametric query optimization [3, 4, 6, 8, 10, 13] generalizes the
standard model in a different way. It associates each query
plan with a cost function c ∈ ℝᵐ → ℝ, mapping from a multidimensional parameter space to a one-dimensional cost
space. Parameters represent pieces of information that are
not yet available at optimization time but required to estimate plan execution cost. For instance, parametric query
optimization makes it possible to optimize query classes that are
defined via query templates with unspecified predicates.
One parameter could then represent the selectivity of one
unspecified predicate. The concrete predicate becomes
known only at run time, and so does the concrete parameter value. The goal in parametric query optimization is
typically to find a set of plans containing, for each possible
parameter value combination, the plan with minimal execution cost.
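The following sketch illustrates the parametric setting with one parameter, the selectivity s of an unspecified predicate. The two plans and their linear cost functions are hypothetical, and sampling the parameter space is only a simple approximation chosen for illustration; it is not the method of any of the cited approaches.

```python
from typing import Callable, Dict, Set

# Hypothetical cost functions of one parameter: the selectivity s in [0, 1]
# of a predicate that is unknown at optimization time.
cost_functions: Dict[str, Callable[[float], float]] = {
    "index_scan": lambda s: 5.0 + 200.0 * s,  # cheap for selective predicates
    "full_scan": lambda s: 60.0 + 10.0 * s,   # nearly independent of s
}


def best_plan(s: float) -> str:
    """Return the plan with minimal estimated cost at selectivity s."""
    return min(cost_functions, key=lambda name: cost_functions[name](s))


def plan_set(samples: int = 101) -> Set[str]:
    """Collect the optimal plan at each sampled selectivity (approximation)."""
    return {best_plan(i / (samples - 1)) for i in range(samples)}


print(best_plan(0.1), best_plan(0.5), sorted(plan_set()))
```

At run time, once the concrete selectivity is known, the optimizer only has to pick among the precomputed plans instead of optimizing the query from scratch.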
1.2. Problem
We propose multi-objective parametric query (MPQ) optimization, a query optimization variant that generalizes
multi-objective query optimization, parametric query optimization, and classical query optimization at the same time.
MPQ models the cost of a single query plan as a cost function c ∈ ℝᵐ → ℝⁿ that maps a multidimensional parameter
space to a multidimensional cost space. MPQ assumes that
query plans are compared according to multiple cost metrics and that cost estimates depend on parameters whose
values are unknown at optimization time.
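To make the MPQ cost model concrete, the sketch below represents each plan by a function from a parameter vector to a cost vector and computes the Pareto-optimal plans at a given parameter value. The plans, the single parameter (a selectivity), and the two metrics (time and fee) are hypothetical. Evaluating individual parameter points is an illustrative simplification and not the algorithm presented in this paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Params = Sequence[float]   # point in the parameter space
Cost = Tuple[float, ...]   # point in the cost space; lower is better

# Hypothetical plans: parameter = (selectivity,), cost = (time, fee).
cost_functions: Dict[str, Callable[[Params], Cost]] = {
    "plan_a": lambda p: (10.0 + 100.0 * p[0], 1.0),
    "plan_b": lambda p: (30.0 + 20.0 * p[0], 2.0),
    "plan_c": lambda p: (50.0 + 20.0 * p[0], 3.0),
}


def dominates(a: Cost, b: Cost) -> bool:
    """Assumed convention: no worse on every metric, better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_at(params: Params) -> List[str]:
    """Names of the plans that are Pareto-optimal at one parameter point."""
    costs = {name: f(params) for name, f in cost_functions.items()}
    return sorted(
        name
        for name, cost in costs.items()
        if not any(dominates(c, cost) for other, c in costs.items() if other != name)
    )


print(pareto_at((0.1,)))  # low selectivity
print(pareto_at((0.8,)))  # high selectivity
```

In this example the set of Pareto-optimal plans changes with the parameter value: `plan_a` dominates all alternatives for low selectivity, whereas for high selectivity `plan_a` and `plan_b` realize different tradeoffs. `plan_c` is dominated by `plan_b` everywhere and is therefore never needed.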
The goal in MPQ is to find the set of Pareto-optimal plans
for each possible parameter value combination. This problem model is required wherever the application scenarios of
multi-objective query optimization intersect with the ones