constraints, they are specified over the package result P, for
example, COUNT(P.*) = 3, which limits the query results to
packages of exactly 3 tuples.
The global predicates in query abbreviate aggregates
that are in reality SQL subqueries. For example, COUNT(P.*)
= 3, abbreviates (SELECT COUNT(*) FROM P) = 3. Using subqueries, PaQL can express arbitrarily complex global constraints among aggregates over a package.
Objective clause. The objective clause specifies a ranking
among candidate package results and appears with either
the MINIMIZE or MAXIMIZE keyword. It is a condition on the
package-level, and hence it is specified over the package
result P, for example, MINIMIZE SUM(
P.sat_fat). Similar to
global predicates, this form is a shorthand for MINIMIZE
(SELECT SUM(sat_fat) FROM P). A PaQL query with an objective clause returns a single result: the package that optimizes
the value of the objective. The evaluation methods that we
present in this work focus on such queries. In prior work,
6
we described preliminary techniques for returning multiple
packages in the absence of optimization objectives, but a
thorough study of such methods is left to future work.
Expressiveness and complexity. PaQL can express general ILP, which means that evaluation of package queries is
NP-complete.
4, 5 As a first step in package evaluation, we proceed to show how a PaQL query can be transformed into a
linear program and solved using general ILP solvers.
3. ILP FORMULATION
In this section, we present an ILP formulation for package
queries, which is at the core of our evaluation methods
Direct and SketchRefine. The results in this section are
inspired by the translation rules employed by Tiresias15 to
answer how-to queries.
3. 1. PaQL to ILP translation
Let R indicate the input relation of the package query, n = |R|
be the number of tuples in R,
R.attr an attribute of R, P a package, f a linear aggregate function (such as COUNT and SUM),
∈ {≤,≥} a constraint inequality, and v ∈ R a constant. For
each tuple ti from R, 1 ≤ i ≤ n, the ILP problem includes a
nonnegative integer variable xi, xi ≥ 0, indicating the number
of times ti is included in an answer package. We also use
to denote the vector of all integer variables.
A PaQL query is formulated as an ILP problem using the following translation rules.
Repetition constraint. The REPEAT keyword, expressible
in the FROM clause, restricts the domain that the variables
only contain references to package_name, which specifies
the name of the package result.
Basic package query. The new keyword PACKAGE differentiates PaQL from traditional SQL queries.
1: SELECT 2: SELECT PACKAGE(*) AS P
FROM Recipes R FROM Recipes R
The semantics of
1 and
2 are fundamentally different:
1 is
a traditional SQL query, with a unique, finite result set (the
entire Recipes table), whereas there are infinitely many packages that satisfy the package query 2: all possible multisets of
tuples from the input relation. The result of a package query
like
2 is a set of packages. Each package resembles a relational
table containing a collection of tuples (with possible repetitions) from relation Recipes, and therefore a package result of
2 follows the schema of Recipes. Similar to SQL, the PaQL syntax allows the specification of the output schema in the SELECT
clause. For example, PACKAGE(sat_fat, kcal) only returns the
saturated fat and calorie attributes of the package.
Although semantically valid, a query like
2 would not
occur in practice, as most application scenarios expect few,
or even exactly one result. We proceed to describe the additional constraints in the example query (Figure 2) that
restrict the number of package results.
Repetition constraints. The REPEAT 0 statement in
query from Figure 2 specifies that each tuple from the
input relation Recipe can appear in a package result at
most once (no repetitions are allowed). If this restriction is
absent (as in query
2), the multiplicity of a tuple is
unbounded. By allowing no repetitions, restricts the
package space from infinite to 2n, where n is the size of the
input relation. Generalizing, REPEAT ρ allows a package to
repeat tuples up to ρ times, resulting in ( 2 + ρ)n candidate
packages.
Base and global predicates. A package query defines two
types of predicates. A base predicate, defined in the WHERE
clause, is equivalent to a selection predicate and can be evaluated with standard SQL: any tuple in the package needs to
individually satisfy the base predicate. For example, query
from Figure 2 specifies the base predicate:
R.gluten = ‘free’.
Since base predicates directly filter input tuples, they are
specified over the input relation R. Global predicates are the
core of package queries, and they appear in the new SUCH
THAT clause. Global predicates are higher-order than base
predicates: they cannot be evaluated on individual tuples,
but on tuple collections. Since they describe package-level
SELECT PACKAGE (∗|column_name [, . . .]) [AS] package_name
FROM relation_name [AS] relation_alias
[REPEAT repeat] [,...]
[WHERE w_expression ]
[SUCH THAT st_expression ]
[ (MINIMIZE|MAXIMIZE) obj_expression ]
PACKAGE (∗) AS P
Recipes R REPEAT 0 FROM
WHERE R.gluten = ‘free’
SUCH THAT COUNT (P.∗) =
3 AND
SUM(P.kcal) BETWEEN
2.0 AND
2. 5
MINIMIZE SUM(
P.sat_fat)
PaQL query for Example 1 PaQL syntax specification
: SELECT
Figure 2. Specification of the PaQL syntax (left), and the PaQL query for Example 1 (right).