Scalable Computation of High-
Order Optimization Queries
By Matteo Brucato, Azza Abouzied, and Alexandra Meliou
Constrained optimization problems are at the heart of
significant applications in a broad range of domains, including finance, transportation, manufacturing, and healthcare.
Modeling and solving these problems has relied on application-specific solutions, which are often complex, error-prone, and do not generalize. Our goal is to create a
domain-independent, declarative approach, supported and
powered by the system where the data relevant to these
problems typically resides: the database. We present a complete system that supports package queries, a new query
model that extends traditional database queries to handle
complex constraints and preferences over answer sets,
allowing the declarative specification and efficient evaluation of a significant class of constrained optimization problems—integer linear programs (ILP)—within a database.
Traditional database queries follow a simple model: they
define constraints, in the form of selection predicates, that
each tuple in the result must satisfy. This model is computationally efficient, as the database system can evaluate each
tuple individually to determine whether it satisfies the query
conditions. However, many practical, real-world problems
require a collection of result tuples to satisfy constraints collectively, rather than individually.
Example 1 (Meal planner). A dietitian needs to design a
daily meal plan for a patient. She wants a set of three gluten-free meals, between 2000 and 2500 calories in total, and with a
low total intake of saturated fats.
Similar scenarios, requiring complex, high-order constraints arise frequently, and in many practical settings.
A broad set of domains have applications that boil down to
modeling and solving constrained optimization problems,
for example, coordinating fleet and crew assignments in airline scheduling to reduce delays and costs, 19 managing
delinquent consumer credit to minimize losses, 14 optimizing
organ transplant allocation and acceptance, 1 and planning
of cancer radiotherapy treatments. 20, 21 A significant class of
constrained optimization problems are integer linear programs (ILP). ILP solutions alone account for billions in US
dollars of projected benefits within each of these and other
industry sectors. 7
Modeling and solving these problems has relied on
application-specific solutions, 2, 9, 13, 17, 23, 18 which can often
be complex and error-prone, and fail to generalize. Our goal
is to create a domain-independent, declarative approach,
supported and powered by the system where the data
The original version of this paper is entitled "Scalable
Package Queries in Relational Database Systems” and
was published in the Proceedings of the VLDB Endowment,
Vol. 9, No. 7 (2016), 576–587.
relevant to these problems typically resides: the database.
We present a complete system that supports package queries, a new query model that extends traditional database
queries to handle complex constraints and preferences
over answer sets, allowing the declarative specification and
efficient evaluation of a significant class of constrained
optimization problems—ILP—within a database. Package
queries are defined over traditional relations, but return
packages. A package is a collection of tuples that (a) individually satisfy base predicates (traditional selection predicates), and (b) collectively satisfy global predicates
(package-specific predicates). Package queries are combinatorial in nature: the result of a package query is a (
potentially infinite) set of packages, and an objective criterion can
define a preference ranking among them.
Extending traditional database functionality to provide
support for packages, rather than supporting packages at
the application level, is justified by two reasons: First, the
features of packages and the algorithms for constructing
them are not unique to each application; therefore, the burden of package support should be lifted off application
developers, and database systems should support package
queries like traditional queries. Second, the data used to
construct packages typically reside in a database system,
and packages themselves are structured data objects that
should naturally be stored in and manipulated by a database system.
Our work addresses three important challenges. The first
challenge is to support declarative specification of packages.
SQL enables the declarative specification of properties that
result tuples should satisfy. In Example 1, it is easy to specify
the exclusion of meals with gluten using a regular selection
predicate in SQL. However, it is difficult to specify global constraints (e.g., total calories of a set of meals should be between
2000 and 2500 calories). Expressing such a query in SQL
requires either complex self-joins that explode the size of the
query, or recursion, which results in extremely complex queries that are hard to specify and optimize. Our goal is to maintain the declarative power of SQL, while extending its
expressiveness to allow for the easy specification of packages.
The second challenge relates to the evaluation of package queries. Due to their combinatorial complexity, package queries are harder to evaluate than traditional
database queries. 10 Package queries are in fact as hard as
ILP. 5 Existing database technology is ineffective at