116 COMMUNICATIONS OF THE ACM | FEBRUARY2019 | VOL. 62 | NO. 2
partitioning to limit the size of each partition, to study its
effects on the query response time and the approximation
ratio of SketchRefine. In all cases, along the lines of the
previous experiments, we do not enforce diameter conditions. Figure 7 show the results obtained on the Galaxy workload, using 30% of the original data. We vary t from higher
values corresponding to fewer but larger partitions, on the
left-hand size of the x-axis, to lower values, corresponding to
more but smaller partitions. When Direct is able to produce a solution, we also report its running time (horizontal
line) as a baseline for comparison.
The results show that the partition size threshold has a
major impact on the execution time of SketchRefine, with
extreme values of t (either too low or too high) often resulting
in slower running times than Direct. With bigger partitions,
on the left-hand side of the x-axis, SketchRefine takes about
the same time as Direct because both algorithms solve problems of comparable size. When the size of each partition starts
to decrease, moving from left to right on the x-axis, the
response time of SketchRefine decreases rapidly, reaching
about an order of magnitude improvement with respect to
Direct. Most of the queries show that there is a “sweet spot”
at which the response time is the lowest: when all partitions
are small, and there are not too many of them. This point is
consistent across different queries, showing that it only
depends on the input data size. After that point, although the
partitions become smaller, the number of partitions starts to
increase significantly. This increase has two negative effects: it
increases the number of representative tuples, and thus the
size and complexity of the initial Sketch query, and it
increases the number of groups that Refine may need to
refine to construct the final package. This causes the running
time of SketchRefine, on the right-hand side of the x-axis, to
increase again and reach or surpass the running time of
Direct. The mean and median approximation ratios are in all
cases very close to one, indicating that SketchRefine retains
very good quality regardless of the partition size threshold.
6. CONCLUSION AND FUTURE WORK
We introduced a complete system that supports the declarative
specification and efficient evaluation of package queries. We
presented PaQL, a declarative extension to SQL, and we developed a flexible approximation method, with strong theoretical
guarantees, for the evaluation of PaQL queries on large-scale
datasets. Our experiments on real-world data demonstrate that
our scalable evaluation strategy is effective and efficient over
varied data sizes and queries. We have further extended our
techniques and experimental evaluation and placed our
research in the context of related work.
4
Our work so far focused on deterministic package queries,
but many applications of constrained optimization require support for uncertainty: airline fleet scheduling has uncertain passenger demands, or investment portfolio optimization deals
with uncertain returns and risks, etc. We are currently working
on extending our system to support optimization of the
expected value of an objective function subject to expectation
constraints of the form E(SUM(x)) ≥ b, or probabilistic constraints of the form SUM(x) ≥ b WITH PROBABILITY ≥ 95%. The
challenge is to ensure robust optimal solutions, computed © 2019 ACM 0001-0782/19/2 $15.00
References
1. Alagoz, O., Schaefer, A.J., Roberts, M.S.
Optimizing Organ Allocation and
Acceptance. Springer, Boston, MA,
2009, 1–24.
2. Baykasoglu, A., Dereli, T., Das, S.
Project team selection using fuzzy
optimization approach. Cybern. Syst.
38, 2 (2007), 155–185.
3. Bisschop, J. AIMMS Optimization
Modeling. Paragon Decision
Technology, 2006.
4. Brucato, M., Abouzied, A., Meliou, A.
Package queries: efficient and
scalable computation of high-order
constraints. VLDB J. (Oct. 2017).
5. Brucato, M., Beltran, J.F., Abouzied,
A., Meliou, A. Scalable package
queries in relational database
systems. PVLDB 9, 7 (2016), 576–587.
6. Brucato, M., Ramakrishna, R., Abouzied,
A., Meliou, A. PackageBuilder: From
tuples to packages. PVLDB 7, 13 (2014),
1593–1596.
7. Chen, D.-S., Batson, R.G., Dang, Y.
Applied Integer Programming:
Modeling and Solution. John Wiley &
Sons, 2011.
8. Cook, W., Hartmann, M. On the
complexity of branch and cut
methods for the traveling salesman
problem. Polyhedral Comb. 1 (1990),
75–82.
9. De Choudhury, M., Feldman, M.,
Amer-Yahia, S., Golbandi, N., Lempel, R.,
Yu, C. Automatic construction of travel
itineraries using social breadcrumbs.
In Proceedings of the 21st ACM
Conference on Hypertext and
Hypermedia ( Toronto, Ontario, Canada,
June 13–16, 2010) ACM, NY, 35–44.
10. Deng, T., Fan, W., Geerts, F. On the
complexity of package
recommendation problems. In PODS
‘ 12 Proceedings of the 31st ACM
SIGMOD-SIGACT-SIGAI
Symposium on Principles of Database
Systems (Scottsdale, Arizona, USA,
May 21–23, 2012) ACM, NY, 261–272.
11. Finkel, R.A., Bentley, J.L. Quad trees
a data structure for retrieval on
composite keys. Acta Inf. 4, 1
(1974), 1–9.
12. IBM CPLEX Optimization Studio.
http://www.ibm.com/software/
commerce/optimization/cplex-optimizer/.
13. Lappas, T., Liu, K., Terzi, E. Finding a
team of experts in social networks.
In KDD '09 Proceedings of the 15th
ACM SIGKDD International
Conference on Knowledge Discovery
and Data Mining (Paris, France, June
28–July 01, 2009) ACM, NY, 467–476.
14. Makuch, W. M., Dodge, J. L., Ecker, J.G.,
Granfors, D. C., Hahn, G. J. Managing
consumer credit delinquency in the us
economy: A multi-billion dollar
management science application.
Interfaces 22, 1 (1992), 90–109.
15. Meliou, A., Suciu, D. Tiresias: The
database oracle for how-to queries.
In SIGMOD ' 12 Proceedings of the
2012 ACM SIGMOD International
Conference on Management of Data
(Scottsdale, Arizona, USA, May
20–24, 2012) ACM, NY, 337–348.
16. Padberg, M., Rinaldi, G. A branch-and-cut algorithm for the resolution of
large-scale symmetric traveling
salesman problems. SIAM Rev. 33, 1
(1991), 60–100.
17. Parameswaran, A. G., Venetis, P.,
Garcia-Molina, H. Recommendation
systems with complex constraints: A
course recommendation perspective.
ACM TOIS 29, 4 (2011), 1–33.
18. Pinel, F., Varshney, L.R. Computational
creativity for culinary recipes. In CHI
EA ‘ 14 CHI ‘ 14 Extended Abstracts on
Human Factors in Computing Systems
(Toronto, Ontario, Canada, April
26–May 01, 2014) ACM, NY, 439–442.
19. Rushmeier, R. A., Kontogiorgis, S.A.
Advances in the optimization of airline
fleet assignment. Transp. Sci. 31, 2
(1997), 159–169.
20. Sauer, O. A., Shepard, D. M., Mackie, T. R.
Application of constrained optimization
to radiotherapy planning. Med. Phys.
26, 11 (1999), 2359–2366.
21. Terrer, J. M. A., Benede, M.A. N.,
del Rio, E.B., Llanas, S.C. A feasible
application of constrained optimization
in the IMRT system. IEEE Trans.
Biomed. Eng. 54, 3 (2007), 370–379.
22. The Sloan Digital Sky Survey. http://
www.sdss.org/.
23. Wang, X., Dong, X. L., Meliou, A. In
SIGMOD ' 15 Proceedings of the 2015
ACM SIGMOD International
Conference on Management of Data
(Melbourne, Victoria, Australia, May
31–June 04, 2015) ACM, N Y,
1231–1245.
24. Williamson, D.P., Shmoys, D.B. The
Design of Approximation Algorithms.
Cambridge University Press, 2011.
efficiently, that behave well under the many possible realiza-tions of the uncertain data.
Another open problem is to efficiently handle incremental
package queries to enable user-facing, interactive constrained
optimization applications such as vacation planning. Rather
than calling the solver for each incremental query variation
from scratch, we are exploring the use of efficient database
techniques, such as top-k querying, to provide faster, albeit
approximate, solutions for interactive applications.
Acknowledgments
This research is supported by the National Science Foundation
under grants IIS-1420941, IIS-1421322, and IIS-1453543.
Matteo Brucato and Alexandra Meliou
({matteo, ameli}@ cs.umass.edu), College
of Information and Computer Sciences,
University of Massachusetts, Amherst,
MA, USA.
Azza Abouzied ( azza@nyu.edu),
Computer Science, New York University,
Abu Dhabi, UAE.