expressive power comes great challenges.” In this article, we have highlighted a few recent successes in tackling these challenges, but there remain rich opportunities for further contributions to the field. Productive future work areas include extending the holistic optimization concept to new domains (for example, machine learning) and leveraging query and data characteristics to deliver tighter robustness guarantees.
References
1. Ashoke, S. and Haritsa, J. CODD: A dataless approach to big data testing. PVLDB 8, 12 (Aug. 2015), 2008–2011.
2. Chandra, B., Chawda, B., Kar, B., Reddy, K., Shah, S. and Sudarshan, S. Data generation for testing and grading SQL queries. VLDB J. 24, 6 (Dec. 2015), 731–755.
3. Dutt, A. and Haritsa, J. Plan bouquets: A fragrant approach to robust query processing. ACM Trans. Database Syst. 41, 2 (June 2016), 11:1–11:37.
4. Emani, K.V. and Sudarshan, S. Cobra: A framework for cost-based rewriting of database applications. In Proceedings of the IEEE Intl. Conf. on Data Eng. (Apr. 2018), 689–700.
5. Karthik, S., Haritsa, J., Kenkre, S., Pandit, V. and Krishnan, L. Platform-independent robust query processing. IEEE Trans. Knowl. Data Eng. 31, 1 (Jan. 2019), 17–31.
6. Ramachandra, K., Chavan, M., Guravannavar, R. and Sudarshan, S. Program transformations for asynchronous and batched query submission. IEEE Trans. Knowl. Data Eng. 27, 2 (Feb. 2015), 531–544.
Jayant R. Haritsa (haritsa@iisc.ac.in) is a professor at the Indian Institute of Science, Bangalore, India.
S. Sudarshan (sudarsha@cse.iitb.ac.in) is a professor at the Indian Institute of Technology, Bombay, India.
© 2019 ACM 0001-0782/19/11
to speed up data access. The transformations successfully carry out batching and asynchronous submission of queries,⁶ prefetching of query results, and conversion of procedural code to SQL. A metaphorical depiction of batching rewrites in DBridge is shown in Figure 2, where queries that are issued one at a time, symbolized by the individual “taxis,” are batched into a single unified request, carried by a “bus.” Each transformation caters to a restricted scope and is therefore easy to prove correct, but in tandem they can successfully rewrite complex application programs. Further, the Cobra component of DBridge⁴ efficiently chooses the least-cost program from many alternative transformed programs by leveraging concepts from query optimization based on algebraic equivalence rules.
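The "taxis versus bus" idea can be illustrated with a small hand-written sketch. It is not DBridge output, and DBridge performs such rewrites automatically on application programs; the `orders` table, the function names, and the use of an in-memory SQLite database are assumptions made purely for illustration.

```python
import sqlite3


def make_db():
    # Hypothetical sample schema and data, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, i * 10.0) for i in range(1, 6)],
    )
    return conn


def amounts_one_at_a_time(conn, ids):
    # Original program shape: one query per loop iteration (the "taxis").
    result = {}
    for order_id in ids:
        row = conn.execute(
            "SELECT amount FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        if row is not None:
            result[order_id] = row[0]
    return result


def amounts_batched(conn, ids):
    # Rewritten program shape: one set-oriented query (the "bus").
    ids = list(ids)
    if not ids:
        return {}
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, amount FROM orders WHERE id IN ({placeholders})", ids
    ).fetchall()
    return dict(rows)
```

Both functions return the same mapping for the same input, but the batched form issues a single request regardless of how many identifiers are looked up, which is where the savings in round trips would come from.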
Techniques for holistic optimization of queries containing imperatively coded user-defined functions (UDFs) were developed jointly by IIT Hyderabad and IIT Bombay; some of these mechanisms have subsequently been implemented and released in Microsoft SQL Server 2019, garnering excellent reviews from users.ᶜ
Collectively, the DBridge suite of techniques brings the powerful benefits of declarative query optimization to imperative code, opening a new research frontier. More details on these techniques may be found on the DBridge project home page.
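To make the UDF case concrete, the sketch below contrasts a query that calls an imperatively coded function with an equivalent query in which the function body has been written out as a SQL expression. This is a minimal hand-made illustration, assuming that one route to holistic optimization is to expose the function's logic to the optimizer as ordinary SQL; the `discounted` function, the `orders` table, and the use of SQLite are all hypothetical and do not reproduce the SQL Server mechanism.

```python
import sqlite3


def discounted(amount):
    # Imperative UDF body: the SQL engine sees only an opaque function call.
    if amount > 30:
        return amount - 5
    return amount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)],
)
# Register the Python function so SQL can call it once per row.
conn.create_function("discounted", 1, discounted)

# Query that invokes the imperative UDF for every row.
with_udf = conn.execute(
    "SELECT id, discounted(amount) FROM orders ORDER BY id"
).fetchall()

# Same logic written declaratively, so the engine can see and optimize it.
inlined = conn.execute(
    "SELECT id, CASE WHEN amount > 30 THEN amount - 5 ELSE amount END "
    "FROM orders ORDER BY id"
).fetchall()
```

The two result lists are identical; the difference is that the second query contains no opaque call, leaving the whole computation visible to the query optimizer.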
Query and Engine Testing
With the onset of the Big Data world, where data is the engine driving virtually all aspects of human endeavor, it is vitally important to ensure that both the applications and the underlying platforms are functionally correct. The XData systemᵈ developed at IIT Bombay supports testing of SQL queries by generating datasets designed to detect many types of common errors.² XData can be used in database courses to help students master the nuances of SQL query formulation and verify the correctness of their queries; further, the XData system facilitates automated grading of incorrect queries by assigning partial marks that reflect the severity of the errors. XData is currently operational at multiple universities.
c https://www.microsoft.com/en-us/research/project/froid
d https://www.cse.iitb.ac.in/infolab/xdata
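The value of a dataset designed to expose a specific error can be seen in a small sketch. Here the datasets are picked by hand, whereas XData generates them automatically; the schema, the two queries, and the choice of a join-type mistake as the "common error" are assumptions made for illustration only.

```python
import sqlite3

# Intended query: list every student, including those with no enrollment.
CORRECT = (
    "SELECT s.name, e.course FROM student s "
    "LEFT JOIN enrolled e ON e.student_id = s.id "
    "ORDER BY s.name, e.course"
)
# Erroneous variant: an inner join silently drops students with no enrollment.
MUTANT = (
    "SELECT s.name, e.course FROM student s "
    "JOIN enrolled e ON e.student_id = s.id "
    "ORDER BY s.name, e.course"
)


def run(query, students, enrollments):
    # Load a fresh in-memory database with the given rows and run the query.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE enrolled (student_id INTEGER, course TEXT)")
    conn.executemany("INSERT INTO student VALUES (?, ?)", students)
    conn.executemany("INSERT INTO enrolled VALUES (?, ?)", enrollments)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


def distinguishes(students, enrollments):
    # A dataset exposes the error only if the two queries disagree on it.
    return run(CORRECT, students, enrollments) != run(MUTANT, students, enrollments)


# Every student is enrolled somewhere: both queries return the same rows.
naive = ([(1, "Asha"), (2, "Ravi")], [(1, "DB"), (2, "OS")])
# One student has no enrollment: only the correct query returns that student.
targeted = ([(1, "Asha"), (2, "Ravi")], [(1, "DB")])
```

On the `naive` dataset the erroneous query passes unnoticed, while the `targeted` dataset makes the two results differ, which is the property a test dataset needs in order to detect this class of error.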
The testing of Big Data platforms is addressed by the CODD projectᵉ at IISc Bangalore, using a distinctive metaphor of “dataless databases.”¹ Here, databases with a desired set of characteristics can be efficiently simulated without explicit creation or persistent storage of the contents. This approach is essential since traditional testing techniques, which involve construction of representative databases and regression query suites, are completely impractical at Big Data scale due to the time and space overheads involved in their execution. The CODD tool has been successfully used for testing of database engines in the software and telecom industries.
Future Research
An important reason for the rapid adoption of SQL in the 1970s was its simplicity, which lent itself to effective query optimization. However, a host of complex features have been added over the years, and today’s query processing world can be paraphrased as “with great
e https://www.cse.iitb.ac.in/infolab/xdata
[Figure 1 plot: performance degradation on a logarithmic scale (10^0 to 10^8) for the Native Optimizer versus Spillbound, across queries 3D_Q15, 3D_Q96, 4D_Q7, 4D_Q26, 4D_Q27, 4D_Q91, 5D_Q19, 5D_Q29, 5D_Q84, and 5D_Q18.]
Figure 1. Performance robustness profile.
[Figure 2 illustration: the original application program sends a queue of queries to the database one at a time (“When will my turn come?”); the DBridge program rewrite system produces a rewritten application program with optimized database access (“Faster, Cheaper”).]
Figure 2. Rewrites for optimizing data access.