four properties are the only properties one needs to use
in reasoning about schema mappings specified by LAV
dependencies. On the computational side, the complex-ity-theoretic results in Section 5 quantify in precise terms
the difficulty of determining whether a schema mapping
specified in one language can also be specified in a different language.
There has also been an extensive study of schema mappings specified using languages richer than the language
of s-t tgds. Consider, for instance, schema mappings
specified by s-t tgds and target tgds, which were studied
in Fagin et al.,
7 or schema mappings specified by second-order tgds (SO tgds), which arise when composing schema
mappings specified by s-t tgds.
9 These languages are
known to be strictly more expressive than the language of
s-t tgds. Our results then predict that these languages lack
at least one of the structural properties considered here.
Indeed, schema mappings specified by s-t tgds and target
tgds are closed under target homomorphisms and admit
universal solutions, but need not allow for conjunctive
query rewriting. Likewise, schema mappings specified by
SO tgds admit universal solutions and allow for conjunctive query rewriting, but need not be closed under target
homomorphisms. It remains an open problem to establish structural characterizations for such richer languages
of dependencies. A particularly interesting question is
whether there is a natural way to characterize weakly acyclic sets of target tgds,
7 a class of target dependencies that
is of central importance in data exchange, as they guarantee termination of the chase procedure within a polynomial number of steps.
Among the properties of schema mappings considered
here, closure under target homomorphisms, admitting
universal solutions, and allowing for conjunctive query
rewriting are arguably the most fundamental ones for data
exchange and data integration. Proposition 4. 4 tells that
there are schema mappings satisfying these properties that
cannot be defined by any finite set of s-t tgds. Is there a natural extension of the language of s-t tgds that is characterized
by these three properties? In order to express transformations involving grouping and data merging, a proper extension of the language of s-t tgds, called nested s-t tgds, was
proposed in Fuxman et al.
11 In the previous version of this
paper,
17 we showed that schema mappings specified by a
finite set of nested s-t tgds are closed under target homomorphisms, admit universal solutions, allow for conjunctive
query rewriting (but may not be n-modular for any n > 0). It
is an open problem to determine whether all schema mappings satisfying these properties can be defined by nested
s-t tgds.
References
1. abiteboul, s., duschka, o.M.
Complexity of answering queries
using materialized views. In
ACM Symposium on Principles of
Database Systems (PODS) (1998)
254–263.
2. avigad, J., dean, e., Mumma,
J. a formal system for euclid’s
elements. Rev. Symb. Logic (2009).
to appear.
3. bernstein, P.a., Haas, l. M.
Information integration in the
enterprise. Commun. ACM 51, 9
(2008), 72–79.
4. Codd, e.F. a relational model for large
shared data banks. Commun. ACM 13
(1970), 377–387.
5. Codd, e.F. relational completeness
of data base sublanguages. Database
Systems. r. rustin, ed. Prentice-Hall,
1972, 33–64.
6. duschka, o.M., Genesereth, M.r.
answering recursive queries using
views. ACM Symposium on Principles
of Database Systems (PODS) (1997),
109–116.
7. Fagin, r., Kolaitis, P.G., Miller, r. J.,
Popa, l. data exchange: semantics
and query answering. Theor. Comp.
Sci. 336, 1 (2005), 89–124.
8. Fagin, r., Kolaitis, P.G., Popa, l. data
exchange: getting to the core. ACM
Trans. Database Syst. 30, 1 (2005),
174–210.
9. Fagin, r., Kolaitis, P.G., Popa, l.,
tan, W.-C. Composing schema
mappings: second-order
dependencies to the rescue. ACM
Trans. Database Syst. 30, 4 (2005),
994–1055.
10. Fagin r., Vardi, M.y. the theory
of data dependencies—a survey.
Mathematics of Information
Processing, volume 34 of Proceedings
of Symposia in Applied Mathematics,
american Mathematical society,
1986, 19–71.
11. Fuxman, a., Hernandez, M.a., Ho, H.,
Miller, r.J., Papotti, P., Popa, l. nested
mappings: schema mapping reloaded.
International Conference on Very
Large Data Bases (VLDB) (2006),
67–78.
12. Haas, l. M. beauty and the
beast: the theory and practice
of information integration.
International Conference on
Database Theory (ICDT) (2007),
28–43.
13. Haas, l.M., Hernández, M.a.,
Ho, H., Popa, l., roth, M. Clio
grows up: from research prototype
to industrial t. ACM International
Conference on Management
of Data (SIGMOD) (2005),
805–810.
14. Kolaitis, P.G. schema mappings,
data exchange, and metadata
management. ACM Symposium
on Principles of Database Systems
(PODS) (2005), 61–75.
15. lenzerini, M. data integration:
a theoretical perspective. ACM
Symposium on Principles of
Database Systems (PODS) (2002),
233–246.
16. rossman, b. existential
positive types and preservation
under homomorphisms.
Symposium on Logic in
Computer Science (LICS) (2005),
467–476.
17. ten Cate, b., Kolaitis, P. G.
structural characterizations of
schema-mapping languages.
International Conference on
Database Theory (ICDT) (2009),
63–72.
18. trakhtenbrot, b. Impossibility of
an algorithm for the decision
problem on finite classes. Dokl.
Akad. Nauk. SSSR, 70 (1950),
569–572.
19. van der Meyden, r. logical
approaches to incomplete
information: a survey. Logics for
Databases and Information Systems.
Kluwer, 1998, 307–356.
Balder ten Cate ( balder.tencate@uva.nl),
InrIa and ens Cachan.
Phokion G. Kolaitis ( kolaitis@cs.ucsc.edu),
university of California, santa Cruz and
IbM research–almaden.
Acknowledgments
Research on this paper was partially supported by NSF
grants IIS-0430994 and ARRA 0905276, the Netherlands
Organization for Scientific Research (NWO) grant
639.021.508, and the ERC Advanced Grant Webdam on
Foundations of Web data management. Part of the work was
carried out during a visit by Balder ten Cate at UC Santa Cruz
and the IBM Almaden Research Center.