possible that different answers are obtained each time. This
ambiguity raises the conceptual problem of giving precise
semantics to query answering in data exchange and data integration. The approach taken by the research community
is to adopt the certain answers semantics, a semantics that
originated in the study of incomplete databases (see van der
Meyden19).
Let M = (s, t, S) be a schema mapping and q a query over
the target schema t. If I is source instance, then the certain
answers of q on I with respect to M, denoted certainM(q)(I), is
the set
certainM (q)(I) = {q( J): J is a solution for I w.r.t. M}.
The certain answers semantics provides the guarantee
that if a tuple t belongs to certainM(q)(I), then t belongs to
the result q( J) of q on every solution J for I. It is easy to see
that every tuple in certainM(q)(I) is a tuple of values from I.
Thus, every schema mapping M induces a (semantic) transformation certainM from queries over the target schema to
queries over the source schema, so that if q is query over the
target schema, then this transformation associates to it the
query certainM (q) over the source schema that has the same
arity as q and is defined by certainM(q)(I) = {q( J): J is a solution for I w.r.t. M}.
On the face of it, the certain answers semantics is noneffective, since evaluating certainM(q)(I) may entail computing the intersection of infinitely many sets. For many
frequently asked queries, however, efficient algorithms for
evaluating their certain answers exist.
A conjunctive query is a first-order formula of the form
$wc(x, w), where c(x, w) is a conjunction of atoms and/or
equalities. Conjunctive queries are the most frequently
asked queries in relational databases. They are also known
as project select-join (SPJ) queries because they are precisely
the queries that are expressible in relational algebra using
the selection, projection, and join operations; in particular, conjunctive queries are easily expressible in SQL using
the sElECT FRoM WHERE construct. A union of conjunctive queries is a finite disjunction of conjunctive queries;
equivalently, they are the queries expressible in relational
algebra using the selection, projection, join, and union
operations.
In information integration, the main approach to computing the certain answers is to try to rewrite queries over
the target to queries over the source. In data exchange, one
would like to take advantage of the materialized solution
and use it to obtain the certain answers of target queries.
Both approaches give rise to polynomial-time algorithms
for computing the certain answers of unions of conjunctive
queries. In particular, as shown in Fagin et al.,
7 the certain
answers of unions of conjunctive queries can be obtained
using universal solutions. Specifically, let M = (s, t, S) be a
schema mapping such that S is a finite set of s-t tgds and
let q be a union of conjunctive queries over t. If I is a source
instance and J is a universal solution for I, then certainM
(q)(I) = q( J)¯, where q( J)¯ is the result obtained by first computing q( J) and then keeping only those tuples that contain
values from the active domain of I only.
Example 3. 2. Returning to the example schema mapping
M and source instance I from Figure 1, consider the conjunctive query q over the target schema given by
q(x, y) = name,date,n,m(Sales(name,date,x,n) Ù
Sales ( name ,date, y,m ) )
It asks for all pairs of products (x, y), such that some customer bought x and y (in some quantities) on the same
date. It is not hard to see that, for every solution J of I with
respect to M, the pair (Quadcore-9950-PC, TFT-933SN-Wide)
belongs to q( J). In other words, this tuple belongs to the certain answers of q in I with respect to M. It turns out that the
certain answers of q on a source instance I are precisely the
answers to q¢(I), where q¢ is the following union of conjunctive queries over the source schema:
q¢(x,y) = ($cid1,cid2,name,addr1,addr2,date,n,m
(DirectOrder(cid1,date,x,n) Ù DirectOrder(cid2,date,y,m) Ù
DirectCustomer(cid1,name,addr1) Ù
DirectCustomer(cid2,name,addr2) ))
Ú (x = y Ù sid,date,n Retail(sid,date,x,n) )
Note that the Retail relation does not provide information
about the name of the buyer, and therefore, can only contribute identity pairs to the certain answers of q.
Alternatively, the certain answers of q on I can be computed by evaluating q on the universal solution J1 of I that
we discussed in Example 3. 1, and keeping only those tuples
that contain only values from the active domain of I.
Structural Properties of Schema Mappings. Recall that a
schema mapping M can be viewed as a syntactic object or
as a semantic one. As a syntactic object, M is given by a
triple (s, t, S), where S is a set of sentences in some logical formalism; as a semantic object, M is given by a triple (s, t, W), where W is a set of pairs (I, J) with I a source
instance and J a target instance. In what follows, whenever we write that (I, J) Î M, we mean that (I, J) |= S or that
(I, J) Î W depending on whether M is given as a syntactic
object or a semantic one.
We are now ready to present the structural properties
of schema mappings that will play a key role in our characterization results. We begin with three such properties
that have been widely used in both data exchange and data
integration.
Definition 3. 3. Let M be a schema mapping.
• Closure under target homomorphisms: We say that
M is closed under target homomorphisms if for all
(I, J) Î M and for all homomorphisms h: J ® J¢ that are
constant on adom(I), we have that (I, J¢) Î M. (see
Figure 5a).
• Admitting universal solutions: We say that M admits
universal solutions if for each source instance I there is a
universal solution J for I w.r.t. M.
• Allowing for conjunctive query rewriting: We say that
M allows for conjunctive query rewriting if for each
union q of conjunctive queries over the target schema, the