tuple-generating dependencies possess desirable algorithmic properties and contain as special cases many
well-known classes of integrity constraints in databases,
such as inclusion dependencies, join dependencies, and
10 It is also interesting to note
that tuple-generating dependencies seemed to have first
appeared (at least in implicit form) a very long time ago.
Indeed, in a recent article2 presenting a formal system for
Euclid’s Elements, the authors argue convincingly that the
theorems in the Elements can be expressed using tuple-generating dependencies! Intuitively, this is so because
the typical theorem of Euclidean Geometry states that if a
certain pattern between geometric objects (points, lines,
triangles, or circles) exists, then another pattern between
geometric objects must also exist.
3. PRoPeRTies of scHemA mAPPinGs
In this section, we present several structural properties of
schema mappings, which have been widely used in the literature on data exchange and data integration. They will turn
out to play a key role in our characterizations of schema-mapping languages. Before presenting these properties,
however, we need to introduce some important notions in
data exchange and data integration.
Homomorphisms. A central notion in the study of schema
mappings is that of a homomorphism. Let K and K¢ be two instances over the same schema R = (R1, .. ., Rn). A homomorphism for K to K¢ is a function from the active domain of K
to the active domain of K¢ such that if (a1, . . . , am) Î Rk i , then
(h(a1,..., am))Î Rk¢ i, for i = 1,..., n.
A homomorphism h is said to be constant on a set X
if h restricted to X Ç dom (h) is the identity function, i.e.,
h(x) = x, if x Î X and h(x) is defined. Here, we will consider
homomorphisms between target instances and we will
require that they are constant on the active domain of some
source instance. The rationale behind this requirement is
as follows. Typically, when data is exchanged from source
to target, the elements in the active domain of a given
source instance are “known” values. The schema mapping at hand, however, may under-specify the relationship
between source and target. In turn, this may force using
“unknown” values, called labeled nulls, to materialize solutions for a given source instance. Thus, target instances
may have both “known” values, originating from the source
instance, and “unknown” values, chosen freshly and acting
as placeholders. An example of labeled null is the value N1
in the target instance J1 in Figure 1. Homomorphisms are
required to leave “known” values untouched, but are free
to replace an “unknown” value by another, “known” or
Universal Solutions. Let M = (s, t, S) be a schema mapping. Recall that if I is a source instance, then a solution
for I w.r.t. M is a target instance J such that (I, J) |= S. In
general, a source instance I may have multiple solutions
and, in fact, infinitely many; in particular, this holds true
for schema mappings M specified by s-t tgds because if J
is a solution for I w.r.t. M, then every instance J¢ contain-
ing J (as a set of facts) is also a solution for I. Which among
those solutions should one materialize if I is to be transformed into a target instance? To address this question,
the following concept of a universal solution was introduced in Fagin et al.
Let M = (S, T, W) be a schema mapping and let I be a
source instance. A target instance J is a universal solution
for I if
1. J is a solution for I.
2. For each target instance J¢ that is a solution for I,
there is a homomorphism h: J ® J¢ that is constant
on adom (I ).
The intuition behind this concept, which is illustrated
in Figure 4, is that a universal solutions is a “most general” solution in the sense that it contains no more and no
less information than that specified by the given schema
Example 3. 1. Consider the source instance I in Figure 1.
One solution for I with respect to the given schema mapping is the target instance J1. Note that N1 is a value that does
not occur in I, and therefore is interpreted as a labeled null.
Another solution for I is the target instance J2, which is the
same as J1 except that labeled null value N1 has been replaced
by UCLA. However, J2 contains information that is not
implied by the schema mapping, namely, that the customer
of the order on May 3, 2009 is UCLA, and, consequently, it is
not a universal solution. Indeed, there is a homomorphism
from J1 to J2 constant on the active domain of I (N1 is mapped
to UCLA), but not vice versa.
Queries and Certain Answers. Informally, a query is a
question that a user poses against a database. More formally, a query q takes as input an instance K and returns as
output a relation q(K) of fixed arity with values from the active domain of K. Suppose now that we have a schema mapping between a source schema and target schema. Suppose
also that a query q over the target schema is posed and that
a source instance I is given. What does answering q using
the source instance I mean? As seen earlier, there may be
infinitely many solutions J for a given source instance I; furthermore, if q is evaluated on different solutions J for I, it is
figure 4. A universal solution.