query answering, become undecidable if unrestricted use
of first-order logic is allowed in specifying the relationship
between database schemas (intuitively, this is so because
such tasks involve testing first-order sentences for satisfiability, since they quantify over all solutions of a source
instance). Consequently, we have to identify proper sublanguages of first-order logic that strike a good balance between
expressive power for data interoperability purposes and
algorithmic properties. Towards this goal, let us consider
the following basic tasks that every schema-mapping language ought to support.
• Copy (nicknaming): Copy a source relation into a target
relation and rename it.
• projection (Column Deletion): Form a target relation by deleting one or more columns of a source
relation.
• Augmentation (Column Addition): Form a target relation by adding one or more columns to a source
relation.
• Decomposition: Decompose a source relation into two
or more target relations.
• Join: Form a target relation by joining two or more
source relations.
• Combinations of the above: For example, combine join
with column augmentation.
These tasks can be easily specified in first-order logic, as
shown next. For concreteness, we use relations of arities two
or three.
Copy
projection
Augmentation
Decomposition
Joint
Combinations
of the above
"x, y, z(P (x, y, z) ® T (x, y, z) )
"x, y, z(P (x, y, z) ® U(x, y) )
"x, y(R (x, y) ® z T(x, y, z) )
"x, y, z(P (x, y, z) ® U(x, y) Ù V( y, z) )
"x, y, z(R (x, z) Ù S(z, y) ® T(x, y, z) )
"x, y, z(R (x, z) Ù S(z, y) ®
$t (U (x, t) Ù W(x, y, z, t) ) )
Observe that the above formulas have a striking syntactic similarity. As a matter of fact, they all belong to a class of
first-order formulas called source-to-target tuple generating
dependencies that we define in what follows.
• If R = (R1, . . . , Rn) is a relational schema, then an atomic
formula over R is an expression of the form Ri (x), where
i £ n and x is a tuple of variables of length equal to the
arity of Ri.
• A tuple-generating dependency (tgd) is a first-order formula of the form
"x(f (x) ® yy (x, y) ),
where x and y are tuples of variables, f(x) is a conjunction of atomic formulas with variables in x, each
variable in x occurs in at least one conjunct of f(x),
and y(x, y) is a conjunction of atomic formulas with
A full tgd is a tgd with no existential quantifiers in the
right-hand side, i.e., it is of the form "x(f(x) ® y(x) ).
• A source-to-target tuple-generating dependency (s-t tgd) is
a tgd such that f(x) is a conjunction of atomic formulas
over a source schema s and y(x, y) is a conjunction of
atomic formulas over a target schema t.
Informally, s-t tgds assert that if a pattern of facts appears
in the source, then another pattern of facts must appear in
the target. They are also known as global-and-local-as-view
(GLAV) dependencies. In recent years, s-t tgds have been
studied extensively in the context of data exchange and data
integration14, 15 because, in spite of their syntactic simplicity, they can express many data interoperability tasks arising
in applications. Furthermore, s-t tgds have desirable structural properties that we will discuss in the next section. The
following two types of dependencies are important special
cases of s-t tgds.
• A GAV dependency is an s-t tgd in which the right-hand
side of the implication consists of a single atomic
formula. Thus, a GAV dependency is of the form
"x(f(x) ® U(x¢) )
with f(x) a conjunction of atomic formulas over a
source schema and U(x¢) an atomic formula over a
target schema such that the variables in x¢ are among
those in x.
• A LAV dependency is an s-t tgd in which the left-hand
side of the implication is a single atomic formula. Thus,
a LAV dependency is of the form
"x(R(x) ® yy(x, y) )
with R(x) an atomic formula over a source schema and
y (x, y) a conjunction of atomic formulas over a target
schema.
Consider again the schema mapping in Figure 1. The
first sentence used to specify that schema mapping is a GAV
dependency, while the second one is a LAV dependency. Note
also that the expressions for the Copy and the Projection
tasks are both GAV and LAV dependencies, the expressions
for the Augmentation and the Decomposition tasks are LAV
dependencies, and the expression for the Join task is a GAV
dependency. The expression for the Decomposition task
is a full tgd. It is easy to see that every full tgd is logically
equivalent to finitely many GAV dependencies. For example, the full tgd "x, y, z(P(x, y, z) ® U(x, y) Ù V( y, z) ) for the
Decomposition task is logically equivalent to the set consisting of the GAV dependencies "x, y, z(P(x, y, z) ® U(x, y) ) and
"x , y, z (P(x , y , z )®V ( y, z ) ).
Before being used in data exchange and data integration, tuple-generating dependencies had been investigated in depth in the context of dependency theory during
the 1970s and the 1980s. Dependency theory is the study
of integrity constraints in databases; in this context,