DirectCustomer
cust-id name
c1 UCSC
DirectOrder
cust-id date
c1 05-01-2009
c1 05-01-2009
Retail
store-id quant
s1 1
Source database schema S: Target database schema T:
Sales(date, cust, prod, quant) DirectCustomer(cust-id, name, address)
DirectOrder(cust-id, date, prod, quant)
Retail(store-id, date, prod, quant)
figure 1. An example of a schema mapping.
address
1156 High St, Santa Cruz, CA 95060
prod
Quadcore-9950-PC
TFT-933SN-Wide
quant
100
100
date
05-03-2009
prod
Quadcore-9950-PC
A target database instance J1
Sales
date
05-01-2009
05-01-2009
05-03-2009
cust
UCSC
UCSC
N1
prod
Quadcore-9950-PC
TFT-933SN-Wide
Quadcore-9950-PC
quant
100
100
1
A second target database instance J2
Sales
date
05-01-2009
05-01-2009
05-03-2009
cust
UCSC
UCSC
UCLA
prod
Quadcore-9950-PC
TFT-933SN-Wide
Quadcore-9950-PC
quant
100
100
1
A third target database instance J3
Sales
date cust
05-01-2009 UCSC
05-03-2009 N1
Schema mapping
∀x,y,z,u,v,w (DirectCustomer(x,y,z) ∧ DirectOrder(x,u,v,w)
→ Sales(u, y, v, w))
∀x,y,z,v,w (Retail(x, y, v, w) → ∃N Sales(y, N, v, w))
prod
Quadcore-9950-PC
Quadcore-9950-PC
quant
100
1
figure 2. Data exchange and data integration.
Schema S1
Schema S1
Query Q
Schema S2
Global schema T
Schema S2
Global schema T
Global instance
to be materialized
Schema S3
Schema S3
(a) Data exchange
of abstraction that makes it possible to separate the specification of the relationship between the schemas from the
actual implementation of the transformations. Schema
mappings are declarative specifications that describe the
relationship between two database schemas. In recent years,
schema mappings have been used extensively in specifying
data interoperability tasks and are regarded as the essential
building blocks in data exchange and data integration (see,
e.g., the surveys14, 15). The use of schema mappings helps the
user understand and reason about the relationship between
the source schemas and the target schema; furthermore,
schema mappings can be automatically compiled into executable scripts in various languages.
A concrete example of a schema mapping is given in
Figure 1. The schema mapping is specified by two sentences
of first-order logic. The first sentence asserts that whenever
the source relation DirectCustomer contains a triple (v1, v2, v3)
of values and the source relation DirectOrder contains a
quadruple (v1, v4, v5, v6) of values so that the values for cust-id
in these two tuples coincide, then the target relation Sales
must contain the quadruple (v1, v2, v4, v5). The second sentence asserts that whenever the source relation Retail contains a quadruple (w1, w2, w3, w4) of values, then there must
exist some value V so that the target relation Sales contains
the quadruple (w2, V, w3, w4). Clearly, the pair (I, J1) consisting
of the source relation I and the target relation J1 satisfies both
these formulas; the same holds true for the pair (I, J2). In contrast, the pair (I, J3) fails to satisfy the first formula because
the Sales relation does not contain the required quadruple
(05-01-2009, UCSC, TFT-933SN-Wide, 100).
This example also unveils some of the main conceptual
and algorithmic issues arising in data exchange and data
integration. On the data exchange side, suppose that we
wish to transform the above source instance I to a target
instance J according to the schema mapping in Figure 1.
There are at least two distinct target instances that,
together the source instance I, satisfy the specification of
the schema mapping; in fact, it is easy to see that there are
infinitely many such target instances. So, which one should
we choose to materialize? What makes one target instance
a “better” candidate to materialize than another, and how
can one be computed? As we shall see, universal solutions
turn out to be the preferred target instances to materialize. On the data integration side, suppose that a user poses
a query over the target schema. Different answers may
be obtained by evaluating the query on different target
instances that (together with the given source instance)
satisfy the schema-mapping specification. So, what is the
“right” semantics of target queries in data integration? Is
it possible to rewrite target queries into queries over the
source schema so that they can be evaluated directly against
the given source instance? This will lead us to the notions
of the certain answers and of allowing for query rewriting.
In the next section, we will introduce some of the most
commonly used schema-mapping languages, including
global-as-view (GAV) dependencies and local-as-view (LAV)
dependencies, and we will find out that schema mappings
specified in these formalisms indeed admit universal solutions and allow for rewriting of the most frequently asked