{ns, pid} pair to the corresponding cpu time. While the
left path from x to w is implemented using a hash table
of hash tables, the right path is a vector with two entries,
one pointing to a list of running processes, the other to
a list of sleeping processes. Because node w is shared,
there is only one physical copy of each cpu value, shared
by the two access paths.
We now describe the three decomposition primitives.
•;A unit decomposition, depicted as a node labeled with a
set of columns C, represents a single tuple t with
columns C. A unit node cannot have any outgoing
edges. For example, in Figure 2(a) node w is a unit
decomposition containing a single cpu value.
•;A map decomposition, depicted as an edge labeled with
a set of columns C, represents a relation as a mapping
{t vt,…} from a set of columns C, called key columns,
to a set of residual relations rt, one for each valuation t of
the key columns. Each residual relation rt is in turn represented by decomposition v. For example, in Figure 2(a)
the edge from y to w labeled pid indicates that for each
instance of vertex y in a decomposition instance there
is a data structure that maps each value of pid to a different residual relation, represented using the decomposition rooted at w. Any container that implements a
key-value associative map interface can be used to
implement the map; the example uses unordered doubly linked lists of key-value pairs, hash tables, and
arrays mapping keys to values. The set of containers is
extensible; any container implementing a common
interface may be used.
•;Multiple edges that exit the same node denote a join,
which represents a relation as the natural join of two
sub-relations r1 and r2. Each sub-relation has its own
sub-decomposition, allowing a join to represent the
same data but with different cost models. In diagrams,
join decompositions exist wherever multiple map
edges exit the same node. For example, in Figure 2(a)
node x has two outgoing edges and hence is the join of
two map decompositions.
one valuation for the empty set of columns 0/, there is exactly
one instance of variable x in any instance of the decomposition; further the subgraph of a decomposition instance
rooted at x represents a relation with all columns. Similarly,
node y has type {ns} {pid, cpu}, implying that there is a
distinct instance of node y for each value of the ns field in
the relation, and that the subgraph rooted at each instance
of y represents a relation with columns pid, cpu. Node z has
type {state} {ns, pid, cpu}. Finally node w has type {ns, pid,
state} {cpu}.
The structure of a decomposition instance parallels the
structure of the decomposition; for each node v: B C in the
decomposition the instance contains a corresponding set of
node instances {vt, vt,…}, each for different valuations of columns B. For example, in Figure 2, decomposition node z has
two different instances zstate: S and zstate: R, one for each state
value in the relation.
3. 2. adequacy of decompositions
We define an abstraction function a, defined formally in the
full paper, 11 which maps decomposition instances to the
relation they represent. The abstraction function is defined
recursively on the structure of the decomposition instance.
Informally, a unit node represents its associated tuple t,
a map decomposition represents the union of t joined with
a(vt) for each edge t vt that comprises the map, and a
join decomposition represents the natural join of all of its
subdecompositions.
Not every relation can be represented by every decom-
position. In general a decomposition can only represent
relations with specific columns satisfying certain func-
tional dependencies. For example the decomposition dˆ in
Figure 2(a) cannot represent the relation
( 2)
A decomposition instance, or instance for short, is a rooted,
directed acyclic graph representing a particular relation.
Each node of a decomposition corresponds to a set of nodes
in an instance of that decomposition. Figure 2(b) shows an
instance of the decomposition representing the relation rs
defined in Equation ( 1). The structure of an instance corresponds to a low-level memory state; nodes are objects in
memory and edges are data structures navigating between
objects. For example, node zstate: S has two outgoing edges,
one for each sleeping process; the dashed edge indicates
that the collection of sleeping processes is implemented as
a doubly linked list.
Each decomposition node has an associated “type,” con-
sisting of a pair of column sets B C; every instance of node
v in a decomposition instance has a distinct valuation of col-
umns B, and each such instance represents a relation with
columns C. In the decomposition of Figure 2(a), the root
node x has type 0/ {ns, pid, cpu, state}. Since there is only
since for each pair of ns and pid values the decomposition dˆ
can only represent a single value for the state and cpu fields.
However, r does not correspond to a meaningful set of pro-
cesses—the relational specification in Section 2 requires
that all well-formed sets of processes satisfy the functional
dependency ns, pid → state, cpu, which allows at most one
state or cpu value for any given process.
4. QueRies anD uPDates
In Section 3 we introduced decompositions that describe how
to represent a relation in memory as a collection of data structures. In this section we show how to compile the relational