2. 1. introduction to Datalog
We first provide a short review of Datalog, following the conventions in Ramakrishnan and Ullman’s survey.
27 A Datalog
program consists of a set of declarative rules and an optional
query. Since these programs are commonly called “recursive
queries” in the database literature, we use the term “query”
and “program” interchangeably when we refer to a Datalog
program.
A Datalog rule has the form p :- q1, q2, …, qn, which can be
read informally as “q1 and q2 and … and qn implies p.” p is the
head of the rule, and q1, q2, …, qn is a list of literals that constitutes the body of the rule. Literals are either predicates over
fields (variables and constants), or functions (formally,
function symbols) applied to fields. The rules can refer to each
other in a cyclic fashion to express recursion. The order in
which the rules are presented in a program is semantically
immaterial. The commas separating the predicates in a rule
are logical conjuncts (AND); the order in which predicates
appear in a rule body also has no semantic significance,
though most implementations (including ours) employ a
left-to-right execution strategy. Predicates in the rule body
are matched (or joined) based on their common variables to
produce the output in the rule head. The query (denoted by a
reserved rule label Query) specifies the output of interest.
The predicates in the body and head of traditional
Datalog rules are relations, and we refer to them interchangeably as predicates or relations. In our work, every
relation has a primary key, which is a set of fields that
uniquely identifies each tuple within the relation. In the
absence of other information, the primary key is the full set
of fields in the relation.
By convention, the names of predicates, function symbols,
and constants begin with a lowercase letter, while variable
names begin with an uppercase letter. Most implementations
of Datalog enhance it with a limited set of side-effect-free
function calls including standard infix arithmetic and various
simple string and list manipulations (which start with “f_” in
our syntax). Aggregate constructs are represented as aggregation functions with field variables within angle brackets (áñ).
2. 2. nDLog by example
We introduce NDlog using an example program shown below
that implements the path-vector protocol, which computes
in a distributed fashion, for every node, the shortest paths
to all other nodes in a network. The path-vector protocol
is used as the base routing protocol for exchanging routes
among Internet Service Providers.
sp1 path(@Src,Dest,Path,Cost) :- link(@Src,Dest,Cost),
Path=f_init(Src,Dest).
sp2 path(@Src,Dest,Path,Cost) :- link(@Src,Nxt,Cost1),
path(@Nxt,Dest,Path2,Cost2), Cost=Cost1+Cost2,
Path=f_concatPath(Src,Path2).
sp3 spCost(@Src,Dest,min<Cost>) :- path(@Src,Dest,Path,Cost).
sp4 shortestPath(@Src,Dest,Path,Cost) :-
spCost(@Src, Dest,Cost), path(@Src,Dest,Path,Cost).
Query shortestPath(@Src,Dest,Path,Cost).
The program has four rules (which for convenience we label sp1–sp4), and takes as input a base
(extensional) relation link(Src, Dest, Cost). Rules
sp1–sp2 are used to derive “paths” in the graph, represented as tuples in the derived (intensional) relation
path(Src, Dest, Path, Cost). The Src and Dest fields
represent the source and destination endpoints of the
path, and Path is the actual path from Src to Dest. The
number and types of fields in relations are inferred from
their (consistent) use in the program’s rules.
Since network protocols are typically computations over
distributed network state, one of the important requirements of NDlog is the ability to support rules that express
distributed computations. NDlog builds upon traditional
Datalog by providing control over the storage location of
tuples explicitly in the syntax via location specifiers. Each
location specifier is a field within a predicate that dictates
the partitioning of the table. To illustrate, in the above program, each predicate has an “@” symbol prepended to a
single field denoting the location specifier. Each tuple generated is stored at the address determined by its location
specifier. For example, each path and link tuple is stored
at the address held in its first field @Src.
Rule sp1 produces path tuples directly from existing link tuples, and rule sp2 recursively produces path
tuples of increasing cost by matching (joining) the destination fields of existing links to the source fields of previously computed paths. The matching is expressed using
the repeated Nxt variable in link(Src,Nxt,Cost1) and
path(Nxt,Dest,Path2,Cost2) of rule sp2. Intuitively,
rule sp2 says that “if there is a link from node Src to node
Nxt, and there is a path from node Nxt to node Dest along
a path Path2, then there is a path Path from node Src to
node Dest where Path is computed by prepending Src
to Path2.” The matching of the common Nxt variable in
link and path corresponds to a join operation used in
relational databases.
Given the path relation, rule sp3 derives the relation
spCost(Src,Dest,Cost) by computing the minimum
cost Cost for each source and destination for all input
paths. Rule sp4 takes as input spCost and path tuples
and then finds shortestPath(Src,Dest,Path,Cost)
tuples that contain the shortest path Path from Src to
Dest with cost Cost. Last, as denoted by the Query label,
the shortestPath table is the output of interest.
2. 3. shortest path execution example
We step through an execution of the shortest-path NDlog
program above to illustrate derivation and communication of tuples as the program is computed. We make use
of the example network in Figure 1. Our discussion is necessarily informal since we have not yet presented our distributed implementation strategies; in the next section,
we show in greater detail the steps required to generate
the execution plan. Here, we focus on a high-level understanding of the data movement in the network during
query processing.
For ease of exposition, we will describe communication
in synchronized iterations, where at each iteration, each