which the pointer structure of a container is threaded
through the elements of the container with no additional
indirection. Since the C++ compiler expands templates,
the time and space overheads introduced by the wrappers are minimal.
6. 1. microbenchmarks
We implemented a selection of small benchmarks; here we
focus on just one based on directed graphs.
The graph benchmark reads in a directed weighted
graph from a text file and measures the times to construct the edge relation, to perform forward and backward depth-first searches over the whole graph, and to
remove each edge one-by-one. We represent the edges
of a directed graph as a relation edges with columns
{src, dst, weight} and a functional dependency src, dst →
weight. We represent the set of the graph nodes as a relation nodes consisting of a single id column. The RelC
compiler emits a C++ module that implements classes
nodes::relation and edges::relation with methods corresponding to each relational operation. A typical
client of the relational interface is the algorithm to perform a depth-first search:
edges::relation graph_edges;
nodes::relation visited;
// Code to populate graph_edges elided.
stack<int> stk;
stk.push(v0);
while (!stk.empty()) {
int v = stk.top();
stk.pop();
if (!visited.query(nodes::tuple_id(v))) {
visited.insert(nodes::tuple_id(v));
edges::query_iterator_src_ _dst_weight it;
graph_edges.query(edges::tuple_src(v), it);
while ( !it.finished()) {
stk.push(it.output.f_dst());
it.next();
}
}
}
The code makes use of the standard STL stack class in
addition to an instance of the nodes relation visited and
an instance of the edges relation graph_edges.
To demonstrate the tradeoffs involved in the choice
of decomposition, we used the auto-tuner framework to
evaluate three variants of the graph benchmark under
different decompositions. We used a single input graph
representing the road network of the northwestern USA,
containing 1,207,945 nodes and 2,840,208 edges. We
used three variants of the graph benchmark: a forward
depth-first search (DFS); a forward DFS and a backward
DFS; and finally a forward DFS, a backward DFS, and deletion of all edges one at a time. We measured the elapsed
time for each benchmark variant for the 84 decompositions that contain at most 4 map edges (as generated by
the auto-tuner).
Timing results for decompositions that completed within an 8s time limit are shown in Figure 4.
Decompositions that are isomorphic up to the choice of
data structures for the map edges are counted as a single decomposition; only the best timing result is shown
for each set of isomorphic decompositions. There are 68
decompositions not shown that did not complete any of
the benchmarks within the time limit. Since the auto-tuner exhaustively enumerates all possible decompositions, naturally only a few of the resulting decompositions
are suitable for the access patterns of this particular
benchmark; for example, a decomposition that indexes
edges by their weights performs poorly.
Figure 5 shows three representative decompositions
from those shown in Figure 4 with different performance
characteristics. Decomposition 1 is the most efficient for
forward traversal; however, it performs terribly for backward traversal since it takes quadratic time to compute
predecessors. Decompositions 5 and 9 are slightly less
efficient for forward traversal, but are also efficient for
figure 4. elapsed times for directed graph benchmarks for
decompositions up to size 4 with identical input. for each
decomposition we show the times to traverse the graph forward (f),
to traverse both forward and backward (f+b), and to traverse
forward, backward, and delete each edge (f+b+D). We elide 68
decompositions that did not finish a benchmark within 8s.
8
7
Elapsed time (s)
6
5
4
3
2
1
0
F
F+B
F+B+D
12345678910111213141516
Decompositions, ranked by F benchmark time
figure 5. Decompositions 1, 5, and 9 from figure 4. solid edges
represent instances of boost::intrusive::map, dotted edges
represent instances of boost::intrusive::list.
( 1)
( 5)
( 9)
x
y
z
weight
src
dst
x
x
y
src
dst
z
dst
src
w
weight
y
l
weight
src
dst
z
r
weight
dst
src