“It is a daunting task
to write a program
that will perform
well on not only
today’s architectures
but also those of the
future. We believe
that PetaBricks
can give programs
the portable
performance needed
to increase their
effective lifetimes.”
merating several algorithms, and then
learn from the autotuner what performs well under what circumstances.
We hope both types of users will
find PetaBricks an indispensable tool
in their programming arsenal.
EXAMPLE PROGRAM
Figure 1 presents an example PetaBricks
program, kmeans. The program groups
the input Points into a number of clusters and writes each point’s cluster to
the output Assignments. Internally the
program uses the intermediate data
Centroids to keep track of the current
center of each cluster. The transform
header declares each of the input (from),
intermediate (through), and output (to)
data structures. The rules contained
in the body of the transform define the
various pathways to construct the Assignments data from the initial Points
data. The transform can be depicted
using the dependence graph shown in
Figure 2, which indicates the dependencies of each of the three rules.
The first two rules specify two differ-
ent ways to initialize the Centroids data
needed by the iterative kmeans solver in
the third rule. Both of these rules require
the Points input data. The third rule
specifies how to produce the output As-
signments using both the input Points
and intermediate Centroids. Note that
since the third rule depends on the out-
put of either the first or second rule, the
third rule cannot be executed until the
intermediate data structure Centroids
has been computed by one of the first
two rules. The autotuner and compiler
will make sure the program will satisfy
these dependencies when producing
tuned code.
by naming the inputs and outputs of
each rule, but unlike in a traditional
dataflow programming model, more
than one rule can be defined to output
the same data. Thus, the input dependences of a rule can be satisfied by the
output of one or more rules.
It is up to the PetaBricks compiler
and autotuner to decide (for a given architecture and input) which rule combination will most efficiently produce
the transform output while satisfying
all intermediate data dependencies.
For example, the autotuner may find
that it is preferable to use rules that
minimize the critical path of the transform on parallel architectures, while
rules with the lowest computational
complexity may fare better on sequential architectures. The example in the
following section will help further illustrate the Peta Bricks language.
While it is helpful for a PetaBricks
programmer to know beforehand what
algorithms perform best under what
circumstances, the framework can
help those who do not in a different,
and possibly more fundamental, way.
An expert may narrow the field of
algorithmic choices to those he or she
knows will perform well under a variety
of circumstances, and then leverage
PetaBricks to do the heavy lifting, tailoring their programs to their specific
machine architectures. A non-expert
with no such domain knowledge can
take advantage of PetaBricks by enu-
PE TABRICKS COMPILER
INFRASTRUCTURE
Figure 3 displays the general flow for
the compilation of a PetaBricks transform. Compilation is split into two representations. The first representation
operates at the rule level, and is similar
to a traditional high-level sequential intermediate representation. The second
representation operates at the transform level, and is responsible for managing choices and for code synthesis.
The main transform level represen-
Figure 2: In this dependency graph for kmeans example, the rules are the vertices
while each edge represents the dependencies of each rule. Each edge color corresponds to each named data dependence in the pseudocode shown in Figure 1.
Rule 1
(run n times)
Points
Centroids
Rule 2
(run once)
Rule 3
(run once)
Assignments
Input
Intermediate
Output