code,2 and a graphics application23 that performs agglomerative clustering.17 In Section 3, we discuss the Galois programming model and runtime system for exploiting this
parallelism. In Section 4, we evaluate the performance of
our system on the two applications. Finally, in Section 5, we
discuss conclusions and ongoing work.
2. TWO IRREGULAR APPLICATIONS
In this section, we describe opportunities for exploiting
parallelism in two irregular applications: Delaunay mesh
refinement2 and agglomerative clustering,17 as used within
a graphics application.23 These applications perform refinement and coarsening, respectively, which are arguably
the two most common operations for bulk modification of
irregular data structures.
2.1. Delaunay mesh refinement
The input to the 2D Delaunay mesh refinement algorithm is a
triangulation of some region in the plane, in which all triangles
satisfy a certain geometric property called the Delaunay condition.2 Some of these triangles may be badly shaped according to certain geometric criteria; for example, excessively
large triangles may cause unacceptable discretization errors
in finite-element solutions. The goal of mesh refinement is
to eliminate these badly shaped triangles from the mesh by
replacing them with smaller triangles. However, performing
this operation on a bad triangle may violate the Delaunay
condition for neighboring triangles, so it is necessary to find
all affected triangles (this is called the cavity of that bad triangle), and retriangulate the entire cavity. Figure 1 shows the
initial mesh on the left (badly shaped triangles are colored
black, and cavities are colored gray), and the refined mesh on
the right. Refinement may create new badly shaped triangles,
but there is a mathematical guarantee, at least in 2D, that if
this process is repeated, a mesh without bad triangles will
be produced in the end. The structure of the final mesh may
depend on the order in which bad triangles are eliminated,
but any mesh produced by this process is acceptable.
Figure 2 shows the pseudocode for mesh refinement. It is
natural to organize the program around a work-list containing the bad triangles, as seen in lines 3 and 4. This work-list
is one of the two key data structures in mesh refinement.
The other is a graph representing the mesh structure; each
triangle in the mesh is represented as one node, and edges
in the graph represent triangle adjacencies in the mesh.
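The mesh-as-graph representation described above can be sketched concretely. The following minimal Python sketch is illustrative only; the names `MeshGraph`, `add_triangle`, and `add_adjacency` are assumptions for this example, not part of the Galois API:

```python
# Minimal sketch of the mesh-as-graph representation: each triangle
# is a node, and an edge links two triangles that are adjacent in
# the mesh. (Illustrative names, not the Galois API.)

class MeshGraph:
    def __init__(self):
        self.neighbors = {}  # triangle -> set of adjacent triangles

    def add_triangle(self, t):
        self.neighbors.setdefault(t, set())

    def add_adjacency(self, t1, t2):
        self.neighbors[t1].add(t2)
        self.neighbors[t2].add(t1)

    def remove_triangle(self, t):
        # Removing a triangle also detaches it from all its neighbors,
        # as happens when a cavity is retriangulated.
        for n in self.neighbors.pop(t):
            self.neighbors[n].discard(t)

g = MeshGraph()
for t in ("t1", "t2", "t3"):
    g.add_triangle(t)
g.add_adjacency("t1", "t2")
g.add_adjacency("t2", "t3")
g.remove_triangle("t2")
```

Retriangulating a cavity then amounts to removing the cavity's nodes and inserting nodes and adjacencies for the new triangles.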
figure 1. fixing bad elements. (a) Unrefined Mesh (b) Refined Mesh
Communications of the ACM | September 2009 | Vol. 52 | No. 9
figure 2. Pseudocode of the mesh refinement algorithm.
1: Mesh mesh = /* read in initial mesh */
2: WorkList wl;
3: wl.add(mesh.badTriangles());
4: while (wl.size() != 0) {
5:   Element e = wl.get(); // get bad triangle
6:   if (e no longer in mesh) continue;
7:   Cavity c = new Cavity(e);
8:   c.expand();
9:   c.retriangulate();
10:  mesh.update(c);
11:  wl.add(c.badTriangles());
12: }
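A runnable, heavily simplified version of the loop in Figure 2 can be sketched in Python. The geometric work is stubbed out (a "cavity" here is just a bad triangle plus its surviving neighbors, and retriangulation is replaced by inserting one fresh triangle), so the sketch demonstrates only the worklist discipline, including the check on line 6 that a triangle pulled from the worklist may already have been deleted by an earlier retriangulation:

```python
from collections import deque

# Sketch of the Figure 2 worklist loop with the geometry stubbed out.
# mesh: set of triangle ids; neighbors: adjacency map; bad: initial
# bad triangles. Line-number comments refer to Figure 2.

def refine(mesh, neighbors, bad):
    wl = deque(bad)                 # line 3: seed the worklist
    counter = 0
    while wl:                       # line 4
        e = wl.popleft()            # line 5
        if e not in mesh:           # line 6: may already be gone
            continue
        # lines 7-8: cavity = bad triangle + affected neighbors
        cavity = {e} | (neighbors.get(e, set()) & mesh)
        mesh -= cavity              # line 10: remove the old triangles
        counter += 1
        mesh.add(f"new{counter}")   # line 9: stand-in for retriangulation
    return mesh

mesh = {"a", "b", "c", "d"}
neighbors = {"a": {"b"}, "c": {"d"}}
result = refine(mesh, neighbors, ["a", "b", "c"])
```

Note how "b" is skipped when it reaches the head of the worklist: it was consumed by the cavity of "a", which is exactly the situation line 6 of Figure 2 guards against.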
Opportunities for Exploiting Parallelism: Delaunay mesh
refinement is a relatively complicated code since the central
data structure is a graph that is modified repeatedly during
the execution of the algorithm. Nevertheless, there may be a
lot of parallelism in the algorithm since cavities that do not
overlap can be processed in parallel, as in the mesh of Figure 1.
If two cavities overlap, they must be processed sequentially
in some order. How much parallelism is there in mesh refinement? Our studies11 have shown that for a mesh of 100,000
triangles in which roughly half the initial triangles are badly
shaped, there are more than 256 triangles that can be processed in parallel until almost the end of execution.
2.2. Agglomerative clustering
The second problem is agglomerative clustering, a well-known
data-mining algorithm.17 This algorithm is used in graphics
applications for handling large numbers of light sources.23
The input to the clustering algorithm is (1) a data-set and
(2) a measure of the similarity between items in the data-set.
The goal of clustering is to construct a binary tree called a
dendrogram whose hierarchical structure exposes the similarity between items in the data-set. Figure 3(a) shows a data-set containing points in the plane, for which the measure of
similarity between data points is Euclidean distance. The
dendrogram for this data-set is shown in Figure 3(b) and (c).
Agglomerative clustering can be performed by an iterative
algorithm: at each step, the two closest points in the data-set are clustered together and replaced in the data-set by a
single new point that represents the new cluster. The location of this new point may be determined heuristically.17 The
algorithm terminates when there is only one point left in the
data-set. Pseudocode for the algorithm is shown in Figure 4.
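The iterative algorithm described above can be sketched directly. In this minimal Python version the new cluster's location is taken to be the midpoint of the merged pair (one possible placement heuristic; the heuristics in the literature17 differ), and the dendrogram is encoded as nested tuples:

```python
import math
from itertools import combinations

# Sketch of agglomerative clustering over 2D points with Euclidean
# distance. Each step merges the two closest points, replacing them
# with their midpoint (an assumed placement heuristic) and recording
# a dendrogram node as a nested tuple of the merged labels.

def cluster(points):
    """points: dict mapping label -> (x, y). Returns the dendrogram
    root, a nested tuple reflecting the order of merges."""
    pts = dict(points)
    while len(pts) > 1:
        # find the two closest remaining points
        a, b = min(
            combinations(pts, 2),
            key=lambda pq: math.dist(pts[pq[0]], pts[pq[1]]),
        )
        ax, ay = pts.pop(a)
        bx, by = pts.pop(b)
        pts[(a, b)] = ((ax + bx) / 2, (ay + by) / 2)  # new cluster point
    return next(iter(pts))

root = cluster({"a": (0, 0), "b": (1, 0), "c": (10, 0)})
```

Each pass scans all pairs, so this naive version is cubic overall; practical implementations use spatial data structures (such as kd-trees) to find nearest neighbors quickly, which is where the irregular-data-structure parallelism arises.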
figure 3. agglomerative clustering. (a) Data points (b) Hierarchical clusters (c) Dendrogram