A variety of parallel algorithms can be written in CUDA in a fairly straightforward manner, even when the data structures involved are not simple regular grids. SpMV (sparse matrix-vector multiplication) is a good example of an important numerical building block that can be parallelized quite directly using the abstractions provided by CUDA. The kernels we discuss here, when combined with the provided CUBLAS vector routines, make writing iterative solvers such as the conjugate gradient9 method straightforward.
A sparse n × n matrix is one in which the number of nonzero entries m is only a small fraction of the } total. Sparse matrix rep- } resentations seek to store only the nonzero elements of a matrix. Since it is fairly typical that a sparse n × n matrix will contain only m=O(n) nonzero elements, this represents a substantial savings in storage space and processing time.
One of the most common representations for general unstructured sparse matrices is the CSR (compressed sparse row) representation. The m nonzero elements of the matrix A are stored in row-major order in an array Av. A second array Aj records } the corresponding column } index for each entry of Av. Finally, an array Ap of n+ 1
float multiply_row(unsigned int rowsize,
unsigned int *Aj, // column indices for row float *Av, // non-zero entries for row float *x) // the RHS vector
{ float sum = 0;
for(unsigned int column=0; column<rowsize; ++column) sum += Av[column] * x[Aj[column]];
return sum;
}
void csrmul_serial(unsigned int *Ap, unsigned int *Aj,
float *Av, unsigned int num_rows, float *x, float *y)
{ for(unsigned int row=0; row<num_rows; ++row) {
unsigned int row_begin = Ap[row]; unsigned int row_end = Ap[row+ 1];
y[row] = multiply_row(row_end-row_begin, Aj+row_begin,
Av+row_begin, x);
__global__ void csrmul_kernel(unsigned int *Ap, unsigned int *Aj,
float *Av, unsigned int num_rows, float *x, float *y)
{ unsigned int row = blockIdx.x*blockDim.x + threadIdx.x;
if( row<num_rows )
{ unsigned int row_begin = Ap[row]; unsigned int row_end = Ap[row+ 1];
y[row] = multiply_row(row_end-row_begin, Aj+row_begin,
Av+row_begin, x);
References:
Archives