Preconditioned Block-Iterative Methods on GPUs
An implementation of the incomplete-LU/Cholesky preconditioned block-iterative methods on the Graphics Processing Units (GPUs) using the CUDA parallel programming model is presented. In particular, we focus on the tradeoffs associated with the sparse matrix-vector multiplication with multiple vectors, sparse triangular solve with multiple right-hand-sides (rhs) as well as incomplete factorization with 0 fill-in. We use these building blocks to implement the block-CG and block-BiCGStab iterative methods for the symmetric positive definite (s.p.d.) and nonsymmetric linear systems, respectively. Also, in our numerical experiments we show that the implementation of the preconditioned block-iterative methods using the CUSPARSE library on the GPU achieves an average of 3× speedup over their MKL implementation on the CPU.