Preconditioned Block-Iterative Methods on GPUs

We present an implementation of incomplete-LU/Cholesky preconditioned block-iterative methods on Graphics Processing Units (GPUs) using the CUDA parallel programming model. In particular, we focus on the tradeoffs associated with sparse matrix-vector multiplication with multiple vectors, sparse triangular solve with multiple right-hand sides (rhs), and incomplete factorization with zero fill-in. We use these building blocks to implement the block-CG and block-BiCGStab iterative methods for symmetric positive definite (s.p.d.) and nonsymmetric linear systems, respectively. Our numerical experiments show that the implementation of the preconditioned block-iterative methods using the CUSPARSE library on the GPU achieves an average 3× speedup over the corresponding MKL implementation on the CPU.
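
To make the first building block concrete, the following is a minimal sequential sketch in Python of a sparse matrix-vector multiplication with multiple vectors, Y = A X. It is an illustration only, not the paper's CUDA or CUSPARSE implementation. The function name `csr_spmm`, the CSR arrays, and the row-major layout of the dense block X are all assumptions made for this example; the choice of layout is one of the tradeoffs a GPU implementation would have to weigh.

```python
def csr_spmm(n_rows, row_ptr, col_idx, vals, X, k):
    """Compute Y = A @ X for a CSR matrix A and a dense block X of k vectors.

    X and Y are flat row-major lists (assumed layout for this sketch):
    X[j * k + c] is entry j of vector c.
    """
    Y = [0.0] * (n_rows * k)
    for i in range(n_rows):
        # Visit the nonzeros of row i once and reuse each one for all k vectors.
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a = vals[p]
            j = col_idx[p]
            for c in range(k):
                Y[i * k + c] += a * X[j * k + c]
    return Y


# Hypothetical 3x3 s.p.d. example: A = [[4, 1, 0], [1, 3, 0], [0, 0, 2]]
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 0, 1, 2]
vals = [4.0, 1.0, 1.0, 3.0, 2.0]

# Two vectors, x0 = [1, 0, 1] and x1 = [0, 1, 2], interleaved row by row.
X = [1.0, 0.0,
     0.0, 1.0,
     1.0, 2.0]

Y = csr_spmm(3, row_ptr, col_idx, vals, X, k=2)
```

Each nonzero of A is read once and applied to all k vectors, which is the main reason a block method can make better use of memory bandwidth than k separate matrix-vector products.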

Authors

Maxim Naumov (NVIDIA)

Publication Date

Saturday, December 1, 2012

Published in

Proceedings in Applied Mathematics and Mechanics

Research Area

Algorithms and Numerical Methods

High Performance Computing

External Links

Preconditioned Block-Iterative Methods on GPUs (PDF)