Parallel Graph Coloring with Applications to the Incomplete-LU Factorization on the GPU
In this technical report we study different parallel graph coloring algorithms and their application to the incomplete-LU factorization. We implement graph coloring based on different heuristics and showcase their performance on the GPU. We also present a comprehensive comparison of level-scheduling and graph coloring approaches for the incomplete-LU factorization and triangular solve. We discuss their tradeoffs and differences from the mathematics and computer science prospective. Finally we present numerical experiments that showcase the performance of both algorithms. In particular, we show that incomplete-LU factorization based on graph coloring can achieve a speedup of almost 8x on the GPU over the reference MKL implementation on the CPU. Revisions: (i) a few typos in the algorithm captions have been fixed and (ii) an appendix with preconditioned iterative method experiments has been added.