Mixed-Precision SVD on GPUs via Ogita–Aishima Iterative Refinement
The Ogita–Aishima iterative refinement algorithm for the SVD is based on matrix–matrix multiplication, which makes it an attractive approach for GPUs. We extend the algorithm to complex arithmetic and introduce robust cluster handling based on Wedin’s perturbation theory and a per-pair version of Ogita–Aishima’s convergence analysis. Our implementation refines an FP32-quality SVD computed with cuSOLVER’s cusolverDnXgesvdp to an FP64-quality SVD. We validate the correctness of our mixed-precision implementation by testing it on a variety of singular value distributions. In all cases, the accuracy is comparable to that of the FP64 SVD routines [D,Z]GESVD. We observe speedups of up to 5.2× over FP64-quality cusolverDnXgesvdp on the RTX PRO 6000 Blackwell Workstation Edition GPU.
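To illustrate why the refinement is dominated by matrix products, the following is a minimal NumPy sketch of one Ogita–Aishima-style refinement step. It is not the implementation described above: it runs on the CPU and assumes a real, square matrix with well-separated singular values, so it has no complex arithmetic and no cluster handling. The update formulas come from a first-order expansion of the conditions U^T U = I, V^T V = I and U^T A V = Σ. The function names `refine_svd_step` and `mixed_precision_svd`, the test matrix, and the choice of three refinement steps are all hypothetical and chosen for illustration only.

```python
import numpy as np

def refine_svd_step(A, U, V):
    """One refinement step for an approximate SVD A ~ U diag(sigma) V^T.

    Assumes A is real and square and the singular values are distinct
    (no cluster handling). All heavy work is matrix-matrix products.
    """
    n = A.shape[0]
    I = np.eye(n)
    R = I - U.T @ U          # loss of orthogonality in U
    S = I - V.T @ V          # loss of orthogonality in V
    T = U.T @ A @ V          # nearly diagonal if U, V are accurate

    # Refined singular values from the diagonal entries.
    sigma = np.diag(T) / (1.0 - 0.5 * (np.diag(R) + np.diag(S)))

    sig_i = sigma[:, None]   # sigma_i broadcast down the rows
    sig_j = sigma[None, :]   # sigma_j broadcast across the columns
    alpha = T + sig_j * R    # alpha_ij = t_ij + sigma_j * r_ij
    beta = T.T + sig_j * S   # beta_ij  = t_ji + sigma_j * s_ij

    denom = sig_j**2 - sig_i**2
    np.fill_diagonal(denom, 1.0)   # placeholder; diagonal is overwritten below

    F = (alpha * sig_j + beta * sig_i) / denom
    G = (alpha * sig_i + beta * sig_j) / denom
    np.fill_diagonal(F, 0.5 * np.diag(R))
    np.fill_diagonal(G, 0.5 * np.diag(S))

    # Apply the corrections U <- U (I + F), V <- V (I + G).
    return U + U @ F, sigma, V + V @ G

def mixed_precision_svd(A, steps=3):
    """FP32 SVD followed by FP64 refinement (illustrative step count)."""
    U32, s32, Vt32 = np.linalg.svd(A.astype(np.float32))
    U = U32.astype(np.float64)
    V = Vt32.T.astype(np.float64)
    sigma = s32.astype(np.float64)
    for _ in range(steps):
        U, sigma, V = refine_svd_step(A, U, V)
    return U, sigma, V

# Hypothetical test matrix with prescribed, well-separated singular values.
rng = np.random.default_rng(0)
n = 8
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
s_true = np.linspace(10.0, 1.0, n)
A = (Q1 * s_true) @ Q2.T

U, sigma, V = mixed_precision_svd(A)
residual = np.linalg.norm(A - (U * sigma) @ V.T) / np.linalg.norm(A)
orth_err = np.linalg.norm(np.eye(n) - U.T @ U)
print(f"relative residual: {residual:.2e}, orthogonality error: {orth_err:.2e}")
```

In this sketch the denominator σ_j² − σ_i² becomes small when two singular values are close, and the correction for that pair is then unreliable. This is the situation that cluster handling is meant to address, and the sketch deliberately leaves it out.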