The solution of large sparse linear systems arises in many applications, such as computational fluid dynamics and oil reservoir simulation. In realistic cases the matrices are often so large that they require large scale distributed parallel computing to obtain the solution of interest in a reasonable time. In this paper we discuss the design and implementation of the AmgX library, which provides drop-in GPU acceleration of distributed algebraic multigrid and preconditioned iterative methods. The AmgX library implements both classical and aggregation based algebraic multigrid methods with different selector and interpolation strategies, along with a variety of smoothers and preconditioners, including block-Jacobi, Gauss-Seidel, and incomplete-LU factorization. The library contains many of the standard and flexible preconditioned Krylov subspace iterative methods, which can be combined with any of the available multigrid methods or simpler preconditioners. The parallelism in the aggregation scheme exploits parallel graph matching techniques, while the smoothers and preconditioners often rely on parallel graph coloring algorithms. The algebraic multigrid algorithm implemented in the AmgX library achieves 2-5x speedup on a single GPU against a competitive implementation on the CPU. As will be shown in the numerical experiments section, both setup and solve phases scale well across multiple nodes sustaining this performance advantage.