Massively Parallel Construction of Radix Tree Forests for the Efficient Sampling of Discrete Probability Distributions
We compare different methods for sampling from discrete probability distributions and introduce a new algorithm that is especially efficient on massively parallel processors such as GPUs. The scheme preserves the distribution properties of the input sequence, exhibits constant time complexity on average, and significantly lowers the average number of operations for certain distributions when sampling is performed in a parallel algorithm that requires synchronization afterwards. We complement the sampling scheme with a very efficient massively parallel algorithm for constructing the required auxiliary data structure, which avoids the load balancing issues of naïve approaches.
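For orientation, the sketch below shows one standard baseline for this problem: inverting the cumulative distribution function by binary search. It is not the radix tree forest algorithm introduced here. The function names `build_cdf` and `sample_index` are hypothetical and chosen only for illustration. The mapping from the uniform input to the selected index is monotone, which is one way in which distribution properties of an input sequence can be carried over to the samples; the cost per sample is logarithmic in the number of outcomes rather than constant.

```python
from bisect import bisect_right
from itertools import accumulate


def build_cdf(weights):
    """Return the normalized cumulative sums of non-negative weights."""
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    cdf = [c / total for c in accumulate(weights)]
    cdf[-1] = 1.0  # guard against rounding in the last entry
    return cdf


def sample_index(cdf, xi):
    """Map a uniform number xi in [0, 1) to an index by CDF inversion.

    Index i is returned when cdf[i-1] <= xi < cdf[i] (with cdf[-1] read
    as 0 for i = 0), so the mapping is monotone in xi.
    """
    # The clamp handles xi == 1.0, which lies outside the intended range.
    return min(bisect_right(cdf, xi), len(cdf) - 1)


if __name__ == "__main__":
    cdf = build_cdf([1, 2, 1])  # probabilities 0.25, 0.5, 0.25
    print([sample_index(cdf, x) for x in (0.0, 0.25, 0.5, 0.75, 0.99)])
```

In this baseline every sample performs a binary search over the whole table; the auxiliary data structure described in the abstract aims to reduce that cost to constant time on average.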