Hardware-Accelerated Global Illumination by Image Space Photon Mapping

We describe an extension to photon mapping that recasts the most expensive steps of the algorithm -- the initial and final photon bounces -- as image-space operations amenable to GPU acceleration. This enables global illumination in real-time applications and accelerates it for offline rendering. Image Space Photon Mapping (ISPM) rasterizes a light-space bounce map of emitted photons surviving initial-bounce Russian roulette sampling on the GPU; it then traces the photons conventionally on the CPU.
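
To make the Russian roulette step concrete, the sketch below terminates or keeps photons after the initial bounce. It is a minimal stand-alone CUDA kernel, not ISPM's bounce-map rasterizer; the Photon struct, the fixed survival probability p, and the precomputed uniform random numbers are our illustrative assumptions.

    #include <cuda_runtime.h>

    struct Photon {
        float3 position;
        float3 direction;
        float3 power;
        int    alive;   // 1 if the photon survives the initial bounce
    };

    // Russian roulette: each photon survives with probability p and, if
    // it survives, its power is divided by p so the estimator stays
    // unbiased. Terminated photons are skipped by the CPU trace.
    __global__ void russianRoulette(Photon* photons, const float* uniforms,
                                    float p, int count)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= count) return;

        if (uniforms[i] < p) {
            float s = 1.0f / p;      // compensate surviving photon power
            photons[i].power.x *= s;
            photons[i].power.y *= s;
            photons[i].power.z *= s;
            photons[i].alive = 1;
        } else {
            photons[i].alive = 0;
        }
    }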

Iterative Methods for Improving Mesh Parameterizations

We present two complementary methods for automatically improving mesh parameterizations and demonstrate that they provide a very desirable combination of efficiency and quality. First, we describe a new iterative method for constructing quasi-conformal parameterizations with free boundaries. We formulate the problem as fitting the coordinate gradients to two guidance vector fields of equal magnitude that are everywhere orthogonal. In only one linear step, our method efficiently generates parameterizations with natural boundaries from those with convex boundaries.
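
A hedged sketch of the kind of least-squares problem the abstract describes (the notation here is ours, not necessarily the paper's): exact conformality would mean the coordinate gradients satisfy the Cauchy-Riemann relation \nabla v = (\nabla u)^{\perp} on every triangle. Given guidance fields a_T and their 90-degree rotations a_T^{\perp}, of equal magnitude and everywhere orthogonal, one fits the gradients in least squares over all triangles T with areas A_T:

    E(u, v) = \sum_{T} A_T \Big( \big\| \nabla u_T - a_T \big\|^2
                               + \big\| \nabla v_T - a_T^{\perp} \big\|^2 \Big)

Minimizing E amounts to a single sparse linear solve for the vertex coordinates (u, v), which matches the abstract's claim of producing the parameterization in only one linear step.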

Free-form Motion Processing

Motion is the center of attention in many applications of computer graphics. Skeletal motion for articulated characters can be processed and altered in a great variety of ways to increase the versatility of each motion clip. However, analogous techniques have not yet been developed for free-form deforming surfaces like cloth and faces. Given the time-consuming nature of producing each free-form motion clip, the ability to alter and reuse free-form motion would be very desirable.

Scalable Parallel Programming with CUDA

The advent of multicore CPUs and manycore GPUs means that mainstream processor chips are now parallel systems. Furthermore, their parallelism continues to scale with Moore’s law. The challenge is to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism to manycore GPUs with widely varying numbers of cores.
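
A minimal example of this transparent scaling, using a standard CUDA kernel (ours, not code from the article): the kernel is written once as a grid of thread blocks, and the hardware scheduler maps the blocks onto however many multiprocessors the particular GPU has.

    #include <cuda_runtime.h>

    // One thread per element. The same launch runs unmodified on GPUs
    // with few or many cores; blocks are scheduled onto whatever
    // multiprocessors exist, so parallelism scales with the hardware.
    __global__ void saxpy(int n, float a, const float* x, float* y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    // Launch with enough 256-thread blocks to cover all n elements:
    //   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);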

Parallel Computing Experiences with CUDA

The CUDA programming model provides a straightforward means of describing inherently parallel computations, and NVIDIA’s Tesla GPU architecture delivers high computational throughput on massively parallel problems. This article surveys experiences gained in applying CUDA to a diverse set of problems, and the parallel speedups attained over sequential codes on traditional CPU architectures by executing key computations on the GPU.

Rapid Multipole Graph Drawing on the GPU

As graphics processors have become more powerful, ubiquitous, and easier to program, they have also become more amenable to general-purpose high-performance computing, including the computationally expensive task of drawing large graphs. This paper describes a new parallel analysis of the multipole method of graph drawing to support its efficient GPU implementation. We use a variation of the Fast Multipole Method to estimate the long-distance repulsive forces in force-directed layout, and we support these multipole computations efficiently with a k-d tree constructed and traversed on the GPU.
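
To make concrete what the multipole method approximates, the brute-force O(n^2) kernel below computes the repulsive forces directly; the paper replaces the inner loop for distant nodes with multipole estimates gathered via its GPU k-d tree. The Fruchterman-Reingold-style constant k2 and all names are our illustrative assumptions.

    #include <cuda_runtime.h>

    // Exact all-pairs repulsive forces for force-directed layout. This
    // is the quantity the multipole method approximates for far nodes.
    __global__ void repulsiveForces(const float2* pos, float2* force,
                                    float k2, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float2 f = make_float2(0.0f, 0.0f);
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;
            float dx = pos[i].x - pos[j].x;
            float dy = pos[i].y - pos[j].y;
            float d2 = dx * dx + dy * dy + 1e-6f; // avoid divide by zero
            float s  = k2 / d2;     // magnitude k^2/d along unit vector
            f.x += s * dx;
            f.y += s * dy;
        }
        force[i] = f;
    }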

On the Visualization of Social and Other Scale-Free Networks

This paper proposes novel methods specifically for visualizing the large power-law graphs that arise in sociology and the sciences. In such graphs a large portion of the edges can be shown to be less important and can be removed, while preserving component connectedness and other features (e.g., cliques), to more clearly reveal the network’s underlying connection pathways. This simplification approach deterministically filters the graph (instead of clustering it) to retain important node and edge semantics, and works both automatically and interactively.
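
A deterministic filter of this kind might look like the sketch below; the importance metric, the threshold, and the flag used to guarantee component connectedness (e.g., membership in a precomputed spanning tree) are our illustrative assumptions, not the paper's algorithm.

    #include <cuda_runtime.h>

    // Keep an edge if its precomputed importance passes the threshold,
    // or if it is required (e.g., part of a spanning tree computed up
    // front) so that removing the rest cannot disconnect a component.
    __global__ void filterEdges(const float* importance,
                                const int* required, int* keep,
                                float threshold, int numEdges)
    {
        int e = blockIdx.x * blockDim.x + threadIdx.x;
        if (e >= numEdges) return;
        keep[e] = (importance[e] >= threshold) || required[e];
    }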

Efficient Sparse Matrix-Vector Multiplication on CUDA

The massive parallelism of graphics processing units (GPUs) offers tremendous performance in many high-performance computing applications. While dense linear algebra readily maps to such platforms, harnessing this potential for sparse matrix computations presents additional challenges. Given its role in iterative methods for solving sparse linear systems and eigenvalue problems, sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra.
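 
The simplest CSR kernel compared in such work is the "scalar" variant, with one thread per row; the paper also develops a "vector" variant that assigns a warp per row, along with kernels for other formats. The sketch below follows the widely used scalar formulation rather than necessarily the paper's exact code.

    #include <cuda_runtime.h>

    // Scalar CSR SpMV: thread `row` computes y[row] = A[row,:] * x.
    // rowPtr has numRows+1 entries delimiting each row's nonzeros.
    __global__ void spmvCsrScalar(int numRows,
                                  const int* rowPtr, const int* colIdx,
                                  const float* vals, const float* x,
                                  float* y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= numRows) return;

        float dot = 0.0f;
        for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
            dot += vals[j] * x[colIdx[j]];
        y[row] = dot;
    }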

Efficient Parallel Scan Algorithms for GPUs

Scan and segmented scan algorithms are crucial building blocks for a great many data-parallel algorithms. Segmented scan and related primitives also provide the necessary support for the flattening transform, which allows nested data-parallel programs to be compiled into flat data-parallel languages. In this paper, we describe the design of efficient scan and segmented scan parallel primitives in CUDA for execution on GPUs. Our algorithms are designed using a divide-and-conquer approach that builds all scan primitives on top of a set of primitive intra-warp scan routines.
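
The intra-warp building block can be sketched as a Kogge-Stone inclusive scan. The paper's routines used shared memory on the hardware of the time; the version below is our adaptation using modern warp shuffles, and it assumes all 32 lanes of the warp are active.

    #include <cuda_runtime.h>

    // Kogge-Stone inclusive scan across one 32-thread warp: each lane
    // returns the sum of the values held by lanes 0..lane.
    __device__ float warpScanInclusive(float val)
    {
        int lane = threadIdx.x & 31;
        for (int offset = 1; offset < 32; offset <<= 1) {
            float up = __shfl_up_sync(0xffffffffu, val, offset);
            if (lane >= offset) val += up;
        }
        return val;
    }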

Fast BVH Construction on GPUs

We present two novel parallel algorithms for rapidly constructing bounding volume hierarchies on manycore GPUs. The first uses a linear ordering derived from spatial Morton codes to build hierarchies extremely quickly and with high parallel scalability. The second is a top-down approach that uses the surface area heuristic (SAH) to build hierarchies optimized for fast ray tracing. Both are combined into a hybrid algorithm that removes existing bottlenecks in GPU construction, improving performance and scalability and significantly decreasing build time.
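
The linear ordering in the first algorithm comes from interleaving the bits of quantized primitive centroids. A standard 30-bit Morton code routine is sketched below; this particular bit-spreading trick follows widely circulated CUDA code rather than necessarily the paper's exact implementation.

    #include <cuda_runtime.h>

    // Spread the low 10 bits of v so two zero bits separate each
    // original bit (0b1111 -> 0b001001001001).
    __device__ unsigned int expandBits(unsigned int v)
    {
        v = (v * 0x00010001u) & 0xFF0000FFu;
        v = (v * 0x00000101u) & 0x0F00F00Fu;
        v = (v * 0x00000011u) & 0xC30C30C3u;
        v = (v * 0x00000005u) & 0x49249249u;
        return v;
    }

    // 30-bit Morton code for a centroid with coordinates normalized to
    // [0,1]. Sorting primitives by this key produces the spatially
    // coherent linear order the hierarchy is built from.
    __device__ unsigned int morton3D(float x, float y, float z)
    {
        x = fminf(fmaxf(x * 1024.0f, 0.0f), 1023.0f);
        y = fminf(fmaxf(y * 1024.0f, 0.0f), 1023.0f);
        z = fminf(fmaxf(z * 1024.0f, 0.0f), 1023.0f);
        return (expandBits((unsigned int)x) << 2)
             | (expandBits((unsigned int)y) << 1)
             |  expandBits((unsigned int)z);
    }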