High-Quality Antialiased Rasterization

Finely detailed 3D geometry can show significant aliasing artifacts if rendered using native hardware multisampling, because multisampling is currently limited to one-pixel box filtering and low sampling rates. This chapter describes a tiled supersampling technique for rendering images of arbitrary resolution with arbitrarily wide user-defined filters and high sampling rates. The code presented here is used in the Gelato film renderer to produce images of uncompromising quality using the GPU.
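
As a rough illustration of the filtering step only, the sketch below resolves a regularly supersampled grid into pixels using a truncated Gaussian whose support is wider than one pixel. It is not the Gelato code; the function name, the square filter support, and the parameters (rate, filter_width, sigma) are assumptions made for this example, and a tiled renderer would additionally pad each tile by the filter radius.

```python
import math

def filter_supersamples(samples, rate, filter_width, sigma):
    """Resolve a supersampled grid into pixels with a wide Gaussian filter.

    samples: 2D list with (height * rate) rows and (width * rate) columns.
    rate: samples per pixel along each axis (hypothetical parameter).
    filter_width: filter diameter in pixels; may exceed one pixel.
    sigma: Gaussian standard deviation in pixels.
    """
    height = len(samples) // rate
    width = len(samples[0]) // rate
    radius = filter_width / 2.0
    image = [[0.0] * width for _ in range(height)]
    for py in range(height):
        for px in range(width):
            cx, cy = px + 0.5, py + 0.5  # pixel center in pixel units
            # Sample index range under the (square) filter support,
            # clipped to the grid.
            x0 = max(0, math.floor((cx - radius) * rate))
            x1 = min(width * rate, math.ceil((cx + radius) * rate))
            y0 = max(0, math.floor((cy - radius) * rate))
            y1 = min(height * rate, math.ceil((cy + radius) * rate))
            total = 0.0
            weight_sum = 0.0
            for sy in range(y0, y1):
                for sx in range(x0, x1):
                    dx = (sx + 0.5) / rate - cx
                    dy = (sy + 0.5) / rate - cy
                    w = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
                    total += w * samples[sy][sx]
                    weight_sum += w
            # Normalize so a constant input stays constant, even at borders.
            image[py][px] = total / weight_sum
    return image
```

With filter_width set to 2.0, each output pixel also draws on samples from its neighbors, which is what a one-pixel box filter cannot do.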

GPU-Accelerated High Quality Hidden Surface Removal

High-quality off-line rendering requires many features not natively supported by current commodity graphics hardware: wide smooth filters, high sampling rates, order-independent transparency, spectral opacity, motion blur, depth of field. We present a GPU-based hidden-surface algorithm that implements all these features. The algorithm is Reyes-like but uses regular sampling and multiple passes. Transparency is implemented by depth peeling, made more efficient by opacity thresholding and a new method called z batches. We discuss performance and some design trade-offs.
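
The idea of depth peeling with opacity thresholding can be sketched for a single pixel on the CPU. This is a minimal emulation under stated assumptions, not the paper's GPU implementation: the function name, the scalar (non-spectral) opacity, and the default threshold are hypothetical, and z batches are not modeled.

```python
def peel_composite(fragments, opacity_threshold=0.99):
    """Composite one pixel's fragments front to back, one layer per pass.

    fragments: unsorted list of (depth, alpha, color) tuples; color is a
    single float here for brevity.
    Returns (color, accumulated_opacity, passes_used).
    """
    color = 0.0
    opacity = 0.0
    last_depth = float("-inf")
    passes = 0
    # Opacity thresholding: stop peeling once the pixel is nearly opaque,
    # since deeper layers could contribute at most (1 - opacity).
    while opacity < opacity_threshold:
        # One peeling pass: find the nearest fragment strictly behind the
        # previously peeled layer. Fragments at exactly equal depth would
        # be skipped by this simplified version.
        behind = [f for f in fragments if f[0] > last_depth]
        if not behind:
            break
        depth, alpha, frag_color = min(behind, key=lambda f: f[0])
        color += (1.0 - opacity) * alpha * frag_color
        opacity += (1.0 - opacity) * alpha
        last_depth = depth
        passes += 1
    return color, opacity, passes
```

No sorting is needed up front because each pass recovers the next-nearest layer, and the threshold bounds the number of passes for pixels that become nearly opaque early.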

Efficient Rendering of Human Skin

Existing offline techniques for modeling subsurface scattering effects in multi-layered translucent materials such as human skin achieve remarkable realism, but require seconds or minutes to generate an image. We demonstrate rendering of multi-layer skin that achieves similar visual quality but runs orders of magnitude faster. We show that sums of Gaussians provide an accurate approximation of translucent layer diffusion profiles, and use this observation to build a novel skin rendering algorithm based on texture space diffusion and translucent shadow maps.
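
To make the sum-of-Gaussians idea concrete, the sketch below evaluates a radial diffusion profile as a weighted sum of normalized 2D Gaussians. The weights and variances are hypothetical placeholders, not fitted skin parameters from the paper. One reason such a representation is convenient is that Gaussians are separable and the convolution of two Gaussians is again a Gaussian whose variance is the sum of the two.

```python
import math

def gaussian_2d(variance, r):
    """Normalized 2D Gaussian evaluated at radial distance r."""
    return math.exp(-r * r / (2.0 * variance)) / (2.0 * math.pi * variance)

def diffusion_profile(r, weights, variances):
    """Radial profile R(r) approximated as a weighted sum of Gaussians."""
    return sum(w * gaussian_2d(v, r) for w, v in zip(weights, variances))

# Hypothetical example values (not from the paper); weights sum to 1.
WEIGHTS = [0.5, 0.3, 0.2]
VARIANCES = [0.01, 0.2, 2.0]

def total_diffuse_response(weights, variances, r_max=20.0, steps=20000):
    """Integrate 2*pi*r*R(r) over [0, r_max] with the midpoint rule."""
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += 2.0 * math.pi * r * diffusion_profile(r, weights, variances) * dr
    return total
```

Because each Gaussian term integrates to one over the plane, the total response of the profile equals the sum of the weights, which gives a simple check on a fitted set of terms.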

Stochastic Transparency

Stochastic transparency provides a unified approach to order-independent transparency, anti-aliasing, and deep shadow maps. It augments screen-door transparency using a random sub-pixel stipple pattern, where each fragment of transparent geometry covers a random subset of pixel samples of size proportional to alpha. This results in correct alpha-blended colors on average, in a single render pass with fixed memory size and no sorting, but introduces noise. We reduce this noise by an alpha correction pass, and by an accumulation pass that uses a stochastic shadow map from the camera.
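
The core claim, that random sub-pixel coverage proportional to alpha gives correct alpha-blended colors on average, can be checked with a small single-pixel simulation. This is a minimal sketch under assumptions: the function names, grayscale colors, and black background are choices made for this example, and the alpha correction and accumulation passes are not modeled.

```python
import random

def stochastic_pixel(fragments, num_samples, rng):
    """Estimate one pixel with a single unsorted pass over its fragments.

    fragments: list of (depth, alpha, color) in arbitrary order.
    Each fragment covers a random subset of round(alpha * num_samples)
    samples; every sample keeps the nearest fragment that covers it.
    """
    sample_depth = [float("inf")] * num_samples
    sample_color = [0.0] * num_samples  # black background (assumption)
    for depth, alpha, color in fragments:
        covered = rng.sample(range(num_samples), round(alpha * num_samples))
        for s in covered:
            if depth < sample_depth[s]:  # ordinary per-sample depth test
                sample_depth[s] = depth
                sample_color[s] = color
    return sum(sample_color) / num_samples

def exact_pixel(fragments):
    """Reference result: sorted front-to-back alpha compositing."""
    result = 0.0
    transmittance = 1.0
    for depth, alpha, color in sorted(fragments):
        result += transmittance * alpha * color
        transmittance *= 1.0 - alpha
    return result
```

Averaging stochastic_pixel over many independent trials approaches exact_pixel when alpha * num_samples is an integer, while any single trial is noisy, which is the noise the later passes aim to reduce.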

Hardware-Accelerated Global Illumination by Image Space Photon Mapping

We describe an extension to photon mapping that recasts the most expensive steps of the algorithm -- the initial and final photon bounces -- as image-space operations amenable to GPU acceleration. This enables global illumination for real-time applications and accelerates it for offline rendering. Image Space Photon Mapping (ISPM) rasterizes a light-space bounce map of emitted photons surviving initial-bounce Russian roulette sampling on a GPU. It then traces photons conventionally on the CPU.

Iterative Methods for Improving Mesh Parameterizations

We present two complementary methods for automatically improving mesh parameterizations and demonstrate that they provide a very desirable combination of efficiency and quality. First, we describe a new iterative method for constructing quasi-conformal parameterizations with free boundaries. We formulate the problem as fitting the coordinate gradients to two guidance vector fields of equal magnitude that are everywhere orthogonal. In only one linear step, our method efficiently generates parameterizations with natural boundaries from those with convex boundaries.

Free-form Motion Processing

Motion is the center of attention in many applications of computer graphics. Skeletal motion for articulated characters can be processed and altered in a great variety of ways to increase the versatility of each motion clip. However, analogous techniques have not yet been developed for free-form deforming surfaces like cloth and faces. Given the time-consuming nature of producing each free-form motion clip, the ability to alter and reuse free-form motion would be very desirable.

Scalable Parallel Programming with CUDA

The advent of multicore CPUs and manycore GPUs means that mainstream processor chips are now parallel systems. Furthermore, their parallelism continues to scale with Moore’s law. The challenge is to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism to manycore GPUs with widely varying numbers of cores.

Parallel Computing Experiences with CUDA

The CUDA programming model provides a straightforward means of describing inherently parallel computations, and NVIDIA’s Tesla GPU architecture delivers high computational throughput on massively parallel problems. This article surveys experiences gained in applying CUDA to a diverse set of problems and the parallel speedups over sequential codes running on traditional CPU architectures attained by executing key computations on the GPU.

Rapid Multipole Graph Drawing on the GPU

As graphics processors become more powerful, ubiquitous, and easier to program, they have also become more amenable to general-purpose high-performance computing, including the computationally expensive task of drawing large graphs. This paper describes a new parallel analysis of the multipole method of graph drawing to support its efficient GPU implementation. We use a variation of the Fast Multipole Method to estimate the long-distance repulsive forces in force-directed layout. We support these multipole computations efficiently with a k-d tree constructed and traversed on the GPU.