Increasing Memory Miss Tolerance for SIMD Cores

Manycore processors with wide SIMD cores are becoming a popular choice for the next generation of throughput-oriented architectures. We introduce a hardware technique called “diverge on miss” that allows SIMD cores to better tolerate memory latency for workloads with non-contiguous memory access patterns. Individual threads within a SIMD “warp” are allowed to slip behind other threads in the same warp, letting the warp continue execution even if a subset of its threads is waiting on memory.

Efficient Sparse Voxel Octrees - Analysis, Extensions, and Implementation

This technical report extends our previous paper on sparse voxel octrees. We first discuss the benefits and drawbacks of voxel representations and how the storage space requirements behave for different kinds of content.

Efficient Sparse Voxel Octrees

In this paper we examine the possibilities of using voxel representations as a generic way for expressing complex and feature-rich geometry on current and future GPUs. We present in detail a compact data structure for storing voxels and an efficient algorithm for performing ray casts using this structure. We augment the voxel data with novel contour information that increases geometric resolution, allows more compact encoding of smooth surfaces, and accelerates ray casts. We also employ a novel normal compression format for storing high-precision object-space normals.

Incremental Instant Radiosity for Real-Time Indirect Illumination

We present a method for rendering single-bounce indirect illumination in real time on currently available graphics hardware. The method is based on the instant radiosity algorithm, where virtual point lights (VPLs) are generated by casting rays from the primary light source. Hardware shadow maps are then employed for determining the indirect illumination from the VPLs. Our main contribution is an algorithm for reusing the VPLs and incrementally maintaining their good distribution.

A Hardware Architecture for Surface Splatting

We present a novel architecture for hardware-accelerated rendering of point primitives. Our pipeline implements a refined version of EWA splatting, a high quality method for antialiased rendering of point sampled representations. A central feature of our design is the seamless integration of the architecture into conventional, OpenGL-like graphics pipelines so as to complement triangle-based rendering.

A Meshless Hierarchical Representation for Light Transport

We introduce a meshless hierarchical representation for solving light transport problems. Precomputed radiance transfer (PRT) and finite elements require a discrete representation of illumination over the scene. Non-hierarchical approaches such as per-vertex values are simple to implement, but lead to long precomputation. Hierarchical bases like wavelets lead to dramatic acceleration, but in their basic form they work well only on flat or smooth surfaces. We introduce a hierarchical function basis induced by scattered data approximation.

High-Quality Antialiased Rasterization

Finely detailed 3D geometry can show significant aliasing artifacts if rendered using native hardware multisampling, because multisampling is currently limited to one-pixel box filtering and low sampling rates. This chapter describes a tiled supersampling technique for rendering images of arbitrary resolution with arbitrarily wide user-defined filters and high sampling rates. The code presented here is used in the Gelato film renderer to produce images of uncompromising quality using the GPU.

GPU-Accelerated High Quality Hidden Surface Removal

High-quality off-line rendering requires many features not natively supported by current commodity graphics hardware: wide smooth filters, high sampling rates, order-independent transparency, spectral opacity, motion blur, depth of field. We present a GPU-based hidden-surface algorithm that implements all these features. The algorithm is Reyes-like but uses regular sampling and multiple passes. Transparency is implemented by depth peeling, made more efficient by opacity thresholding and a new method called z batches. We discuss performance and some design trade-offs.

Efficient Rendering of Human Skin

Existing offline techniques for modeling subsurface scattering effects in multi-layered translucent materials such as human skin achieve remarkable realism, but require seconds or minutes to generate an image. We demonstrate rendering of multi-layer skin that achieves similar visual quality but runs orders of magnitude faster. We show that sums of Gaussians provide an accurate approximation of translucent layer diffusion profiles, and use this observation to build a novel skin rendering algorithm based on texture space diffusion and translucent shadow maps.
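
The sum-of-Gaussians idea can be illustrated with a small sketch. It assumes each term is a normalized 2D Gaussian of the radial distance r, and the weights and variances below are hypothetical placeholders, not fitted skin parameters. Under that assumption each Gaussian integrates to one over the plane, so the total diffuse response of the profile equals the sum of the weights; the code checks this numerically.

```python
import math

# Hypothetical (weight, variance) pairs; NOT fitted values for any real material.
TERMS = [(0.3, 0.01), (0.5, 0.1), (0.2, 1.0)]

def gaussian_2d(variance, r):
    """Normalized 2D Gaussian evaluated at radial distance r (assumed form)."""
    return math.exp(-r * r / (2.0 * variance)) / (2.0 * math.pi * variance)

def profile(terms, r):
    """Radial diffusion profile approximated as a weighted sum of Gaussians."""
    return sum(w * gaussian_2d(v, r) for w, v in terms)

def total_response(terms, r_max=20.0, steps=20000):
    """Integrate the profile over the plane in polar form with the midpoint rule."""
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += 2.0 * math.pi * r * profile(terms, r) * dr
    return total

if __name__ == "__main__":
    print(total_response(TERMS), sum(w for w, _ in TERMS))
```

Because Gaussians are separable, a 2D convolution with each term can be done as two 1D passes, which is one reason this representation is convenient for texture-space blurring.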

Stochastic Transparency

Stochastic transparency provides a unified approach to order-independent transparency, anti-aliasing, and deep shadow maps. It augments screen-door transparency using a random sub-pixel stipple pattern, where each fragment of transparent geometry covers a random subset of pixel samples of size proportional to alpha. This results in correct alpha-blended colors on average, in a single render pass with fixed memory size and no sorting, but introduces noise. We reduce this noise by an alpha correction pass, and by an accumulation pass that uses a stochastic shadow map from the camera.
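
The "correct on average" claim can be checked with a small single-pixel simulation. This is a minimal sketch, not the paper's implementation: it uses scalar colors, picks sample counts so that alpha times the sample count is an integer, and omits the alpha correction and accumulation passes. All function names and the fragment values are hypothetical.

```python
import random

def alpha_blend(fragments, background):
    """Reference result: sort by depth and composite front to back."""
    color, transmittance = 0.0, 1.0
    for depth, alpha, c in sorted(fragments):
        color += transmittance * alpha * c
        transmittance *= 1.0 - alpha
    return color + transmittance * background

def stochastic_pixel(fragments, background, num_samples, rng):
    """One pixel: each fragment covers a random subset of samples, then z-test."""
    nearest_depth = [float("inf")] * num_samples
    sample_color = [background] * num_samples
    for depth, alpha, c in fragments:  # submission order does not matter
        covered = rng.sample(range(num_samples), round(alpha * num_samples))
        for s in covered:
            if depth < nearest_depth[s]:  # ordinary per-sample depth test
                nearest_depth[s] = depth
                sample_color[s] = c
    return sum(sample_color) / num_samples

def estimate(fragments, background, num_samples=8, trials=20000, seed=1):
    """Average many independent stochastic renders of the same pixel."""
    rng = random.Random(seed)
    total = sum(stochastic_pixel(fragments, background, num_samples, rng)
                for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    # (depth, alpha, color) triples, deliberately unsorted.
    frags = [(0.5, 0.5, 1.0), (0.2, 0.25, 0.0), (0.8, 0.75, 0.5)]
    print(estimate(frags, 0.2), alpha_blend(frags, 0.2))
```

A single call to `stochastic_pixel` is noisy; only the average over many trials approaches the sorted alpha-blended value, which is the noise the abstract's additional passes aim to reduce.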