Architecture Considerations for Tracing Incoherent Rays

This paper proposes a massively parallel hardware architecture for efficient tracing of incoherent rays, e.g. for global illumination. The general approach is centered around hierarchical treelet subdivision of the acceleration structure and repeated queueing/postponing of rays to reduce cache pressure. We describe a heuristic algorithm for determining the treelet subdivision, and show that our architecture can reduce the total memory bandwidth requirements by up to 90% in difficult scenes.
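
The effect of queueing rays at treelet boundaries can be illustrated with a toy model. The sketch below is not the architecture from the paper: it assumes each ray's sequence of treelets is known in advance, that only one treelet is resident in cache at a time, and that the scheduler always picks the treelet with the longest queue. The function names and the scheduling policy are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def loads_with_queues(ray_paths):
    """Count treelet loads when rays are postponed in per-treelet queues.

    ray_paths[i] lists the treelet ids ray i visits, in order (a toy
    stand-in for real traversal, where the path is discovered on the fly).
    """
    queues = defaultdict(deque)
    for ray_id, path in enumerate(ray_paths):
        if path:
            queues[path[0]].append((ray_id, 0))

    loads = 0
    while queues:
        # Assumed policy: process the treelet with the most waiting rays.
        treelet = max(queues, key=lambda t: len(queues[t]))
        batch = queues.pop(treelet)
        loads += 1  # the treelet is fetched once for the whole batch
        for ray_id, step in batch:
            nxt = step + 1
            if nxt < len(ray_paths[ray_id]):
                # Postpone the ray: queue it at the next treelet it enters.
                queues[ray_paths[ray_id][nxt]].append((ray_id, nxt))
    return loads


def loads_one_ray_at_a_time(ray_paths):
    """Count treelet loads when each ray is traced to completion in turn."""
    loads = 0
    resident = None  # the single treelet currently held in cache
    for path in ray_paths:
        for treelet in path:
            if treelet != resident:
                loads += 1
                resident = treelet
    return loads


paths = [[0, 1], [0, 2], [0, 1], [0, 2]]
print(loads_with_queues(paths), loads_one_ray_at_a_time(paths))
```

In this toy case the queued schedule fetches each treelet once, while tracing rays one at a time refetches the root treelet for every ray. The numbers say nothing about the bandwidth figures reported in the abstract.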

HLBVH: Hierarchical LBVH Construction for Real-Time Ray Tracing

We present HLBVH and SAH-optimized HLBVH, two high-performance BVH construction algorithms targeting real-time ray tracing of dynamic geometry. HLBVH provides a novel hierarchical formulation of the LBVH algorithm [Lauterbach et al. 2009], and SAH-optimized HLBVH uses a new combination of HLBVH and the greedy surface area heuristic algorithm. These algorithms minimize work and memory bandwidth usage by extracting and exploiting coarse-grained spatial coherence already available in the input meshes.
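
For readers unfamiliar with LBVH, the step it builds on is sorting primitives along a Morton (Z-order) curve. The sketch below shows only that generic step, not the hierarchical formulation or the SAH-optimized variant from the paper. It assumes centroids already normalized to the unit cube and 10 bits per axis; the function names are hypothetical.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of integer grid coordinates x, y, z."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i + 2)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i)
    return code


def quantize(value, bits=10):
    """Map a coordinate in [0, 1] to an integer grid cell, clamping at the ends."""
    cells = 1 << bits
    return min(max(int(value * cells), 0), cells - 1)


def sort_by_morton(centroids, bits=10):
    """Return primitive indices ordered by the Morton code of their centroid."""
    def key(index):
        cx, cy, cz = centroids[index]
        return morton3d(quantize(cx, bits), quantize(cy, bits), quantize(cz, bits), bits)
    return sorted(range(len(centroids)), key=key)


centroids = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.5, 0.5, 0.5)]
print(sort_by_morton(centroids))
```

The most significant bits of each code identify a coarse grid cell, so primitives that are close in space tend to end up close in the sorted order.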

OptiX: A General Purpose Ray Tracing Engine

The NVIDIA(R) OptiX(TM) ray tracing engine is a programmable system designed for NVIDIA GPUs and other highly parallel architectures. The OptiX engine builds on the key observation that most ray tracing algorithms can be implemented using a small set of programmable operations. Consequently, the core of OptiX is a domain-specific just-in-time compiler that generates custom ray tracing kernels by combining user-supplied programs for ray generation, material shading, object intersection, and scene traversal.

Solving Computational Problems with GPU Computing

Modern GPUs are massively parallel microprocessors that can deliver very high performance for the parallel computations common in science and engineering.

Real-Time Stochastic Rasterization on Conventional GPU Architectures

This paper presents a hybrid algorithm for rendering approximate motion and defocus blur with precise stochastic visibility evaluation. It demonstrates, for the first time with a full stochastic technique, real-time performance on conventional GPU architectures for complex scenes at 1920x1080 HD resolution. The algorithm operates on dynamic triangle meshes for which per-vertex velocity or corresponding vertices from the previous frame are available.

Ambient Occlusion Volumes

This paper introduces a new approximation algorithm for the near-field ambient occlusion problem. It combines known pieces in a new way to achieve substantially improved quality over fast methods and substantially improved performance compared to accurate methods. Intuitively, it computes the analog of a shadow volume for ambient light around each polygon, and then applies a tunable occlusion function within the region it encloses. The algorithm operates on dynamic triangle meshes and produces output that is comparable to ray traced occlusion for many scenes.

A 14-mW 6.25-Gb/s Transceiver in 90-nm CMOS

This paper describes a 6.25-Gb/s 14-mW transceiver in 90-nm CMOS for chip-to-chip applications. The transceiver employs a number of features for reducing power consumption, including a shared LC-PLL clock multiplier, an inductor-loaded resonant clock distribution network, a low- and programmable-swing voltage-mode transmitter, software-controlled clock and data recovery (CDR) and adaptive equalization within the receiver, and a novel PLL-based phase rotator for the CDR.

Hardware-Accelerated Colored Stochastic Shadow Maps

This paper extends the stochastic transparency algorithm that models partial coverage to also model wavelength-varying transmission. It then applies this to the problem of casting shadows between any combination of opaque, colored transmissive, and partially covered (i.e., alpha-matted) surfaces in a manner compatible with existing hardware shadow mapping techniques. Colored Stochastic Shadow Maps have a similar resolution and performance profile to traditional shadow maps; however, they require a wider filter in colored areas to reduce hue variation.
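
The partial-coverage idea that this work starts from can be sketched in a few lines. The example below covers only the scalar (uncolored) case and is not the method of the paper: it assumes each surface blocks a given light sample independently with probability equal to its alpha, and it estimates the fraction of light that gets past all surfaces. The function name and parameters are hypothetical.

```python
import random


def stochastic_transmission(alphas, samples, rng):
    """Estimate the light fraction passing surfaces with the given alpha values.

    Each sample is blocked by a surface with probability alpha, so the
    expected result is the product of (1 - alpha) over all surfaces.
    """
    passed = 0
    for _ in range(samples):
        # rng.random() is in [0, 1), so alpha = 1 always blocks
        # and alpha = 0 never does.
        if all(rng.random() >= alpha for alpha in alphas):
            passed += 1
    return passed / samples


rng = random.Random(1234)
estimate = stochastic_transmission([0.5, 0.5], 20000, rng)
print(estimate)  # close to (1 - 0.5) * (1 - 0.5) = 0.25, up to sampling noise
```

The estimate is noisy for small sample counts, which is why such results are typically filtered over a neighborhood.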

Programming Massively Parallel Processors: A Hands-on Approach

The first text of its kind, Programming Massively Parallel Processors: A Hands-on Approach will teach your students the basic concepts of parallel programming and GPU architecture. This text will provide your students with the hands-on skills they need to work in an industry that has moved to multi-core processors. It is now available for adoption.

Low Viscosity Flow Simulations for Animation

We present a combination of techniques to simulate turbulent fluid flows in 3D. Flow in a complex domain is modeled using a regular rectilinear grid with a finite-difference solution to the incompressible Navier-Stokes equations. We propose the use of the QUICK advection algorithm over a globally high resolution grid. To calculate pressure over the grid, we introduce the Iterated Orthogonal Projection (IOP) framework.
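
As background on the QUICK scheme mentioned here, the sketch below shows its standard quadratic upwind interpolation in one dimension. It assumes a uniform grid and flow in the positive direction, and it is not taken from the paper, which applies the scheme in 3D alongside its pressure solver. The function names are hypothetical.

```python
def quick_face_value(phi_far_upwind, phi_upwind, phi_downwind):
    """QUICK estimate of a quantity at the face between two cells.

    Fits a quadratic through the two upwind cells and the one downwind
    cell, evaluated at the face midway between phi_upwind and phi_downwind.
    """
    return 0.75 * phi_upwind + 0.375 * phi_downwind - 0.125 * phi_far_upwind


def quick_faces_positive_flow(phi):
    """Face values between cells i and i+1 for i = 1 .. len(phi) - 2.

    Assumes the velocity points toward increasing index, so cell i - 1 is
    the far-upwind neighbor. The first face is skipped because it has no
    far-upwind cell.
    """
    return [
        quick_face_value(phi[i - 1], phi[i], phi[i + 1])
        for i in range(1, len(phi) - 1)
    ]


# Cell-centered samples of phi(x) = x^2 at x = 0, 1, 2, 3.
phi = [0.0, 1.0, 4.0, 9.0]
print(quick_faces_positive_flow(phi))  # faces at x = 1.5 and x = 2.5
```

Because the interpolant is quadratic, the face values for this quadratic profile are exact (2.25 and 6.25); for general fields the scheme is an approximation.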