A Fast Double Precision CFD Code Using CUDA

We describe a second-order, double-precision finite volume Boussinesq code designed to run on the CUDA architecture. We perform detailed validation of the code on a variety of Rayleigh-Bénard convection problems and show second-order convergence. We obtain matching results with a Fortran code running on an eight-core CPU. The CUDA-accelerated code runs approximately eight times faster than the Fortran code on identical problems. As a result, we are able to run a simulation on a 384 x 384 x 192 grid at 1.6 seconds per time step on a machine with a single GPU.
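
A common way to check a claim of second-order convergence is to measure the discretization error on successively refined grids: if the error behaves like C h^p, halving the grid spacing h divides it by 2^p. The Python sketch below computes this observed order p. The function name and the error values are hypothetical placeholders for illustration, not results from the paper.

```python
import math


def observed_order(err_coarse: float, err_fine: float, refinement: float = 2.0) -> float:
    """Observed convergence order p, assuming err ~ C * h**p.

    err_coarse is measured with spacing h, err_fine with spacing h / refinement,
    so err_coarse / err_fine = refinement**p.
    """
    return math.log(err_coarse / err_fine) / math.log(refinement)


# Hypothetical errors on grids with spacing h, h/2 and h/4 (illustrative only).
errors = [4.0e-3, 1.0e-3, 2.5e-4]
orders = [observed_order(a, b) for a, b in zip(errors, errors[1:])]
print(orders)  # both values are close to 2, i.e. second order
```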

PantaRay: Fast Ray-traced Occlusion Caching of Massive Scenes

We describe the architecture of a novel system for precomputing sparse directional occlusion caches. These caches are used for accelerating a fast cinematic lighting pipeline that works in the spherical harmonics domain.
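
To illustrate what storing occlusion "in the spherical harmonics domain" means, the sketch below projects a binary visibility function onto the first two real SH bands (four coefficients) by Monte Carlo integration over the sphere. The band count, the sampling scheme and the function names are assumptions; PantaRay's actual cache layout and ray tracer are not shown.

```python
import math
import random


def sh_basis(x, y, z):
    """Real spherical harmonics for bands 0 and 1 (standard normalization)."""
    return [0.282095, 0.488603 * y, 0.488603 * z, 0.488603 * x]


def project_occlusion(visible, n_samples=100_000, seed=0):
    """Monte Carlo projection of a binary visibility function onto 4 SH coefficients.

    visible(x, y, z) -> bool for a unit direction. Returns c_i approximating
    the integral over the sphere of V(w) * Y_i(w). (Hypothetical helper.)
    """
    rng = random.Random(seed)
    coeffs = [0.0] * 4
    for _ in range(n_samples):
        # Uniformly distributed direction on the unit sphere.
        z = 2.0 * rng.random() - 1.0
        phi = 2.0 * math.pi * rng.random()
        r = math.sqrt(max(0.0, 1.0 - z * z))
        x, y = r * math.cos(phi), r * math.sin(phi)
        if visible(x, y, z):
            for i, b in enumerate(sh_basis(x, y, z)):
                coeffs[i] += b
    weight = 4.0 * math.pi / n_samples  # sphere area divided by sample count
    return [c * weight for c in coeffs]


# Example: a point on an open ground plane sees only the upper hemisphere (z > 0).
coeffs = project_occlusion(lambda x, y, z: z > 0.0)
```

For the open hemisphere, the band-0 coefficient should come out near 2π·0.282095 and the z-aligned band-1 coefficient near π·0.488603, with the other two near zero.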

Interactive Fluid-Particle Simulation using Translating Eulerian Grids

We describe an interactive system featuring fluid-driven animation that responds to moving objects. Our system includes a GPU-accelerated Eulerian fluid solver that is suited for real-time use because it is unconditionally stable, takes constant computation time per frame, and provides good visual fidelity. We dynamically translate the fluid simulation domain to track a user-controlled object. The fluid motion is visualized via its effects on particles that respond to the computed fluid velocity field but are not constrained to stay within the bounds of the simulation domain.
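
The abstract does not name its advection scheme. A widely used unconditionally stable choice is semi-Lagrangian advection (as in Stam's "stable fluids"), which we assume here. The 1D CPU sketch below shows why it is stable for any time step, and how a grid can be translated by whole cells to follow a tracked object. The functions `advect` and `translate` and the ambient fill value are hypothetical.

```python
def advect(q, u, dt, dx):
    """One semi-Lagrangian step (assumed scheme): q_new[i] = q(x_i - u*dt).

    The backtraced value is linearly interpolated from existing samples, so it
    never exceeds max(q) or falls below min(q), whatever the time step.
    """
    n = len(q)
    out = []
    for i in range(n):
        x = i - u * dt / dx             # backtraced position, in cell units
        x = min(max(x, 0.0), n - 1.0)   # clamp to the simulated domain
        i0 = min(int(x), n - 2)         # left neighbour (requires n >= 2)
        t = x - i0
        out.append((1.0 - t) * q[i0] + t * q[i0 + 1])
    return out


def translate(q, shift, ambient=0.0):
    """Move the grid by `shift` whole cells (positive = towards +x).

    New cell i holds old cell i + shift; cells entering the domain receive
    the ambient value, so no interpolation is needed.
    """
    n = len(q)
    if shift >= 0:
        return q[shift:] + [ambient] * min(shift, n)
    s = -shift
    return [ambient] * min(s, n) + q[:max(n - s, 0)]


# A blob advected one cell per step, with the domain re-centred on a tracked object.
q = [0.0, 0.0, 1.0, 0.0, 0.0]
q = advect(q, u=1.0, dt=1.0, dx=1.0)   # -> [0, 0, 0, 1, 0]
q = translate(q, shift=1)              # -> [0, 0, 1, 0, 0]
```

Shifting by whole cells keeps translation exact and cheap. A real solver would also need to handle velocity, pressure projection and the inflow values of newly exposed cells.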

Image Space Gathering

Soft shadows, glossy reflections and depth of field are valuable effects for realistic rendering and are often computed using distribution ray tracing (DRT). These “blurry” effects often need not be accurate and are sometimes simulated by blurring an image with sharper effects, such as blurring hard shadows to simulate soft shadows. One of the most effective examples of such a blurring algorithm is percentage-closer soft shadows (PCSS).
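
As background on the PCSS baseline mentioned above (not the image space gathering method itself), PCSS estimates how much to blur a hard shadow in three steps: average the depth of occluding shadow-map samples, estimate the penumbra width by similar triangles, and run percentage-closer filtering with that width. The sketch below assumes parallel blocker and receiver planes and omits the projection of the width onto the shadow map. The function names and the depth bias are hypothetical.

```python
def pcss_filter_width(d_receiver, blocker_samples, light_size):
    """Estimate the filter width for one receiver point (PCSS steps 1 and 2).

    blocker_samples are shadow-map depths from the blocker-search region,
    measured from the light in the same units as d_receiver.
    """
    # Step 1, blocker search: average depth of samples in front of the receiver.
    blockers = [d for d in blocker_samples if d < d_receiver]
    if not blockers:
        return 0.0  # nothing occludes the receiver: fully lit, no blur
    d_blocker = sum(blockers) / len(blockers)
    # Step 2, penumbra estimate by similar triangles (parallel-planes assumption).
    return (d_receiver - d_blocker) * light_size / d_blocker


def pcf(d_receiver, filter_samples, bias=1e-3):
    """Step 3: fraction of shadow-map samples in the filter region that do not occlude."""
    lit = sum(1 for d in filter_samples if d + bias >= d_receiver)
    return lit / len(filter_samples)
```

For example, a receiver at depth 10 behind blockers averaging depth 5, lit by a light of size 1, gets a penumbra width of (10 - 5) · 1 / 5 = 1.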

Increasing Memory Miss Tolerance for SIMD Cores

Manycore processors with wide SIMD cores are becoming a popular choice for the next generation of throughput-oriented architectures. We introduce a hardware technique called “diverge on miss” that allows SIMD cores to better tolerate memory latency for workloads with non-contiguous memory access patterns. Individual threads within a SIMD “warp” are allowed to slip behind other threads in the same warp, letting the warp continue execution even if a subset of its threads is waiting on memory.
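
A toy timing model can make the intuition concrete. It is a simplification, not the paper's hardware mechanism or simulator. In strict lockstep, a warp waits at each load for its slowest thread. If threads that hit may slip ahead, the warp finishes roughly when its slowest thread does. The latencies and function names below are hypothetical, and the model ignores slip limits and reconvergence cost.

```python
def lockstep_cycles(latency):
    """Cycles for a warp that waits at every load until all of its threads are served.

    latency[t][k] is the latency of load k issued by thread t.
    """
    n_loads = len(latency[0])
    return sum(max(thread[k] for thread in latency) for k in range(n_loads))


def slip_cycles(latency):
    """Idealized diverge-on-miss: threads that hit keep going and the warp finishes
    when its slowest thread does (slip limits and reconvergence cost ignored)."""
    return max(sum(thread) for thread in latency)


HIT, MISS = 1, 10  # hypothetical latencies in cycles
# 4 threads, 4 loads each; thread t misses only on load t (scattered accesses).
latency = [[MISS if k == t else HIT for k in range(4)] for t in range(4)]
print(lockstep_cycles(latency), slip_cycles(latency))  # 40 13
```

Each load step in lockstep pays one miss (4 × 10 = 40 cycles), while with slipping each thread pays only its own miss plus three hits (13 cycles).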

Efficient Sparse Voxel Octrees - Analysis, Extensions, and Implementation

This technical report extends our previous paper on sparse voxel octrees. We first discuss the benefits and drawbacks of voxel representations and how the storage space requirements behave for different kinds of content.

Efficient Sparse Voxel Octrees

In this paper we examine the possibilities of using voxel representations as a generic way for expressing complex and feature-rich geometry on current and future GPUs. We present in detail a compact data structure for storing voxels and an efficient algorithm for performing ray casts using this structure. We augment the voxel data with novel contour information that increases geometric resolution, allows more compact encoding of smooth surfaces, and accelerates ray casts. We also employ a novel normal compression format for storing high-precision object-space normals.
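
A key idea behind compact sparse octrees is that each node stores a bitmask of which of its eight children exist, and keeps only the existing children, stored contiguously. A child's position is then its first-child index plus a population count of the lower mask bits. The sketch below illustrates only this indexing. The field names, the octant bit order and the unpacked layout are assumptions. The paper's actual node descriptor, contour data and ray-cast traversal are not reproduced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OctreeNode:
    valid_mask: int   # bit s set -> child in octant s exists (8 bits, hypothetical layout)
    first_child: int  # array index of the first existing child


def octant(p, center):
    """Octant (0..7) of point p relative to the node center: bit 0 = x, 1 = y, 2 = z."""
    return (int(p[0] >= center[0])
            | int(p[1] >= center[1]) << 1
            | int(p[2] >= center[2]) << 2)


def child_index(node: OctreeNode, slot: int) -> Optional[int]:
    """Array index of the child in `slot`, or None if that octant is empty.

    Only existing children are stored, contiguously, so the offset equals the
    number of valid children in lower-numbered slots.
    """
    if not (node.valid_mask >> slot) & 1:
        return None
    lower = node.valid_mask & ((1 << slot) - 1)
    return node.first_child + bin(lower).count("1")


node = OctreeNode(valid_mask=0b10110010, first_child=100)
slot = octant((0.9, 0.1, 0.8), center=(0.5, 0.5, 0.5))  # x and z high -> slot 5
print(slot, child_index(node, slot))  # 5 102
```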

Incremental Instant Radiosity for Real-Time Indirect Illumination

We present a method for rendering single-bounce indirect illumination in real time on currently available graphics hardware. The method is based on the instant radiosity algorithm, where virtual point lights (VPLs) are generated by casting rays from the primary light source. Hardware shadow maps are then employed for determining the indirect illumination from the VPLs. Our main contribution is an algorithm for reusing the VPLs and incrementally maintaining their good distribution.
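
In instant radiosity, each VPL acts as a point light with a cosine-weighted emission profile, and a receiver's single-bounce irradiance is the sum of their contributions. The sketch below shows only that gathering step for one color channel. A hypothetical `visible` callback stands in for the hardware shadow-map lookup, and `min_dist2` is an assumed clamp on the 1/d² singularity. The paper's VPL generation and incremental reuse are not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class VPL:
    pos: tuple     # surface point hit by a ray cast from the primary light
    normal: tuple  # surface normal at that hit
    flux: float    # power carried by this VPL (single color channel)


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def indirect_irradiance(x, n, vpls, visible=lambda a, b: True, min_dist2=1e-4):
    """Single-bounce irradiance at point x with unit normal n, summed over VPLs.

    `visible` is a placeholder for the shadow-map test (hypothetical).
    """
    total = 0.0
    for v in vpls:
        d = _sub(v.pos, x)
        dist = math.sqrt(_dot(d, d))
        if dist == 0.0:
            continue
        w = tuple(c / dist for c in d)        # unit direction receiver -> VPL
        cos_x = max(0.0, _dot(n, w))          # cosine at the receiver
        cos_v = max(0.0, -_dot(v.normal, w))  # cosine at the VPL
        if cos_x > 0.0 and cos_v > 0.0 and visible(v.pos, x):
            total += v.flux * cos_v * cos_x / (math.pi * max(dist * dist, min_dist2))
    return total
```

For instance, a VPL of flux π one unit directly above an upward-facing receiver, facing it head-on, contributes an irradiance of exactly 1.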

A Hardware Architecture for Surface Splatting

We present a novel architecture for hardware-accelerated rendering of point primitives. Our pipeline implements a refined version of EWA splatting, a high-quality method for antialiased rendering of point-sampled representations. A central feature of our design is the seamless integration of the architecture into conventional, OpenGL-like graphics pipelines so as to complement triangle-based rendering.
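
In standard EWA splatting, a splat's elliptical Gaussian reconstruction kernel (covariance V_r) is mapped to screen space by the local affine Jacobian J, then convolved with a Gaussian low-pass filter (covariance V_h). This gives a screen-space resampling filter with covariance V = J V_r Jᵀ + V_h. The sketch below evaluates that filter at a pixel offset using a normalized Gaussian. The refinements in the paper's pipeline and its exact normalization may differ, and the function names are hypothetical.

```python
import math


def _mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def _transpose(a):
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]


def ewa_weight(dx, dy, jacobian, v_recon, v_lowpass=((1.0, 0.0), (0.0, 1.0))):
    """Weight of a splat's screen-space resampling filter at pixel offset (dx, dy).

    V = J Vr J^T + Vh; the weight is a normalized 2D Gaussian with covariance V.
    """
    v = _mul(_mul(jacobian, v_recon), _transpose(jacobian))
    v = [[v[i][j] + v_lowpass[i][j] for j in range(2)] for i in range(2)]
    det = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    # Quadratic form d^T V^-1 d via the closed-form 2x2 inverse.
    q = (v[1][1] * dx * dx - (v[0][1] + v[1][0]) * dx * dy + v[0][0] * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))
```

Because the low-pass term is added to every splat's covariance, even a splat that projects smaller than a pixel is widened to at least the low-pass footprint, which is what prevents aliasing.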

A Meshless Hierarchical Representation for Light Transport

We introduce a meshless hierarchical representation for solving light transport problems. Precomputed radiance transfer (PRT) and finite elements require a discrete representation of illumination over the scene. Non-hierarchical approaches such as per-vertex values are simple to implement, but lead to long precomputation. Hierarchical bases like wavelets lead to dramatic acceleration, but in their basic form they work well only on flat or smooth surfaces. We introduce a hierarchical function basis induced by scattered data approximation.
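
As a simple, swapped-in illustration of how scattered data approximation can induce basis functions without a mesh, the sketch below uses a Shepard-style partition of unity with a compactly supported Wendland kernel. Each normalized weight acts as a basis function centered on a sample point, and the weights sum to one wherever some sample lies within the support radius h. This is not the paper's construction. A hierarchy might repeat this with sparser point sets and larger radii, but that is an assumption. The function names and sample values are hypothetical.

```python
import math


def wendland(r, h):
    """Compactly supported Wendland C2 kernel with support radius h."""
    if r >= h:
        return 0.0
    q = r / h
    return (1.0 - q) ** 4 * (4.0 * q + 1.0)


def approximate(x, points, values, h):
    """Shepard-style partition-of-unity estimate of f(x) from scattered samples.

    The normalized weights w_i(x) / sum_j w_j(x) play the role of basis
    functions centered at the sample points.
    """
    num = den = 0.0
    for p, v in zip(points, values):
        w = wendland(math.dist(x, p), h)
        num += w * v
        den += w
    return num / den if den > 0.0 else 0.0


# Hypothetical illumination samples scattered on a surface (illustrative values).
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
vals = [1.0, 3.0]
print(approximate((0.5, 0.0, 0.0), pts, vals, h=1.0))  # equal weights: average of 1.0 and 3.0
```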