We study the performance of three parallel algorithms and their hybrid variants for solving tridiagonal linear systems on a GPU: cyclic reduction (CR), parallel cyclic reduction (PCR), and recursive doubling (RD). We develop an approach to measure, analyze, and optimize the performance of GPU programs in terms of memory access, computation, and control overhead. We find that CR enjoys linear algorithm complexity but suffers from more algorithmic steps and bank conflicts, while PCR and RD have fewer algorithmic steps but do more work per step.
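To make the step-count versus work-per-step trade-off concrete, the sketch below shows one way PCR can be organized on a GPU: each thread owns one row of a small tridiagonal system held in shared memory, and each step eliminates the off-diagonals at a stride that doubles until every equation is decoupled. This is a minimal illustrative example, not the implementation evaluated in the paper; the kernel name `pcr_small_system`, the array layout, and the one-system-per-block launch assumption are ours.

```cuda
// Minimal sketch of parallel cyclic reduction (PCR) for one small
// tridiagonal system per thread block. Assumes the kernel is launched
// with blockDim.x == n and 4*n*sizeof(float) bytes of shared memory,
// e.g. pcr_small_system<<<1, n, 4 * n * sizeof(float)>>>(a, b, c, d, x, n);
#include <cuda_runtime.h>

__global__ void pcr_small_system(const float* a_in, const float* b_in,
                                 const float* c_in, const float* d_in,
                                 float* x_out, int n)
{
    extern __shared__ float smem[];      // four arrays of length n
    float* a = smem;                     // lower diagonal
    float* b = smem + n;                 // main diagonal
    float* c = smem + 2 * n;             // upper diagonal
    float* d = smem + 3 * n;             // right-hand side

    int i = threadIdx.x;                 // one thread per equation

    a[i] = a_in[i];  b[i] = b_in[i];  c[i] = c_in[i];  d[i] = d_in[i];
    __syncthreads();

    // Each step couples row i to rows i-delta and i+delta and eliminates
    // the off-diagonal terms; the stride doubles every step, so the number
    // of steps is about log2(n), but every row is updated at every step.
    for (int delta = 1; delta < n; delta *= 2) {
        int lo = i - delta;
        int hi = i + delta;

        // Elimination factors; rows outside the system contribute nothing.
        float k1 = (lo >= 0) ? a[i] / b[lo] : 0.0f;
        float k2 = (hi < n)  ? c[i] / b[hi] : 0.0f;

        float na = (lo >= 0) ? -a[lo] * k1 : 0.0f;
        float nc = (hi < n)  ? -c[hi] * k2 : 0.0f;
        float nb = b[i] - ((lo >= 0) ? c[lo] * k1 : 0.0f)
                        - ((hi < n)  ? a[hi] * k2 : 0.0f);
        float nd = d[i] - ((lo >= 0) ? d[lo] * k1 : 0.0f)
                        - ((hi < n)  ? d[hi] * k2 : 0.0f);
        __syncthreads();                 // finish all reads before writing

        a[i] = na;  b[i] = nb;  c[i] = nc;  d[i] = nd;
        __syncthreads();
    }

    // After the final step every equation is decoupled: b_i * x_i = d_i.
    x_out[i] = d[i] / b[i];
}
```

In contrast, a CR sweep halves the number of active equations each step (linear total work) but needs a forward-reduction and a backward-substitution phase with roughly twice as many steps, and its strided indexing is what induces the shared-memory bank conflicts noted above.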