Robust Stereo with Flash and No-flash Image Pairs

We propose a new stereo technique using a pair of flash and no-flash stereo images that is both efficient and robust in handling occlusion boundaries. Our work is motivated by the observation that the brightness variations introduced by the flash can provide a robust cue for establishing stereo matches at occlusion boundaries. This photometric cue is computed per pixel; although it is not robust enough on its own to resolve depth reliably, it provides a new discriminant to support patch-based stereo matching algorithms.

Scalable Ambient Obscurance

This paper presents a set of architecture-aware performance and integration improvements for a recent screen-space ambient obscurance algorithm. These improvements collectively produce a 7x performance increase at 2560x1600, generalize the algorithm to both forward and deferred renderers, and eliminate the radius- and scene-dependence of the previous algorithm to provide a hard real-time guarantee of fixed execution time.

Understanding the Efficiency of Ray Traversal on GPUs - Kepler and Fermi Addendum

This technical report is an addendum to the HPG2009 paper "Understanding the Efficiency of Ray Traversal on GPUs", and provides citable performance results for Kepler and Fermi architectures. We explain how to optimize the traversal and intersection kernels for these newer platforms, and what the important architectural limiters are.

Relational Algorithms for Multi-Bulk-Synchronous Processors

Relational databases remain an important application domain for organizing and analyzing the massive volume of data generated as sensor technology, retail and inventory transactions, social media, computer vision, and new fields continue to evolve. At the same time, processor architectures are beginning to shift towards hierarchical and parallel architectures employing throughput-optimized memory systems, lightweight multi-threading, and Single-Instruction Multiple-Data (SIMD) core organizations.

Detecting Regions of Interest in Dynamic Scenes with Camera Motions

We present a method to detect regions of interest in moving-camera views of dynamic scenes with multiple moving objects. We start by extracting a global motion tendency that reflects the scene context by tracking movements of objects in the scene. We then use Gaussian process regression to represent the extracted motion tendency as a stochastic vector field. The generated stochastic field is robust to noise and can handle video from an uncalibrated moving camera. We use the stochastic field to predict important future regions of interest as the scene evolves dynamically.

Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees

A number of methods for constructing bounding volume hierarchies and point-based octrees on the GPU are based on the idea of ordering primitives along a space-filling curve. A major shortcoming of these methods is that they construct the levels of the tree sequentially, which limits the amount of parallelism they can achieve. We present a novel approach that improves scalability by constructing the entire tree in parallel. Our main contribution is an in-place algorithm for constructing binary radix trees, which we use as a building block for other types of trees.
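
The idea of ordering primitives along a space-filling curve can be illustrated with a minimal sketch. It assumes a Morton (Z-order) curve with 10 bits per axis and points normalized to the unit cube; the abstract does not name the curve or the bit width, and the function names `expand_bits`, `morton3d`, and `sort_by_morton` are hypothetical. The sketch covers only the ordering step, not the parallel radix tree construction itself.

```python
def expand_bits(v):
    """Spread the low 10 bits of v so that two zero bits follow each bit."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0xFF0000FF
    v = (v | (v << 8)) & 0x0F00F00F
    v = (v | (v << 4)) & 0xC30C30C3
    v = (v | (v << 2)) & 0x49249249
    return v


def morton3d(x, y, z):
    """30-bit Morton code for a point whose coordinates lie in [0, 1].

    Assumption: 10 bits per axis, x in the most significant position.
    """
    def quantize(c):
        # Map [0, 1] to an integer in [0, 1023], clamping out-of-range input.
        return min(max(int(c * 1024.0), 0), 1023)

    return ((expand_bits(quantize(x)) << 2)
            | (expand_bits(quantize(y)) << 1)
            | expand_bits(quantize(z)))


def sort_by_morton(points):
    """Return primitive indices ordered along the Morton curve."""
    return sorted(range(len(points)), key=lambda i: morton3d(*points[i]))


if __name__ == "__main__":
    pts = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.5, 0.5, 0.5)]
    print(sort_by_morton(pts))
```

Once the codes are sorted, primitives that are close in space tend to be adjacent in the array, which is what lets a tree be defined over contiguous ranges of the sorted sequence.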

Incomplete-LU and Cholesky Factorization in the Preconditioned Iterative Methods on the GPU

A novel algorithm for computing the incomplete-LU and Cholesky factorization with 0 fill-in on a graphics processing unit (GPU) is proposed. It implements the incomplete factorization of the given matrix in two phases. First, the symbolic analysis phase builds a dependency graph based on the matrix sparsity pattern and groups the independent rows into levels. Second, the numerical factorization phase obtains the resulting lower and upper sparse triangular factors by iterating sequentially across the constructed levels.
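
The symbolic analysis phase described above can be sketched as follows. This is a sequential illustration under stated assumptions, not the GPU implementation: the sparsity pattern is given as a hypothetical list `lower_pattern`, where entry `i` holds the column indices `j < i` of the nonzeros in row `i`, and a row is taken to depend on every earlier row that appears in its strictly lower-triangular pattern. The name `build_levels` is also hypothetical.

```python
def build_levels(lower_pattern):
    """Group rows into levels of mutually independent rows.

    lower_pattern[i] lists the columns j < i with a nonzero in row i.
    Row i is placed one level after the deepest row it depends on.
    """
    n = len(lower_pattern)
    level = [0] * n
    for i in range(n):
        for j in lower_pattern[i]:
            if j < i:  # ignore anything outside the strictly lower part
                level[i] = max(level[i], level[j] + 1)

    # Collect the rows of each level; rows within one level can be
    # processed independently of one another.
    groups = {}
    for i, lvl in enumerate(level):
        groups.setdefault(lvl, []).append(i)
    return [groups[lvl] for lvl in sorted(groups)]


if __name__ == "__main__":
    # Row 1 depends on row 0, row 3 on rows 1 and 2, row 4 on row 0.
    pattern = [[], [0], [], [1, 2], [0]]
    print(build_levels(pattern))  # [[0, 2], [1, 4], [3]]
```

The numerical phase would then visit the levels in order, handling all rows of one level together before moving on to the next.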

A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors

Modern graphics processing units (GPUs) employ a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complex thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing energy on massively-threaded processors such as GPUs.

Interactive Indirect Illumination Using Voxel Cone Tracing

Indirect illumination is an important element for realistic image synthesis, but its computation is expensive and highly dependent on the complexity of the scene and of the BRDF of the involved surfaces. While off-line computation and pre-baking can be acceptable for some cases, many applications (games, simulators, etc.) require real-time or interactive approaches to evaluate indirect illumination.

A Path Space Extension for Robust Light Transport Simulation

We propose a new sampling space for light transport simulation that unifies two popular algorithms with orthogonal strengths: unbiased Monte Carlo path integration and photon density estimation. Traditionally, unbiased Monte Carlo path integration has been considered the only approach for accurate light transport simulation. However, recent work in photon density estimation has demonstrated that there are several practical scene configurations where photon density estimation is more efficient and accurate.