Detecting Regions of Interest in Dynamic Scenes with Camera Motions

We present a method to detect the regions of interests in moving camera views of dynamic scenes with multiple moving objects. We start by extracting a global motion tendency that reflects the scene context by tracking movements of objects in the scene. We then use Gaussian process regression to represent the extracted motion tendency as a stochastic vector field. The generated stochastic field is robust to noiseand can handle a video from an uncalibrated moving camera. We use the stochastic field for predicting important future regions of interest as the scene evolves dynamically.

Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees

A number of methods for constructing bounding volume hierarchies and point-based octrees on the GPU are based on the idea of ordering primitives along a space-filling curve. A major shortcoming with these methods is that they construct levels of the tree sequentially, which limits the amount of parallelism that they can achieve. We present a novel approach that improves scalability by constructing the entire tree in parallel. Our main contribution is an in-place algorithm for constructing binary radix trees, which we use as a building block for other types of trees.

Incomplete-LU and Cholesky Factorization in the Preconditioned Iterative Methods on the GPU

A novel algorithm for computing the incomplete-LU and Cholesky factorization with 0 fill-in on a graphics processing unit (GPU) is proposed. It implements the incomplete factorization of the given matrix in two phases. First, the symbolic analysis phase builds a dependency graph based on the matrix sparsity pattern and groups the independent rows into levels. Second, the numerical factorization phase obtains the resulting lower and upper sparse triangular factors by iterating sequentially across the constructed levels.

A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors

Modern graphics processing units (GPUs) employ a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complex thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing energy on massively-threaded processors such as GPUs.

Interactive Indirect Illumination Using Voxel Cone Tracing

Indirect illumination is an important element for realistic image synthesis, but its computation is expensive and highly dependent on the complexity of the scene and of the BRDF of the involved surfaces. While off-line computation and pre-baking can be acceptable for some cases, many applications (games, simulators, etc.) require real-time or interactive approaches to evaluate indirect illumination.

A Path Space Extension for Robust Light Transport Simulation

We propose a new sampling space for light transport simulation which allows to unify two popular algorithms with orthogonal strengths: unbiased Monte Carlo path integration and photon density estimation. Traditionally, unbiased Monte Carlo path integration has been considered the only approach for accurate light transport simulation. However, recent work in photon density estimation has demonstrated that there are several practical scene configurations where photon density estimation is more efficient and accurate.

Policy-based Tuning for Performance Portability and Library Co-optimization

Although modular programming is a fundamental software development practice, software reuse within contemporary GPU kernels is uncommon. For GPU software assets to be reusable across problem instances, they must be inherently flexible and tunable. To illustrate, we survey the performance-portability landscape for a suite of common GPU primitives, evaluating thousands of reasonable program variants across a large diversity of problem instances (microarchitecture, problem size, and data type).

Efficient Parallel Merge Sort for Fixed and Variable Length Keys

We design a high-performance parallel merge sort for highly parallel systems. Our merge sort is designed to use more register communication (not shared memory), and does not suffer from over-segmentation as opposed to previous comparison based sorts. Using these techniques we are able to achieve a sorting rate of 250 MKeys/sec, which is about 2.5 times faster than Thrust merge sort performance, and 70% faster than non-stable state-of-the-art GPU merge sorts.

Simulation of Bubbles in Foam with the Volume Control Method

Liquid and gas interactions often produce bubbles that stay for a long time without bursting on the surface, making a dry foam struc- ture. Such long lasting bubbles simulated by the level set method can suffer from a small but steady volume error that accumulates to a visible amount of volume change. We propose to address this problem by using the volume control method. We track the vol- ume change of each connected region, and apply a carefully com- puted divergence that compensates undesired volume changes.

Twister: A Space-warp Operator for the Two-handed Editing of 3D Shapes

A free-form deformation that warps a surface or solid may be spec- ied in terms of one or several point-displacement constraints that must be interpolated by the deformation. The Twister approach in- troduced here, adds the capability to impose an orientation change, adding three rotational constraints, at each displaced point. Further- more, it solves for a space warp that simultaneously interpolates two sets of such displacement and orientation constraints.