Preconditioned Block-Iterative Methods on GPUs

An implementation of incomplete-LU/Cholesky preconditioned block-iterative methods on Graphics Processing Units (GPUs) using the CUDA parallel programming model is presented. In particular, we focus on the trade-offs associated with sparse matrix-vector multiplication with multiple vectors, sparse triangular solve with multiple right-hand sides (RHS), and incomplete factorization with zero fill-in. We use these building blocks to implement the block-CG and block-BiCGStab iterative methods for symmetric positive definite (s.p.d.) and nonsymmetric linear systems, respectively.
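To make the block iteration concrete, the fragment below is a minimal host-side sketch of an unpreconditioned block-CG solve using Eigen. It only illustrates the iteration structure, not the paper's CUDA implementation, and it omits the incomplete-factorization preconditioner; the function name blockCG and its parameters are hypothetical.

```cpp
#include <Eigen/Dense>
#include <Eigen/Sparse>

// Hypothetical illustration: unpreconditioned block-CG for an s.p.d. sparse
// matrix A and a block B of s right-hand sides (n x s), solved all at once.
// The product A * P below corresponds to the SpMM building block.
Eigen::MatrixXd blockCG(const Eigen::SparseMatrix<double>& A,
                        const Eigen::MatrixXd& B,
                        int maxIter = 200, double tol = 1e-8) {
    const long n = A.rows(), s = B.cols();
    Eigen::MatrixXd X = Eigen::MatrixXd::Zero(n, s);   // initial guess X0 = 0
    Eigen::MatrixXd R = B;                              // residual block R = B - A*X
    Eigen::MatrixXd P = R;                              // search-direction block
    Eigen::MatrixXd RtR = R.transpose() * R;            // s x s Gram matrix
    for (int k = 0; k < maxIter && R.norm() > tol * B.norm(); ++k) {
        Eigen::MatrixXd AP = A * P;                     // SpMM: sparse matrix times s vectors
        // s x s step "alpha" solves (P^T A P) alpha = R^T R
        Eigen::MatrixXd alpha = (P.transpose() * AP).ldlt().solve(RtR);
        X += P * alpha;
        R -= AP * alpha;
        Eigen::MatrixXd RtRnew = R.transpose() * R;
        // s x s direction update "beta" solves (R^T R) beta = Rnew^T Rnew
        Eigen::MatrixXd beta = RtR.ldlt().solve(RtRnew);
        P = R + P * beta;
        RtR = RtRnew;
    }
    return X;
}
```

In a preconditioned variant, the residual block R would additionally be passed through two sparse triangular solves with s right-hand sides against the incomplete factors, which is exactly where the triangular-solve and ILU(0)/IC(0) building blocks enter.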

Frustum-Traced Raster Shadows: Revisiting Irregular Z-Buffers

We present a real-time system that renders antialiased hard shadows using irregular z-buffers (IZBs). For subpixel accuracy, we use 32 samples per pixel at roughly twice the cost of a single sample. Our system remains interactive at 1080p and 2160p on a variety of game assets and CAD models, and it imposes no constraints on light, camera, or geometry, allowing fully dynamic scenes without precomputation. Unlike shadow maps, we introduce no spatial or temporal aliasing, smoothly animating even subpixel shadows cast by grass or wires.

cuDNN: Efficient Primitives for Deep Learning

We present a library of efficient implementations of deep learning primitives. Deep learning workloads are computationally intensive, and optimizing their kernels is difficult and time-consuming. As parallel architectures evolve, kernels must be reoptimized, which makes maintaining codebases difficult over time. Similar issues have long been addressed in the HPC community by libraries such as the Basic Linear Algebra Subprograms (BLAS). However, there is no analogous library for deep learning.
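As an illustration of how such a library replaces hand-tuned kernels, the sketch below runs a single forward convolution through the cuDNN descriptor API (v7-style signatures assumed); the tensor sizes, fixed algorithm choice, and missing error handling are placeholders, not a recommended setup.

```cpp
#include <cudnn.h>

// Sketch only: one forward convolution via cuDNN (v7-style API assumed).
// d_x, d_w, d_y are device buffers allocated and filled elsewhere.
void convForward(cudnnHandle_t handle,
                 const float* d_x, const float* d_w, float* d_y) {
    cudnnTensorDescriptor_t xDesc, yDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;

    cudnnCreateTensorDescriptor(&xDesc);   // input: N=1, C=3, H=224, W=224
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, 1, 3, 224, 224);

    cudnnCreateFilterDescriptor(&wDesc);   // filters: K=64, C=3, R=3, S=3
    cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 64, 3, 3, 3);

    cudnnCreateConvolutionDescriptor(&convDesc);  // pad 1, stride 1, dilation 1
    cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);

    int n, c, h, w;                        // derive the output tensor shape
    cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc, &n, &c, &h, &w);
    cudnnCreateTensorDescriptor(&yDesc);
    cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, n, c, h, w);

    const float alpha = 1.0f, beta = 0.0f;
    // The implicit-GEMM algorithm needs no extra workspace, so pass nullptr/0.
    cudnnConvolutionForward(handle, &alpha, xDesc, d_x, wDesc, d_w, convDesc,
                            CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM,
                            /*workSpace=*/nullptr, /*workSpaceSizeInBytes=*/0,
                            &beta, yDesc, d_y);

    cudnnDestroyTensorDescriptor(xDesc);
    cudnnDestroyTensorDescriptor(yDesc);
    cudnnDestroyFilterDescriptor(wDesc);
    cudnnDestroyConvolutionDescriptor(convDesc);
}
```

In practice one would query cudnnGetConvolutionForwardWorkspaceSize and choose the algorithm accordingly; the zero-workspace implicit-GEMM path is used here only to keep the sketch short.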

Aggregate G-Buffer Anti-Aliasing

We present Aggregate G-Buffer Anti-Aliasing (AGAA), a new technique for efficient anti-aliased deferred rendering of complex geometry using modern graphics hardware. In geometrically complex situations, where many surfaces intersect a pixel, current rendering systems shade each contributing surface at least once per pixel. As the sample density and geometric complexity increase, the shading cost becomes prohibitive for real-time rendering. Under deferred shading, so does the required framebuffer memory.

DT-SLAM: Deferred Triangulation for Robust SLAM

Obtaining a good baseline between video frames is one of the key elements in vision-based monocular SLAM systems. However, if the video frames contain only a few 2D feature correspondences with a good baseline, or the camera only rotates without sufficient translation at the beginning, tracking and mapping become unstable. We introduce a real-time visual SLAM system that incrementally tracks individual 2D features and estimates the camera pose from matched 2D features, regardless of the length of the baseline.

FlexISP: A Flexible Camera Image Processing Framework

Conventional pipelines for capturing, displaying, and storing images are usually defined as a series of cascaded modules, each responsible for addressing a particular problem. While this divide-and-conquer approach offers many benefits, it also introduces cumulative error, as each step in the pipeline considers only the output of the previous step, not the original sensor data.

A Non-Linear Filter for Gyroscope-Based Video Stabilization

We present a method for video stabilization and rolling-shutter correction for videos captured on mobile devices. The method uses the data from an on-board gyroscope to track the camera's angular velocity, and can run in real time within the camera capture pipeline.
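A common core of such gyroscope-based stabilizers is integrating the sampled angular velocity into a camera orientation, then warping each frame by the rotation between the measured and a smoothed "virtual" camera. The sketch below shows only that integration step with a minimal quaternion type; it is a generic illustration under these assumptions, not the paper's non-linear filter.

```cpp
#include <cmath>

// Minimal orientation quaternion (w + xi + yj + zk).
struct Quat { double w, x, y, z; };

Quat mul(const Quat& a, const Quat& b) {          // Hamilton product
    return { a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z,
             a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y,
             a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x,
             a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w };
}

// Advance the orientation q by one gyroscope sample (wx, wy, wz) in rad/s
// held for dt seconds: q <- q * dq, where dq rotates by |omega|*dt about omega.
Quat integrateGyro(const Quat& q, double wx, double wy, double wz, double dt) {
    const double angle = std::sqrt(wx*wx + wy*wy + wz*wz) * dt;  // rotation angle
    Quat dq{1.0, 0.0, 0.0, 0.0};                                 // identity for tiny rates
    if (angle > 1e-12) {
        const double s = std::sin(angle / 2.0) / (angle / dt);   // = sin(angle/2)/|omega|
        dq = { std::cos(angle / 2.0), wx * s, wy * s, wz * s };
    }
    return mul(q, dq);
}
```

A smoothed copy of this integrated orientation would then define the virtual camera, and each frame (or each scanline, for rolling-shutter correction) is warped by the rotation between the measured and virtual orientations.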

Fast Global Illumination Approximations on Deep G-Buffers

Deep Geometry Buffers (G-buffers) combine the fine-scale detail and efficiency of screen-space data with much of the robustness of voxels. We introduce a new hardware-aware method for computing two-layer deep G-buffers and show how to produce dynamic indirect radiosity, ambient occlusion (AO), and mirror reflection from them in real time. Our illumination computation approaches the performance of today’s screen-space AO-only rendering passes on current GPUs and far exceeds their quality.

Fast ANN for High-Quality Collaborative Filtering

Collaborative filtering collects similar patches, jointly filters them, and scatters the output back to the input patches; each pixel gets a contribution from each patch that overlaps with it, allowing signal reconstruction from highly corrupted data. Exploiting self-similarity, however, requires finding matching image patches, which is an expensive operation. We propose a GPU-friendly approximate-nearest-neighbor algorithm that produces high-quality results for any type of collaborative filter. We evaluate our ANN search against state-of-the-art ANN algorithms in several application domains.
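To illustrate the gather-filter-scatter structure described above (not the paper's GPU ANN search itself), here is a small, hypothetical CPU sketch: each group of matched patches is averaged as a stand-in for the joint filter, and the result is scattered back with per-pixel weight accumulation so that overlapping patches blend.

```cpp
#include <vector>

struct Patch { int x, y; };                 // top-left corner of a P x P patch

// Hypothetical sketch of collaborative filtering on a grayscale image stored
// row-major in img (width w). groups holds sets of similar patches found by
// some (approximate) nearest-neighbor search. The "joint filter" here is just
// a per-pixel average across the group; a real collaborative filter (e.g.
// BM3D-style transform-domain shrinkage) would replace it.
std::vector<float> collaborativeFilter(const std::vector<float>& img, int w,
                                       const std::vector<std::vector<Patch>>& groups,
                                       int P) {
    std::vector<float> out(img.size(), 0.0f), weight(img.size(), 0.0f);
    for (const auto& group : groups) {
        // Gather + jointly filter: average the group's patches pixel by pixel.
        std::vector<float> filtered(P * P, 0.0f);
        for (const Patch& p : group)
            for (int dy = 0; dy < P; ++dy)
                for (int dx = 0; dx < P; ++dx)
                    filtered[dy * P + dx] += img[(p.y + dy) * w + (p.x + dx)] / group.size();
        // Scatter: every patch in the group receives the filtered estimate;
        // overlapping contributions are blended via the accumulated weights.
        for (const Patch& p : group)
            for (int dy = 0; dy < P; ++dy)
                for (int dx = 0; dx < P; ++dx) {
                    out[(p.y + dy) * w + (p.x + dx)] += filtered[dy * P + dx];
                    weight[(p.y + dy) * w + (p.x + dx)] += 1.0f;
                }
    }
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = weight[i] > 0.0f ? out[i] / weight[i] : img[i];
    return out;
}
```

The per-pixel weight accumulation is what realizes the "each pixel gets a contribution from each patch that overlaps with it" behavior described above.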

Dynamic Image Stacks

Since its invention, photography has been driven by a relatively fixed paradigm: capture, develop, and print. Even with the advent of digital photography, the photographic process still focuses on creating a single, final still image suitable for printing. This implicit association between a display pixel and a static RGB value can constrain a photographer's creative agency. We present dynamic image stacks, an interactive image viewer that explores what photography can become when this constraint is relaxed.