Addressing System-Level Optimization with OpenVX Graphs

During the performance optimization of a computer vision system, developers frequently run into platform-level inefficiencies and bottlenecks that cannot be addressed by traditional methods. OpenVX is designed to address such system-level issues by means of a graph-based computation model. This approach differs from the traditional acceleration of one-off functions and exposes optimization possibilities that might not be available or obvious with traditional computer vision libraries such as OpenCV.
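
The sketch below is a toy Python model, not the OpenVX API, of one reason a graph view helps. Once the framework sees the whole pipeline before running it, and knows that intermediate images are never read by the application (the role OpenVX assigns to virtual images created with `vxCreateVirtualImage`), it is free to fuse per-pixel nodes into a single pass instead of materializing every intermediate buffer. The names `ToyGraph`, `process_eager`, and `process_fused` are hypothetical and used only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical toy model of a graph-based pipeline; none of these names are OpenVX API.
PixelOp = Callable[[float], float]


class ToyGraph:
    """Records a chain of per-pixel nodes before anything is executed."""

    def __init__(self) -> None:
        self.nodes: List[PixelOp] = []

    def add_node(self, op: PixelOp) -> "ToyGraph":
        self.nodes.append(op)
        return self

    def process_eager(self, image: List[float]) -> Tuple[List[float], int]:
        """One-off-function style: every node writes a full-size buffer."""
        data, buffers = image, 0
        for op in self.nodes:
            data = [op(p) for p in data]
            buffers += 1
        return data, buffers

    def process_fused(self, image: List[float]) -> Tuple[List[float], int]:
        """Whole-graph style: intermediates are never observed, so all nodes run per pixel in one pass."""
        def fused(p: float) -> float:
            for op in self.nodes:
                p = op(p)
            return p
        return [fused(p) for p in image], 1
```

For a three-node chain such as scale, offset, and clamp, both paths give identical pixels, but the eager path touches three full-size buffers while the fused path writes only the output. A real OpenVX implementation makes such decisions during graph verification, and may also tile, schedule, or offload nodes; this toy shows only the fusion idea.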

Cascaded Displays: Spatiotemporal Superresolution using Offset Pixel Layers

We demonstrate that layered spatial light modulators (SLMs), subject to fixed lateral displacements and refreshed at staggered intervals, can synthesize images with greater spatiotemporal resolution than that afforded by any single SLM used in their construction. Dubbed cascaded displays, such architectures enable superresolution flat panel displays (e.g., using thin stacks of liquid crystal displays (LCDs)) and digital projectors (e.g., relaying the image of one SLM onto another).
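
A minimal 1D sketch of why offset layers can exceed the resolution of either layer, assuming two attenuating layers whose transmittances multiply (as for stacked LCDs), a fixed half-pixel lateral offset, and no staggered refresh (the temporal part is not modeled). `cascaded_image` is a hypothetical forward model only; how the layer contents are chosen for a target image is not reproduced here.

```python
from typing import List


def cascaded_image(front: List[float], back: List[float]) -> List[float]:
    """Forward model for two stacked attenuating layers, the back one offset by half a pixel.

    Assumed 1D geometry: front pixel j covers [j, j + 1); back pixel k covers
    [k - 0.5, k + 0.5), so the back layer needs len(front) + 1 pixels.
    The perceived image is sampled on a half-pixel grid.
    """
    n = len(front)
    if len(back) != n + 1:
        raise ValueError("back layer needs len(front) + 1 pixels")
    # Sub-pixel s covers [s/2, s/2 + 0.5): it sees front[s // 2] and back[(s + 1) // 2].
    return [front[s // 2] * back[(s + 1) // 2] for s in range(2 * n)]
```

With `front = [1, 1]` and `back = [0, 1, 0]` the result is `[0, 1, 1, 0]`, a bright feature straddling a front-pixel boundary. A single layer at native resolution can only produce patterns of the form `[a, a, b, b]` on this grid, so the pair of offset layers reaches a finer effective pitch.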

Perceptual Depth Compression for Stereo Applications

Conventional depth video compression uses video codecs designed for color images. Given the performance of current encoding standards, this solution seems efficient. However, such an approach suffers from many issues stemming from discrepancies between depth and light perception. To exploit the inherent limitations of human depth perception, we propose a novel depth compression method that employs a disparity perception model. In contrast to previous methods, we account for disparity masking, and model a distinct relation between depth perception and contrast in luminance.

WYSIWYG Computational Photography via Viewfinder Editing

Digital cameras with electronic viewfinders give a relatively faithful depiction of the final image, providing a WYSIWYG experience. If, however, the image is created from a burst of differently captured images, or if non-linear interactive edits significantly alter the final outcome, the photographer cannot see the result directly and must instead imagine the post-processing effects. This paper explores the notion of viewfinder editing, which makes the viewfinder more accurately reflect the final image the user intends to create.

Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor

Modern throughput processors such as GPUs employ thousands of threads to drive high-bandwidth, long-latency memory systems. These threads require substantial on-chip storage for registers, cache, and scratchpad memory. Existing designs hard-partition this local storage, fixing the capacities of these structures at design time. We evaluate modern GPU workloads and find that they have widely varying capacity needs across these different functions.
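
To make the contrast with hard partitioning concrete, the sketch below assumes one possible unified policy: each kernel's register and scratchpad needs are carved out of a single on-chip store at launch, and whatever remains serves as primary cache. The policy, the 4-byte register size, the 384 KB capacity used in the example, and the names `KernelDemand` and `partition_unified` are illustrative assumptions, not figures or mechanisms taken from the abstract.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class KernelDemand:
    threads_per_sm: int   # resident threads on one SM
    regs_per_thread: int  # 32-bit registers per thread
    scratch_bytes: int    # scratchpad (shared memory) requested by the kernel


def partition_unified(capacity_bytes: int, k: KernelDemand) -> Dict[str, int]:
    """Hypothetical per-kernel split of a unified on-chip store.

    Registers and scratchpad are sized to the kernel's request;
    the remainder is handed to the primary cache.
    """
    reg_bytes = k.threads_per_sm * k.regs_per_thread * 4
    used = reg_bytes + k.scratch_bytes
    if used > capacity_bytes:
        raise ValueError("kernel does not fit; fewer threads would have to be resident")
    return {"registers": reg_bytes, "scratch": k.scratch_bytes, "cache": capacity_bytes - used}
```

With a 384 KB store, a kernel using 1024 threads, 32 registers per thread, and no scratchpad leaves 256 KB for cache, while a kernel using 64 registers per thread and 64 KB of scratchpad leaves only 64 KB. A fixed split would have to pick one configuration for both.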

A Decomposition for In-place Array Transposition

We describe a decomposition for in-place matrix transposition, with applications to Array of Structures memory accesses on SIMD processors. Traditional approaches to in-place matrix transposition involve cycle following, which is difficult to parallelize, and on matrices of dimension m by n require O(mn log mn) work when limited to less than O(mn) auxiliary space. Our decomposition allows the rows and columns to be operated on independently during in-place transposition, reducing work complexity to O(mn), given O(max(m, n)) auxiliary space.
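
For reference, the sketch below is the traditional cycle-following approach the abstract contrasts against, not the paper's row/column decomposition. In a row-major m by n array, the element at flat index i belongs at index (i * m) mod (mn - 1) after transposition, and following each permutation cycle moves every element once. This simple version keeps an O(mn)-bit `visited` bitmap; avoiding that auxiliary space is what drives the extra work cited in the abstract. The function name is hypothetical.

```python
from typing import List


def transpose_in_place(a: List[int], m: int, n: int) -> None:
    """Transpose a row-major m x n matrix stored in `a` into row-major n x m, in place.

    Cycle following: the element at flat index i moves to (i * m) mod (mn - 1);
    indices 0 and mn - 1 never move. The `visited` bitmap costs O(mn) bits.
    """
    size = m * n
    if size <= 2:
        return  # 1x1, 1x2 and 2x1 transposes leave the flat layout unchanged
    visited = bytearray(size)
    for start in range(1, size - 1):
        if visited[start]:
            continue
        i, carried = start, a[start]
        while True:
            j = (i * m) % (size - 1)       # destination of the element being carried
            a[j], carried = carried, a[j]  # drop it in, pick up the displaced element
            visited[j] = 1
            i = j
            if j == start:
                break
```

The cycles have irregular lengths and start points, which is why this scheme is hard to parallelize across SIMD lanes.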

Lighting Deep G-Buffers: Single-Pass, Layered Depth Images with Minimum Separation Applied to Indirect Illumination

We introduce a new method for computing two-level Layered Depth Images (LDIs) [Shade et al. 1998] that is designed for modern GPUs. The method is order-independent, can guarantee a minimum depth separation between the layers, operates within small, bounded memory, and requires no explicit sorting. Critically, it also operates in a single pass over scene geometry.
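
The minimum-separation criterion itself can be stated as a brute-force per-pixel reference: the first layer is the nearest fragment, and the second is the nearest fragment lying at least a chosen separation behind it. The sketch below assumes every fragment depth for the pixel has already been collected, which is exactly the unbounded, multi-pass setup the single-pass GPU method avoids. `two_layer_min_separation` and `delta` are hypothetical names.

```python
import math
from typing import List, Tuple


def two_layer_min_separation(fragment_depths: List[float], delta: float) -> Tuple[float, float]:
    """Brute-force reference for one pixel of a two-layer LDI with minimum separation.

    Layer 0 is the nearest fragment; layer 1 is the nearest fragment at least
    `delta` behind layer 0. math.inf marks an empty layer.
    """
    first = min(fragment_depths, default=math.inf)
    second = min((z for z in fragment_depths if z >= first + delta), default=math.inf)
    return first, second
```

For depths `[5.0, 5.05, 5.1, 7.0]` and a separation of 0.5, the layers are `(5.0, 7.0)`. Without the separation, the second layer would be 5.05, a nearly coincident surface such as a decal or the back of a thin shell that adds little new information for indirect illumination.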

A Parallel Auxiliary Grid Algebraic Multigrid Method for Graphic Processing Units

In this paper, we develop a new parallel auxiliary grid algebraic multigrid (AMG) method to leverage the power of graphic processing units (GPUs). In the construction of the hierarchical coarse grid, we use a simple and fixed coarsening procedure based on a region quadtree generated from an auxiliary grid. This allows us to explicitly control the sparsity patterns and operator complexities of the AMG solver.
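
As a rough illustration of a fixed, quadtree-based coarsening, the sketch below assumes the unknowns sit at points in the unit square and treats each nonempty region-quadtree cell at a given level as one coarse unknown (an aggregate). Because going from level L to L - 1 merges every 2x2 block of cells, the hierarchy, and hence the sparsity of the coarse operators, is known in advance. The auxiliary-grid construction, interpolation, and smoothing used in the paper are not reproduced, and `quadtree_aggregates` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def quadtree_aggregates(points: List[Point], level: int) -> Dict[Tuple[int, int], List[int]]:
    """Group point indices by their level-`level` cell of a region quadtree on [0, 1)^2.

    Each nonempty cell is one coarse-grid unknown; level 0 is a single root cell.
    """
    cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    side = 2 ** level
    for idx, (x, y) in enumerate(points):
        # Clamp so that a coordinate of exactly 1.0 still lands in the last cell.
        cell = (min(int(x * side), side - 1), min(int(y * side), side - 1))
        cells[cell].append(idx)
    return dict(cells)
```

Applying the function at levels L, L - 1, ..., 0 yields the nested aggregates for every level of the hierarchy without any data-dependent strength-of-connection analysis, which keeps the setup simple to parallelize.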

Lossless and Near-Lossless Compression of Real Color Filter Array Data

Compression of Bayer pattern color filter array (CFA) data has received considerable attention in recent years. Numerous algorithms have been proposed for lossless, near-lossless, and lossy compression. The performance of compression methods is typically evaluated only on artificial CFA data, obtained by sub-sampling full color images according to the CFA pattern, without taking into account that CFA data are heavily processed before full color images are obtained. Therefore, some assumptions that hold for reconstructed images may not hold for real raw data.
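
The "artificial CFA data" described above is typically produced by a sub-sampling step like the one sketched here, which keeps one color sample per pixel according to the Bayer layout. The sketch assumes an RGGB phase and an image given as nested lists of (R, G, B) tuples; `bayer_rggb_mosaic` is a hypothetical name. The input has already passed through a camera pipeline (including demosaicing), so the resulting mosaic inherits correlations that a real sensor readout would not have.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]


def bayer_rggb_mosaic(image: List[List[RGB]]) -> List[List[float]]:
    """Sub-sample a full-color image to a single-channel RGGB Bayer CFA.

    Even rows alternate R, G; odd rows alternate G, B. Other phases
    (e.g. GRBG) just shift the pattern by one row or column.
    """
    mosaic: List[List[float]] = []
    for r, row in enumerate(image):
        out_row: List[float] = []
        for c, (red, green, blue) in enumerate(row):
            if r % 2 == 0:
                out_row.append(red if c % 2 == 0 else green)
            else:
                out_row.append(green if c % 2 == 0 else blue)
        mosaic.append(out_row)
    return mosaic
```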

A Fast and Stable Feature-Aware Motion Blur Filter

High-quality motion blur is an increasingly important and pervasive effect in interactive graphics that, even in the context of offline rendering, is often approximated using a post process. Recent motion blur post-process filters (e.g., [MHBO12, Sou13]) efficiently generate plausible results suitable for modern interactive rendering pipelines. However, these approaches may produce distracting artifacts, for instance, when different motions overlap in depth or when both large- and fine-scale features undergo motion.
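
To show where such artifacts come from, the sketch below is a deliberately naive 1D post-process blur (not the paper's filter and not the cited ones): each pixel averages samples along its own screen-space velocity. Because a static pixel only ever looks along its own zero velocity, a moving object blurs internally but never spreads onto the static background next to it, one of the overlap problems that more careful filters address by also considering the velocities of neighboring pixels. `naive_gather_blur` and its parameters are hypothetical.

```python
from typing import List


def naive_gather_blur(color: List[float], velocity: List[float], samples: int = 8) -> List[float]:
    """Naive 1D motion blur: average `samples` taps along each pixel's own velocity.

    velocity[x] is the motion in pixels over the exposure; taps are spaced
    uniformly over [-v/2, +v/2] and clamped to the image bounds.
    """
    n = len(color)
    out: List[float] = []
    for x in range(n):
        total = 0.0
        for i in range(samples):
            t = (i + 0.5) / samples - 0.5  # tap position in (-0.5, 0.5)
            sx = min(max(round(x + t * velocity[x]), 0), n - 1)
            total += color[sx]
        out.append(total / samples)
    return out
```

For a single bright pixel of value 10 moving 4 pixels per frame over a static black background, the moving pixel drops to 2.5 while its neighbors stay at 0, so the streak that should appear around the object is missing.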