The Visual Vulnerability Spectrum: Characterizing Architectural Vulnerability for Graphics Hardware

With shrinking process technology, the primary cause of transient faults in semiconductors shifts away from high-energy cosmic particle strikes and toward more mundane and pervasive causes: power fluctuations, crosstalk, and other random noise. Smaller transistor features require a lower critical charge to hold and change bits, which leads to faster microprocessors but also to higher transient fault rates. Current trends, expected to continue, show soft error rates increasing exponentially at a rate of 8% per technology generation.
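To make the 8% figure concrete, the short sketch below compounds it over several technology generations. It assumes the rate applies uniformly to every generation; the function name `relative_soft_error_rate` and its parameters are illustrative and do not come from the paper.

```python
def relative_soft_error_rate(generations: int, growth_per_generation: float = 0.08) -> float:
    """Soft error rate relative to a baseline generation.

    Assumes the same fractional increase compounds at every generation,
    which is a simplification of the trend quoted in the abstract.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 + growth_per_generation) ** generations


if __name__ == "__main__":
    # Print the multiplier for the baseline and the next five generations.
    for n in range(6):
        print(n, round(relative_soft_error_rate(n), 3))
```

Under this assumption, five generations raise the soft error rate to roughly 1.47 times the baseline.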

How GPUs Work

GPUs have moved away from the traditional fixed-function 3D graphics pipeline toward a flexible general-purpose computational engine. Today, GPUs can implement many parallel algorithms directly using graphics hardware. Well-suited algorithms that leverage all the underlying computational horsepower often achieve tremendous speedups. Truly, the GPU is the first widely deployed commodity desktop parallel computer.

A Survey of General-Purpose Computation on Graphics Hardware

The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application domains. In this report, we describe, summarize, and analyze the latest research in mapping general-purpose computation to graphics hardware. We begin with the technical motivations that underlie general-purpose computation on graphics processors.

GPU Computing

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart.

Real-time Editing and Relighting of Homogeneous Translucent Materials

Existing techniques for fast, high-quality rendering of translucent materials often fix BSSRDF parameters at precomputation time. We present a novel method for accurate rendering and relighting of translucent materials that also enables real-time editing and manipulation of homogeneous diffuse BSSRDFs. We first apply PCA to diffuse multiple scattering to derive a compact basis set consisting of only twelve 1D functions. We discovered that this small basis set is accurate enough to approximate a general diffuse scattering profile.
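The idea of deriving a compact basis of 1D functions with PCA can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the profiles here are plain exponentials `exp(-sigma * r)` standing in for real diffuse scattering profiles, and the names `pca_basis` and `project`, the sample grid, and the range of `sigma` are all assumptions made for the example.

```python
import numpy as np


def pca_basis(profiles: np.ndarray, k: int):
    """Return (mean, basis), where basis holds k orthonormal 1D functions as rows."""
    mean = profiles.mean(axis=0)
    centered = profiles - mean
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def project(profile: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Approximate one profile using only the k basis coefficients."""
    coeffs = basis @ (profile - mean)
    return mean + coeffs @ basis


# Hypothetical stand-in data: one exponential falloff per row, sampled over radius r.
r = np.linspace(0.05, 5.0, 200)
sigmas = np.linspace(0.2, 4.0, 100)
profiles = np.exp(-np.outer(sigmas, r))  # shape (100, 200)

mean, basis = pca_basis(profiles, k=12)
approx = project(profiles[37], mean, basis)
rel_err = np.linalg.norm(approx - profiles[37]) / np.linalg.norm(profiles[37])

if __name__ == "__main__":
    print("relative reconstruction error:", rel_err)
```

With a basis like this, editing a material amounts to changing twelve coefficients rather than recomputing a full profile, which is the kind of saving the abstract describes.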

Robust Stereo with Flash and No-flash Image Pairs

We propose a new stereo technique using a pair of flash and no-flash stereo images that is both efficient and robust in handling occlusion boundaries. Our work is motivated by the observation that the brightness variations introduced by the flash can provide a robust cue for establishing stereo matches at occlusion boundaries. This photometric cue is computed per pixel, and though on its own it is not robust enough to reliably resolve depth, it can provide a new discriminant to support patch-based stereo matching algorithms.

Scalable Ambient Obscurance

This paper presents a set of architecture-aware performance and integration improvements for a recent screen-space ambient obscurance algorithm. These improvements collectively produce a 7x performance increase at 2560x1600, generalize the algorithm to both forward and deferred renderers, and eliminate the radius- and scene-dependence of the previous algorithm to provide a hard real-time guarantee of fixed execution time.

Understanding the Efficiency of Ray Traversal on GPUs - Kepler and Fermi Addendum

This technical report is an addendum to the HPG 2009 paper "Understanding the Efficiency of Ray Traversal on GPUs", and provides citable performance results for Kepler and Fermi architectures. We explain how to optimize the traversal and intersection kernels for these newer platforms, and what the important architectural limiters are.

Relational Algorithms for Multi-Bulk-Synchronous Processors

Relational databases remain an important application domain for organizing and analyzing the massive volume of data generated as sensor technology, retail and inventory transactions, social media, computer vision, and new fields continue to evolve. At the same time, processor architectures are beginning to shift towards hierarchical and parallel architectures employing throughput-optimized memory systems, lightweight multi-threading, and Single-Instruction Multiple-Data (SIMD) core organizations.