TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA

Memory consistency models (MCMs), which govern inter-module interactions in a shared memory system, are a significant, yet often under-appreciated, aspect of system design. MCMs are defined at the various layers of the hardware-software stack, requiring thoroughly verified specifications, compilers, and implementations at the interfaces between layers.

Automated Synthesis of Comprehensive Memory Model Litmus Test Suites

The memory consistency model is a fundamental part of any shared memory architecture or programming model. Modern weak memory models are notoriously difficult to define and to implement correctly. Most real-world programming languages, compilers, and (micro)architectures therefore rely heavily on black-box testing methodologies. The success of such techniques requires that the suite of litmus tests used to perform the testing be comprehensive—it should ideally stress all obscure corner cases of the model and of its implementation.
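To illustrate what a litmus test is (a sketch for this digest, not code from the paper): the classic store-buffering test runs two threads, each storing to one shared location and then loading the other. A tiny enumerator shows that the outcome r1 = r2 = 0 never arises under sequential consistency; weaker models such as x86-TSO permit it, which is exactly the kind of corner case a comprehensive suite must cover.

```python
# Store-buffering (SB) litmus test, executed under sequential consistency by
# enumerating every interleaving that preserves each thread's program order.
def interleavings(t0, t1):
    if not t0:
        yield list(t1)
        return
    if not t1:
        yield list(t0)
        return
    for rest in interleavings(t0[1:], t1):
        yield [t0[0]] + rest
    for rest in interleavings(t0, t1[1:]):
        yield [t1[0]] + rest

def run(schedule):
    mem = {"x": 0, "y": 0}        # shared locations, initially zero
    regs = {}
    for kind, dst, src in schedule:
        if kind == "st":
            mem[dst] = src        # store an immediate value
        else:
            regs[dst] = mem[src]  # load a location into a register
    return regs["r1"], regs["r2"]

# Thread 0: x = 1; r1 = y        Thread 1: y = 1; r2 = x
t0 = [("st", "x", 1), ("ld", "r1", "y")]
t1 = [("st", "y", 1), ("ld", "r2", "x")]
outcomes = {run(s) for s in interleavings(t0, t1)}
print(sorted(outcomes))  # (0, 0) never appears under sequential consistency
```

Black-box testing of a real machine amounts to running many such tests and checking that no forbidden outcome is ever observed.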

Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms

The energy consumption of DRAM is a critical concern in modern computing systems. Improvements in manufacturing process technology have allowed DRAM vendors to lower the DRAM supply voltage conservatively, which reduces some of the DRAM energy consumption. We would like to reduce the DRAM supply voltage more aggressively, to further reduce energy.

Mixed-primary Factorization for Dual-frame Computational Displays

Increasing the resolution and dynamic range of digital color displays is challenging for designs confined by cost and power specifications, forcing modern displays to trade off spatial and temporal resolution against color reproduction capability. In this work we explore joint hardware and algorithm design to balance such trade-offs. We introduce a system that uses content-adaptive and compressive factorizations to reproduce colors.
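To convey the flavor of the factorization idea (a hypothetical sketch, not the paper's actual algorithm): a target color image can be approximated as a sum of a few frames, each the rank-1 product of a per-frame primary color and a per-pixel modulation image. Plain nonnegative matrix factorization with multiplicative updates already captures this structure:

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((64, 3))          # target: 64 pixels x 3 color channels
k = 2                            # two frames (dual-frame display)
W = rng.random((64, k)) + 0.1    # per-pixel modulation of each frame
H = rng.random((k, 3)) + 0.1     # per-frame "mixed primary" color

def error(V, W, H):
    return np.linalg.norm(V - W @ H)   # Frobenius reconstruction error

err_before = error(V, W, H)
for _ in range(200):
    # Lee-Seung multiplicative updates; factors stay nonnegative,
    # which matches the physical constraint that light adds up
    W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
err_after = error(V, W, H)
print(err_before, err_after)     # reconstruction error drops
```

The content-adaptive aspect of the actual system corresponds to choosing H per frame pair rather than fixing display primaries in hardware.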

Approxilyzer: Towards A Systematic Framework for Instruction-Level Approximate Computing and its Application to Hardware Resiliency

Approximate computing environments trade off computational accuracy for improvements in performance, energy, and resiliency cost. For widespread adoption of approximate computing, a fundamental requirement is to understand how perturbations to a computation affect the outcome of the execution in terms of its output quality. This paper presents a framework for approximate computing, called Approxilyzer, that quantifies the quality impact of a single-bit error in all dynamic instructions of an execution with high accuracy (95% on average). We demonstrate two uses of Approxilyzer.
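A minimal sketch of the underlying idea (hypothetical, not Approxilyzer's actual analysis): flip a single bit in a value feeding a computation, compare against the golden output, and bin each outcome as masked, tolerable, or a silent data corruption (SDC):

```python
def f(x):
    # toy computation standing in for the consumers of one dynamic value
    return x * x + 3

def classify(golden, faulty, tol=0.01):
    if faulty == golden:
        return "masked"      # error had no effect on the output
    if abs(faulty - golden) <= tol * abs(golden):
        return "tolerable"   # output quality degraded within budget
    return "SDC"             # silent data corruption

x = 7
golden = f(x)
outcomes = {}
for bit in range(8):                     # flip each bit of the 8-bit input once
    faulty_input = x ^ (1 << bit)
    outcomes[bit] = classify(golden, f(faulty_input))
print(outcomes)
```

Approxilyzer's contribution is making this kind of per-bit, per-dynamic-instruction quality attribution tractable for whole executions.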

SASSIFI: An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluation

SASSIFI is an architecture-level error injection-based methodology and tool to study the soft error resilience of massively parallel applications running on state-of-the-art NVIDIA GPUs. SASSIFI provides an automated flow to perform error injection campaigns and is publicly available on GitHub at https://github.com/NVlabs/sassifi.

Learning Light Transport the Reinforced Way

We show that the equations of reinforcement learning and light transport simulation are related integral equations. Based on this correspondence, a scheme to learn importance while sampling path space is derived. The new approach is demonstrated in a consistent light transport simulation algorithm that uses reinforcement learning to progressively learn where light comes from.
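The claimed correspondence can be sketched from the abstract (notation assumed here, with h(x, ω) denoting the closest hit point from x in direction ω): the rendering equation and the expected-value form of Q-learning share the same fixed-point structure,

```latex
% Rendering equation: radiance leaving x in direction \omega
L(x,\omega) = L_e(x,\omega)
  + \int_{\Omega} f_s(\omega_i, x, \omega)\, \cos\theta_i \,
    L\bigl(h(x,\omega_i), -\omega_i\bigr)\, d\omega_i

% Expected-value Q-learning over a continuous action space
Q(s,a) = r(s,a) + \gamma \int_{A} \pi(a' \mid s')\, Q(s',a')\, da'
```

Reading state as surface point, action as direction, reward as emitted radiance, and the discounted policy as the cosine-weighted BSDF makes learned Q-values usable as importance for sampling path space.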

The Iray Light Transport Simulation and Rendering System

While ray tracing has become increasingly common and path tracing is well understood by now, a major challenge lies in crafting an easy-to-use and efficient system implementing these technologies. Following a purely physically-based paradigm while still allowing for artistic workflows, the Iray light transport simulation and rendering system allows for rendering complex scenes at the push of a button and thus makes accurate light transport simulation widely available.

Efficient Incoherent Ray Traversal on GPUs Through Compressed Wide BVHs

We present a GPU-based ray traversal algorithm that operates on compressed wide BVHs and maintains the traversal stack in a compressed format. Our method reduces the amount of memory traffic significantly, which translates to 1.9-2.1x improvement in incoherent ray traversal performance compared to the current state of the art. Furthermore, the memory consumption of our hierarchy is 35-60% of a typical uncompressed BVH.
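The compression at the heart of such hierarchies can be sketched as follows (an illustrative encoding, not the paper's exact format): each child bounding box is quantized to a coarse grid anchored at its parent's box, rounding outward so traversal stays conservative:

```python
import math

def quantize_child(parent_lo, parent_hi, child_lo, child_hi, bits=8):
    # Grid anchored at parent_lo; floor the low corner and ceil the high
    # corner so the decoded box always contains the original child box.
    n = (1 << bits) - 1
    scale = [(phi - plo) / n for plo, phi in zip(parent_lo, parent_hi)]
    qlo = [max(0, math.floor((c - p) / s))
           for c, p, s in zip(child_lo, parent_lo, scale)]
    qhi = [min(n, math.ceil((c - p) / s))
           for c, p, s in zip(child_hi, parent_lo, scale)]
    return qlo, qhi

def decode(parent_lo, parent_hi, qlo, qhi, bits=8):
    n = (1 << bits) - 1
    scale = [(phi - plo) / n for plo, phi in zip(parent_lo, parent_hi)]
    lo = [p + q * s for p, q, s in zip(parent_lo, qlo, scale)]
    hi = [p + q * s for p, q, s in zip(parent_lo, qhi, scale)]
    return lo, hi

plo, phi = (0.0, 0.0, 0.0), (10.0, 10.0, 10.0)
clo, chi = (1.23, 2.0, 0.5), (4.56, 7.75, 9.1)
qlo, qhi = quantize_child(plo, phi, clo, chi)
dlo, dhi = decode(plo, phi, qlo, qhi)
print(qlo, qhi)   # three bytes per corner instead of three 32-bit floats
```

Shrinking each child corner from three floats to three bytes is what cuts memory traffic during incoherent traversal; the slightly enlarged boxes only cost a few extra node tests.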

Membrane AR: Varifocal, Wide-Field-of-View Augmented Reality Display from Deformable Membranes

Some of the fundamental limitations of existing near-eye displays (NEDs) for augmented reality (AR) are a limited field of view (FOV), low angular resolution, and a fixed focal state. Optimizing a design to overcome one of these limitations typically leads to a trade-off in the others. This project introduces an all-in-one solution: deformable beamsplitters, a new hybrid hardware design created by combining hyperbolic half-silvered mirrors and deformable membrane mirrors (DMMs).