| Research

Anatomy of GPU Memory System for Multi-Application Execution

As GPUs make headway in the computing landscape spanning mobile platforms, supercomputers, cloud and virtual desktop platforms, supporting concurrent execution of multiple applications in GPUs becomes essential for unlocking their full potential. However, unlike CPUs, multi-application execution in GPUs is little explored. In this paper, we study the memory system of GPUs in a concurrently executing multi-application environment.

Read more about Anatomy of GPU Memory System for Multi-Application Execution

A Case for Toggle-Aware Compression for GPU Systems

Data compression can be an effective method to achieve higher system performance and energy efficiency in modern data-intensive applications by exploiting redundancy and data similarity. Prior works have studied a variety of data compression techniques to improve both capacity (e.g., of caches and main memory) and bandwidth utilization (e.g., of the on-chip and off-chip interconnects). In this paper, we make a new observation about the energy-efficiency of communication when compression is applied.

Read more about A Case for Toggle-Aware Compression for GPU Systems

TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA

Memory consistency models (MCMs) which govern inter-module interactions in a shared memory system, are a significant, yet often under-appreciated, aspect of system design. MCMs are defined at the various layers of the hardware-software stack, requiring thoroughly verified specifications, compilers, and implementations at the interfaces between layers.

Read more about TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA

Automated Synthesis of Comprehensive Memory Model Litmus Test Suites

The memory consistency model is a fundamental part of any shared memory architecture or programming model. Modern weak memory models are notoriously difficult to define and to implement correctly. Most real-world programming languages, compilers, and (micro)architectures therefore rely heavily on black-box testing methodologies. The success of such techniques requires that the suite of litmus tests used to perform the testing be comprehensive—it should ideally stress all obscure corner cases of the model and of its implementation.

Read more about Automated Synthesis of Comprehensive Memory Model Litmus Test Suites

Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms

The energy consumption of DRAM is a critical concern in modern computing systems. Improvements in manufacturing process technology have allowed DRAM vendors to lower the DRAM supply voltage conservatively, which reduces some of the DRAM energy consumption. We would like to reduce the DRAM supply voltage more aggressively, to further reduce energy.

Read more about Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms

Mixed-primary Factorization for Dual-frame Computational Displays

Increasing resolution and dynamic range of digital color displays is challenging with designs confined by cost and power specifications. This necessitates modern displays to trade-off spatial and temporal resolution for color reproduction capability. In this work we explore the idea of joint hardware and algorithm design to balance such trade-offs. We introduce a system that uses content-adaptive and compressive factorizations to reproduce colors.

Read more about Mixed-primary Factorization for Dual-frame Computational Displays

Approxilyzer: Towards A Systematic Framework for Instruction-Level Approximate Computing and its Application to Hardware Resiliency

Approximate computing environments trade off computational accuracy for improvements in performance, energy, and resiliency cost. For widespread adoption of approximate computing, a fundamental requirement is to understand how perturbations to a computation affect the outcome of the execution in terms of its output quality. This paper presents a framework for approximate computing, called Approxilyzer, that quantifies the quality impact of a single-bit error in all dynamic instructions of an execution with high accuracy (95% on average). We demonstrate two uses of Approxilyzer.

Read more about Approxilyzer: Towards A Systematic Framework for Instruction-Level Approximate Computing and its Application to Hardware Resiliency

SASSIFI: An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluation

SASSIFI is an architecture-level error injection-based methodology and tool to study the soft error resilience of massively parallel applications running on state-of-the-art NVIDIA GPUs. SASSIFI provides an automated flow to perform error injection campaigns and is publicly available on GitHub at https://github.com/NVlabs/sassifi.

Read more about SASSIFI: An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluation

Learning Light Transport the Reinforced Way

We show that the equations of reinforcement learning and light transport simulation are related integral equations. Based on this correspondence, a scheme to learn importance while sampling path space is derived. The new approach is demonstrated in a consistent light transport simulation algorithm that uses reinforcement learning to progressively learn where light comes from.

Read more about Learning Light Transport the Reinforced Way

The Iray Light Transport Simulation and Rendering System

While ray tracing has become increasingly common and path tracing is well understood by now, a major challenge lies in crafting an easy-to-use and efficient system implementing these technologies. Following a purely physically-based paradigm while still allowing for artistic workflows, the Iray light transport simulation and rendering system allows for rendering complex scenes by the push of a button and thus makes accurate light transport simulation widely available.

Read more about The Iray Light Transport Simulation and Rendering System

Subscribe to