Stitch-X: An Accelerator Architecture for Exploiting Unstructured Sparsity in Deep Neural Networks

Sparse deep neural network (DNN) accelerators exploit the intrinsic redundancy in data representation to achieve high performance and energy efficiency. However, sparse weight and input activation arrays are unstructured, so their processing cannot take advantage of the regular data-access patterns offered by dense arrays; as a result, it incurs increased complexity in dataflow orchestration and resource management.
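To make the irregularity concrete, here is a minimal sketch (not Stitch-X's actual dataflow) of unstructured sparse weights stored in CSR form: a sparse matrix-vector product must chase per-row index lists and gather operands at irregular positions, instead of striding through a dense array.

```python
# Hypothetical toy example: a 4x4 weight matrix with unstructured sparsity.
dense_w = [
    [0.0, 2.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 5.0, 0.0],
]

# CSR compression: nonzero values, their column indices, per-row extents.
values, col_idx, row_ptr = [], [], [0]
for row in dense_w:
    for j, v in enumerate(row):
        if v != 0.0:
            values.append(v)
            col_idx.append(j)
    row_ptr.append(len(values))

def spmv(x):
    """Sparse matrix-vector product over the CSR arrays."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]  # gather: irregular access
    return y

# Matches the dense computation while touching only the nonzeros.
x = [1.0, 2.0, 3.0, 4.0]
dense_result = [sum(w * xj for w, xj in zip(row, x)) for row in dense_w]
assert spmv(x) == dense_result
```

The indirect `x[col_idx[k]]` gathers are exactly the accesses that break the regular reuse patterns a dense accelerator relies on.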

QuadStream: A Quad-Based Scene Streaming Architecture for Novel Viewpoint Reconstruction

Cloud rendering is attractive when targeting thin client devices such as phones or VR/AR headsets, or any situation where a high-end GPU is not available due to thermal or power constraints. However, it introduces the challenge of streaming rendered data over a network in a manner that is robust to latency and potential dropouts. Current approaches range from streaming rendered video and correcting it on the client---which fails in the presence of disocclusion events---to solutions where the server sends geometry and all rendering is performed on the client.

CreatureShop: Interactive 3D Character Modeling and Texturing from a Single Color Drawing

Creating 3D shapes from 2D drawings is an important problem with applications in content creation for computer animation and virtual reality. We introduce a new sketch-based system, CreatureShop, that enables amateurs to create high-quality textured 3D character models from 2D drawings with ease and efficiency.

Learning A Continuous and Reconstructible Latent Space for Hardware Accelerator Design

The hardware design space is high-dimensional and discrete. Systematic and efficient exploration of this space has been a significant challenge. Central to this problem is the intractable search complexity that grows exponentially with the design choices and the discrete nature of the search space. This work investigates the feasibility of learning a meaningful low-dimensional continuous representation for hardware designs to reduce such complexity and facilitate the search process.

Hardware Abstractions for Targeting EDDO Architectures with the Polyhedral Model

Unlike cache-based load-store architectures, Explicit Decoupled Data Orchestration (EDDO) architectures are programmed using decoupled but synchronized programs running at various units on the hardware, moving data between storage units and/or performing computations. As such, they present a unique programming challenge.

Timeloop: A Systematic Approach to DNN Accelerator Evaluation

This paper presents Timeloop, an infrastructure for evaluating and exploring the architecture design space of deep neural network (DNN) accelerators. Timeloop uses a concise and unified representation of the key architecture and implementation attributes of DNN accelerators to describe a broad space of hardware topologies. It can then emulate those topologies to generate an accurate projection of performance and energy efficiency for a DNN workload through a mapper that finds the best way to schedule operations and stage data on the specified architecture.
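A toy sketch of the mapping-search idea (far simpler than Timeloop's model-driven mapper): for a small matrix multiply, enumerate loop orders and score each by how often the live output element of C changes, a crude proxy for data movement at one buffer level. The dimension names and cost model here are illustrative assumptions, not Timeloop's representation.

```python
from itertools import permutations, product

M, N, K = 4, 4, 4
DIMS = {'m': range(M), 'n': range(N), 'k': range(K)}

def output_changes(order):
    """Count changes of the live C[m][n] element under loop `order`."""
    changes, last = 0, None
    for idx in product(*(DIMS[d] for d in order)):
        point = dict(zip(order, idx))
        out = (point['m'], point['n'])
        if out != last:
            changes, last = changes + 1, out
    return changes

# Exhaustively search the (tiny) mapping space for the cheapest order.
best = min(permutations('mnk'), key=output_changes)
assert best[-1] == 'k'  # k innermost keeps C[m][n] stationary
```

Real mappers search a vastly larger space (tilings, spatial partitioning, multiple buffer levels) under an analytical energy and performance model, but the structure of the search is the same: enumerate legal mappings, score each, keep the best.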

HarDNN: Fine-Grained Vulnerability Evaluation and Protection for Convolutional Neural Networks

As CNNs are increasingly being employed in high performance computing and safety-critical applications, ensuring they are resilient to transient hardware errors is important. Full duplication provides high reliability, but the overheads are prohibitively high for resource constrained systems. Fine-grained resilience evaluation and protection can provide a low-cost solution, but traditional methods for evaluation can be too slow. Traditional approaches use error injections and essentially discard information from experiments that do not corrupt outcomes.

SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies

DRAM is the primary technology used for main memory in modern systems. Unfortunately, as DRAM scales down to smaller technology nodes, it faces key challenges in both data integrity and latency, which strongly affects overall system reliability and performance. To develop reliable and high-performance DRAM-based main memory in future systems, it is critical to characterize, understand, and analyze various aspects (e.g., reliability, latency) of existing DRAM chips.

Detecting and Mitigating Data-Dependent DRAM Failures by Exploiting Current Memory Content

DRAM cells in close proximity can fail depending on the data content in neighboring cells. These failures are called data-dependent failures. Detecting and mitigating these failures online, while the system is running in the field, enables various optimizations that improve reliability, latency, and energy efficiency of the system. For example, a system can improve performance and energy efficiency by using a lower refresh rate for most cells and mitigate the failing cells using higher refresh rates or error correcting codes.

Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology

Many important applications trigger bulk bitwise operations, i.e., bitwise operations on large bit vectors. In fact, recent works design techniques that exploit fast bulk bitwise operations to accelerate databases (bitmap indices, BitWeaving) and web search (BitFunnel). Unfortunately, in existing architectures, the throughput of bulk bitwise operations is limited by the memory bandwidth available to the processing unit (e.g., CPU, GPU, FPGA, processing-in-memory). To overcome this bottleneck, we propose Ambit, an Accelerator-in-Memory for bulk bitwise operations.