Hardware Abstractions for Targeting EDDO Architectures with the Polyhedral Model

Unlike cache-based load-store architectures, Explicit Decoupled Data Orchestration (EDDO) architectures are programmed using decoupled but synchronized programs running at various units on the hardware, moving data between storage units and/or performing computations. As such, they present a unique programming challenge.
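
As a rough illustration of this decoupled-but-synchronized style, the sketch below uses plain Python threads and a bounded queue (all names and structure here are hypothetical, not a real EDDO ISA): one "program" moves tiles between storage levels while another consumes them, and the two synchronize only through an explicit buffer.

```python
import threading, queue

tile_buffer = queue.Queue(maxsize=2)      # stands in for an on-chip double buffer

def data_orchestrator(num_tiles):
    for t in range(num_tiles):
        tile = [t] * 4                    # "fetch" a tile from backing storage
        tile_buffer.put(tile)             # blocks when the buffer is full
    tile_buffer.put(None)                 # end-of-stream marker

def compute_engine():
    while (tile := tile_buffer.get()) is not None:
        print("computed partial sum:", sum(tile))

threading.Thread(target=data_orchestrator, args=(3,)).start()
compute_engine()
```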

Timeloop: A Systematic Approach to DNN Accelerator Evaluation

This paper presents Timeloop, an infrastructure for evaluating and exploring the architecture design space of deep neural network (DNN) accelerators. Timeloop uses a concise and unified representation of the key architecture and implementation attributes of DNN accelerators to describe a broad space of hardware topologies. It can then emulate those topologies to generate an accurate projection of performance and energy efficiency for a DNN workload through a mapper that finds the best way to schedule operations and stage data on the specified architecture.
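
The core idea of a mapper can be sketched in a few lines: enumerate candidate tilings, reject those that do not fit in a buffer, and score the rest with a cost model. The toy below (the matrix-vector workload, buffer size, and traffic formula are made up for illustration and are not Timeloop's model) picks the tiling that minimizes DRAM traffic.

```python
from itertools import product

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def dram_traffic(tile_m, tile_k, M=256, K=256, buf_capacity=4096):
    """Toy cost model for a tiled M x K matrix-vector product."""
    if tile_m * tile_k + tile_k + tile_m > buf_capacity:   # mapping must fit on-chip
        return None
    weight_traffic = M * K                    # each weight is fetched once
    input_traffic = (M // tile_m) * K         # input vector re-fetched per row tile
    output_traffic = M                        # each output is written once
    return weight_traffic + input_traffic + output_traffic

best = None
for tm, tk in product(divisors(256), divisors(256)):
    cost = dram_traffic(tm, tk)
    if cost is not None and (best is None or cost < best[0]):
        best = (cost, tm, tk)

print("best mapping (DRAM words, tile_m, tile_k):", best)
```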

HarDNN: Fine-Grained Vulnerability Evaluation and Protection for Convolutional Neural Networks

As CNNs are increasingly being employed in high performance computing and safety-critical applications, ensuring that they are resilient to transient hardware errors is important. Full duplication provides high reliability, but its overheads are prohibitively high for resource-constrained systems. Fine-grained resilience evaluation and protection can provide a low-cost solution, but traditional methods for evaluation can be too slow. Traditional approaches use error injections and essentially discard information from experiments that do not corrupt outcomes.
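
For context, the traditional injection-based evaluation looks roughly like the loop below: corrupt one neuron at a time, rerun inference, and record whether the top-1 class changes. Everything here (the two-layer numpy network, the corruption model, the trial count) is a made-up toy to show the mechanism, not HarDNN's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))   # hypothetical tiny 2-layer network
W2 = rng.standard_normal((8, 4))

def forward(x, inject_at=None, corrupted_value=None):
    h = np.maximum(x @ W1, 0.0)                 # hidden "feature map"
    if inject_at is not None:
        h = h.copy()
        h[inject_at] = corrupted_value          # simulate a transient error
    return np.argmax(h @ W2)

x = rng.standard_normal(16)
golden = forward(x)

# Estimate per-neuron vulnerability: fraction of injections that flip the top-1 class.
trials = 200
mismatches = np.zeros(8)
for n in range(8):
    for _ in range(trials):
        corruption = rng.standard_normal() * 100.0   # crude large-magnitude corruption
        mismatches[n] += forward(x, inject_at=n, corrupted_value=corruption) != golden
print("per-neuron SDC estimates:", mismatches / trials)
```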

SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies

DRAM is the primary technology used for main memory in modern systems. Unfortunately, as DRAM scales down to smaller technology nodes, it faces key challenges in both data integrity and latency, which strongly affects overall system reliability and performance. To develop reliable and high-performance DRAM-based main memory in future systems, it is critical to characterize, understand, and analyze various aspects (e.g., reliability, latency) of existing DRAM chips.

Detecting and Mitigating Data-Dependent DRAM Failures by Exploiting Current Memory Content

DRAM cells in close proximity can fail depending on the data content in neighboring cells. These failures are called data-dependent failures. Detecting and mitigating these failures online, while the system is running in the field, enables various optimizations that improve reliability, latency, and energy efficiency of the system. For example, a system can improve performance and energy efficiency by using a lower refresh rate for most cells and mitigate the failing cells using higher refresh rates or error correcting codes.
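
A minimal sketch of that optimization, assuming the set of failure-prone rows has already been identified online (the row numbers and refresh intervals below are placeholders, not values from the paper): most rows are refreshed at a relaxed interval, and only the known weak rows are refreshed at the default rate.

```python
DEFAULT_INTERVAL_MS = 64        # default refresh interval for failure-prone rows
RELAXED_INTERVAL_MS = 128       # relaxed interval for all other rows

weak_rows = {17, 4093}          # rows found to exhibit data-dependent failures

def rows_to_refresh(now_ms, num_rows=8192):
    """Return the rows whose refresh is due at time now_ms."""
    due = []
    for row in range(num_rows):
        interval = DEFAULT_INTERVAL_MS if row in weak_rows else RELAXED_INTERVAL_MS
        if now_ms % interval == 0:
            due.append(row)
    return due

print(len(rows_to_refresh(64)), "rows due at t=64 ms")    # only the weak rows
print(len(rows_to_refresh(128)), "rows due at t=128 ms")  # all rows
```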

Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology

Many important applications trigger bulk bitwise operations, i.e., bitwise operations on large bit vectors. In fact, recent works design techniques that exploit fast bulk bitwise operations to accelerate databases (bitmap indices, BitWeaving) and web search (BitFunnel). Unfortunately, in existing architectures, the throughput of bulk bitwise operations is limited by the memory bandwidth available to the processing unit (e.g., CPU, GPU, FPGA, processing-in-memory). To overcome this bottleneck, we propose Ambit, an Accelerator-in-Memory for bulk bitwise operations.
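
For a sense of what such an operation looks like from software, the snippet below ANDs two large bitmaps with numpy, much as a bitmap-index scan would combine two predicates. On a conventional CPU both operands stream through the memory hierarchy, which is exactly the bandwidth bottleneck Ambit avoids by operating inside DRAM (the sizes and data here are arbitrary).

```python
import numpy as np

N_BITS = 1 << 24                                  # 16 Mbit per bitmap (arbitrary)
a = np.random.randint(0, 256, N_BITS // 8, dtype=np.uint8)
b = np.random.randint(0, 256, N_BITS // 8, dtype=np.uint8)

result = np.bitwise_and(a, b)                     # the bulk bitwise operation itself
matches = int(np.unpackbits(result).sum())        # rows satisfying both predicates
print(f"{matches} of {N_BITS} rows match both predicates")
```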

Towards Analytically Evaluating the Error Resilience of GPU Programs

General purpose Graphics Processing Units (GPUs) have become popular for many reliability-conscious applications, including high-performance computation, machine learning algorithms, and business analytics workloads. Fault injection techniques are generally used to determine the reliability profiles of programs in the presence of soft errors, but these techniques are highly resource and time intensive. Trident, an analytical model based on a three-level modeling technique, was developed to predict the SDC probabilities of CPU programs.

Simulation Driven Design and Test for Safety of AI Based Autonomous Vehicles

An autonomous vehicle (AV) integrates sophisticated perception and localization components to create a model of the world around it, which is then used to navigate the vehicle safely. Machine learning (ML) based models are pervasively used in these components to extract object information from noisy sensor data. The requirements for these components are primarily set to achieve the highest possible accuracy.

Structural Pruning via Latency-Saliency Knapsack

Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP), which formulates structural pruning as a global resource allocation optimization problem, aiming to maximize accuracy while constraining latency under a predefined budget on the target device. For filter importance ranking, HALP leverages a latency lookup table to track latency reduction potential and a global saliency score to gauge accuracy drop.
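
The flavor of the formulation can be seen in a small greedy sketch: each filter carries a saliency score and a latency cost taken from a lookup table, and filters are kept in order of saliency per unit latency until the latency budget is exhausted. The random scores, latencies, budget, and the greedy solver below are stand-ins, not HALP's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
num_filters = 64
saliency = rng.random(num_filters)                 # proxy for accuracy contribution
latency = rng.uniform(0.1, 1.0, num_filters)       # per-filter cost from a lookup table
budget = 0.6 * latency.sum()                       # target: 60% of the dense latency

keep, spent = [], 0.0
for idx in np.argsort(-saliency / latency):        # best value density first
    if spent + latency[idx] <= budget:
        keep.append(int(idx))
        spent += latency[idx]

print(f"kept {len(keep)}/{num_filters} filters, latency {spent:.2f}/{budget:.2f}")
```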

UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition

Convolutional Neural Networks (CNNs) have begun to permeate all corners of electronic society (from voice recognition to scene generation) due to their high accuracy and machine efficiency per operation. At their core, CNN computations are made up of multi-dimensional dot products between weight and input vectors. This paper studies how weight repetition - when the same weight occurs multiple times in or across weight vectors - can be exploited to save energy and improve performance during CNN inference.
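
The computational-reuse idea can be restated with a toy factorized dot product: group the inputs that share each repeated weight value, sum each group once, and then perform only one multiply per unique weight. The vector size and weight values below are arbitrary, and this is a restatement of the idea rather than UCNN's actual dataflow.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.choice([-0.5, 0.0, 0.25, 1.0], size=1024)   # heavily repeated weights
inputs = rng.standard_normal(1024)

# Naive dot product: 1024 multiplies.
naive = float(weights @ inputs)

# Factorized: one multiply per unique weight value (4 here).
factorized = 0.0
for w in np.unique(weights):
    factorized += w * inputs[weights == w].sum()

print(np.isclose(naive, factorized), "multiplies: 1024 vs", len(np.unique(weights)))
```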