Towards Analytically Evaluating the Error Resilience of GPU Programs

General-purpose Graphics Processing Units (GPUs) have become popular for many reliability-conscious applications, including high-performance computation, machine learning algorithms, and business analytics workloads. Fault injection techniques are generally used to determine the reliability profiles of programs in the presence of soft errors, but these techniques are highly resource- and time-intensive. Trident, an analytical model, was developed to predict silent data corruption (SDC) probabilities of CPU programs using a three-level modeling technique.

Simulation Driven Design and Test for Safety of AI Based Autonomous Vehicles

An autonomous vehicle (AV) integrates sophisticated perception and localization components to create a model of the world around it, which is then used to navigate the vehicle safely. Machine learning (ML) based models are pervasively used in these components to extract object information from noisy sensor data. The requirements for these components are primarily set to achieve as high an accuracy as possible.

Structural Pruning via Latency-Saliency Knapsack

Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP), which formulates structural pruning as a global resource allocation optimization problem, aiming to maximize accuracy while constraining latency under a predefined budget on the target device. For filter importance ranking, HALP leverages a latency lookup table to track latency reduction potential and a global saliency score to gauge accuracy drop.
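
As a rough illustration of this knapsack-style view, the sketch below (Python, purely illustrative and not the paper's exact algorithm; the saliency and per-filter latency values are hypothetical) greedily keeps the filters with the best saliency-per-latency ratio until a latency budget, drawn from a lookup table, is exhausted.

    def select_filters(saliency, latency_cost, latency_budget):
        # saliency[i]: estimated accuracy contribution of filter i.
        # latency_cost[i]: latency attributed to filter i (from a lookup table).
        # Returns the indices of filters to keep under the latency budget.
        order = sorted(range(len(saliency)),
                       key=lambda i: saliency[i] / max(latency_cost[i], 1e-9),
                       reverse=True)
        kept, used = [], 0.0
        for i in order:
            if used + latency_cost[i] <= latency_budget:
                kept.append(i)
                used += latency_cost[i]
        return kept

    # Hypothetical toy example: five filters, 2.5 ms latency budget.
    keep = select_filters([0.9, 0.2, 0.5, 0.7, 0.1], [1.0, 0.4, 0.8, 1.2, 0.3], 2.5)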

UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition

Convolutional Neural Networks (CNNs) have begun to permeate all corners of electronic society (from voice recognition to scene generation) due to their high accuracy and machine efficiency per operation. At their core, CNN computations are made up of multi-dimensional dot products between weight and input vectors. This paper studies how weight repetition - when the same weight occurs multiple times in or across weight vectors - can be exploited to save energy and improve performance during CNN inference.
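
To make the idea concrete, the following sketch (Python with NumPy, illustrative only) computes a dot product in the factorized form that weight repetition enables: inputs sharing the same weight value are summed first with cheap additions, leaving only one multiply per unique weight.

    import numpy as np

    def dot_product_naive(weights, inputs):
        # Standard dot product: one multiply per weight/input pair.
        return float(np.dot(weights, inputs))

    def dot_product_weight_reuse(weights, inputs):
        # Group inputs that share a weight value, sum each group first,
        # then perform a single multiply per unique weight.
        unique_w, group_ids = np.unique(weights, return_inverse=True)
        group_sums = np.zeros(len(unique_w))
        np.add.at(group_sums, group_ids, inputs)
        return float(np.dot(unique_w, group_sums))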

Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers

Despite continuing research into inter-GPU communication mechanisms, extracting performance from multi-GPU systems remains a significant challenge. Inter-GPU communication via bulk DMA-based transfers exposes data transfer latency on the GPU’s critical execution path because these large transfers are logically interleaved between compute kernels. Conversely, fine-grained peer-to-peer memory accesses during kernel execution lead to memory stalls that can exceed the GPUs’ ability to cover these operations via multi-threading.

GPS: A Global Publish-Subscribe Model for Multi-GPU Memory Management

Suboptimal management of memory and bandwidth is one of the primary causes of low performance on systems comprising multiple GPUs. Existing memory management solutions like Unified Memory (UM) offer simplified programming but come at the cost of performance: applications can even exhibit slowdowns as GPU count increases, due to their inability to leverage system resources effectively. To address this challenge, we propose GPS, a HW/SW multi-GPU memory management technique that efficiently orchestrates inter-GPU communication using proactive data transfers.
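
The sketch below is a purely conceptual, software-level illustration of the publish-subscribe idea named in the title (it is not GPS's actual HW/SW interface; the class and method names are hypothetical): each GPU subscribes to the memory it will read, and a producer's writes are proactively pushed to every subscriber's local replica rather than fetched on demand.

    from collections import defaultdict

    class PubSubMemory:
        def __init__(self):
            self.subscribers = defaultdict(set)   # page id -> subscribing GPU ids
            self.replicas = defaultdict(dict)     # GPU id  -> {page id: data}

        def subscribe(self, gpu_id, page):
            self.subscribers[page].add(gpu_id)

        def publish(self, page, data):
            # Proactive transfer: push the update to all subscribers immediately.
            for gpu_id in self.subscribers[page]:
                self.replicas[gpu_id][page] = data

    mem = PubSubMemory()
    mem.subscribe(gpu_id=1, page=0)
    mem.publish(page=0, data=b"output of producer GPU's kernel")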

Augmenting Legacy Networks for Flexible Inference

Once deployed in the field, Deep Neural Networks (DNNs) run on devices with widely different compute capabilities and whose computational load varies over time. Dynamic network architectures are one of the existing techniques developed to handle varying computational load in real-time deployments. Here we introduce LeAF (Legacy Augmentation for Flexible inference), a novel paradigm that augments the key phases of a pre-trained DNN with alternative, trainable, shallow phases that can be executed in place of the original ones.
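
A minimal sketch of the augmentation idea, assuming a PyTorch setting (the class name FlexiblePhase and the use_shallow switch are hypothetical, not from the paper): a pre-trained phase is wrapped together with a lighter trainable alternative, and the runtime picks one of the two at inference time depending on the available compute budget.

    import torch.nn as nn

    class FlexiblePhase(nn.Module):
        def __init__(self, original_phase: nn.Module, shallow_phase: nn.Module):
            super().__init__()
            self.original = original_phase        # frozen, pre-trained legacy phase
            self.shallow = shallow_phase          # lightweight trainable alternative
            for p in self.original.parameters():
                p.requires_grad = False           # keep the legacy weights intact
            self.use_shallow = False              # runtime switch set at deployment

        def forward(self, x):
            return self.shallow(x) if self.use_shallow else self.original(x)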

NWChem: Past, Present, and Future

Specialized computational chemistry packages have permanently reshaped the landscape of chemical and materials science by providing tools to support and guide experimental efforts and to predict chemical and materials properties. In this regard, a special role has been played by electronic structure packages, in which complex chemical and materials processes can be modeled using first-principles-driven methodologies.