MeltdownPrime and SpectrePrime: Automatically-Synthesized Attacks Exploiting Invalidation-Based Coherence Protocols

The recent Meltdown and Spectre attacks highlight the importance of automated verification techniques for identifying hardware security vulnerabilities. We have developed a tool for synthesizing microarchitecture-specific programs capable of producing any user-specified hardware execution pattern of interest. Our tool takes two inputs: a formal description of (i) a microarchitecture in a domain-specific language, and (ii) a microarchitectural execution pattern of interest, e.g. a threat pattern.

TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA

Memory consistency models (MCMs) which govern inter-module interactions in a shared memory system, are a significant, yet often under-appreciated, aspect of system design. MCMs are defined at the various layers of the hardware-software stack, requiring thoroughly verified specifications, compilers, and implementations at the interfaces between layers. Current verification techniques evaluate segments of the system stack in isolation, such as proving compiler mappings from a high-level language (HLL) to an ISA or proving validity of a microarchitectural implementation of an ISA.

Weak Memory Models with Matching Axiomatic and Operational Definitions

Memory consistency models are notorious for being difficult to define precisely, to reason about, and to verify. More than a decade of effort has gone into nailing down the definitions of the ARM and IBM Power memory models, and yet there still remain aspects of those models which (perhaps surprisingly) remain unresolved to this day.

Counterexamples and Proof Loophole for the C/C++ to POWER and ARMv7 Trailing-Sync Compiler Mappings

The C and C++ high-level languages provide programmers with atomic operations for writing high-performance concurrent code. At the assembly language level, C and C++ atomics get mapped down to individual instructions or combinations of instructions by compilers, depending on the ordering guarantees and synchronization instructions provided by the underlying architecture. These compiler mappings must uphold the ordering guarantees provided by C/C++ atomics or the compiled program will not behave according to the C/C++ memory model.

HarDNN: Feature Map Vulnerability Evaluation in CNNs

As Convolutional Neural Networks (CNNs) are increasingly being employed in safety-critical applications, it is important that they behave reliably in the face of hardware errors. Transient hardware errors may percolate undesirable state during execution, resulting in software-manifested errors which can adversely affect high-level decision making.

Making Convolutions Resilient via Algorithm-Based Error Detection Techniques

The ability of Convolutional Neural Networks (CNNs) to accurately process real-time telemetry has boosted their use in safety-critical and high-performance computing systems. As such systems require high levels of resilience to errors, CNNs must execute correctly in the presence of hardware faults. Full duplication provides the needed assurance but incurs a prohibitive 100% overhead.

Zhuyi: Perception Processing Rate Estimation for Safety in Autonomous Vehicles

The processing requirement of autonomous vehicles (AVs) for high-accuracy perception in complex scenarios can exceed the resources offered by the in-vehicle computer, degrading safety and comfort. This paper proposes a sensor frame processing rate (FPR) estimation model, Zhuyi, that quantifies the minimum safe FPR continuously in a driving scenario. Zhuyi can be employed post-deployment as an online safety check and to prioritize work.

Generating and Characterizing Scenarios for Safety Testing of Autonomous Vehicles

Extracting interesting scenarios from real-world data as well as generating failure cases is important for the development and testing of autonomous systems. We propose efficient mechanisms to both characterize and generate testing scenarios using a state-of-the-art driving simulator. For any scenario, our method generates a set of possible driving paths and identifies all the possible safe driving trajectories that can be taken starting at different times, to compute metrics that quantify the complexity of the scenario.

EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal In GPUs

Modern analytics and recommendation systems are increasingly based on graph data that capture the relations between entities being analyzed. Practical graphs come in huge sizes, offer massive parallelism, and are stored in sparse-matrix formats such as CSR. To exploit the massive parallelism, developers are increasingly interested in using GPUs for graph traversal. However, due to their sizes, graphs often do not fit into the GPU memory. Prior works have either used input data pre-processing/partitioning or UVM to migrate chunks of data from the host memory to the GPU memory.

Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture

Graph Convolutional Networks (GCNs) are increasingly adopted in large-scale graph-based recommender systems. Training GCN requires the minibatch generator traversing graphs and sampling the sparsely located neighboring nodes to obtain their features. Since real-world graphs often exceed the capacity of GPU memory, current GCN training systems keep the feature table in host memory and rely on the CPU to collect sparse features before sending them to the GPUs. This approach, however, puts tremendous pressure on host memory bandwidth and the CPU.