ML-based Fault Injection for Autonomous Vehicles: A Case for Bayesian Fault Injection

The safety and resilience of fully autonomous vehicles (AVs) are of significant concern, as exemplified by several headline-making accidents. While AV development today involves verification, validation, and testing, end-to-end assessment of AV systems under accidental faults in realistic driving scenarios has been largely unexplored. This paper presents DriveFI, a machine learning-based fault injection engine, which can mine situations and faults that maximally impact AV safety, as demonstrated on two industry-grade AV technology stacks (from NVIDIA and Baidu).

MAGNet: A Modular Accelerator Generator for Neural Networks

Deep neural networks have been adopted in a wide range of application domains, leading to high demand for inference accelerators. However, the high cost associated with ASIC hardware design makes it challenging to build custom accelerators for different targets. To lower design cost, we propose MAGNet, a modular accelerator generator for neural networks.

Joint Optimization for Cooperative Image Captioning

When describing images with natural language, descriptions can be made more informative if tuned for downstream tasks. This can be achieved by training two networks: a “speaker” that generates sentences given an image and a “listener” that uses them to perform a task. Unfortunately, jointly training multiple networks to communicate faces two major challenges. First, the descriptions generated by a speaker network are discrete and stochastic, making optimization very hard and inefficient.
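The abstract does not spell out how the discreteness is handled, but one common workaround in speaker-listener setups is to relax the word choices with a straight-through Gumbel-Softmax estimator so that the listener's task loss can backpropagate into the speaker. The sketch below illustrates that generic technique only, not necessarily the paper's approach; every module name and size in it is hypothetical.

# Illustrative sketch (not the paper's method): a straight-through
# Gumbel-Softmax relaxation lets a listener's task loss backpropagate
# through the speaker's discrete word choices. Names and sizes are toy.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMBED, HIDDEN, MSG_LEN = 100, 32, 64, 5

class Speaker(nn.Module):
    def __init__(self):
        super().__init__()
        self.img_to_h = nn.Linear(128, HIDDEN)    # image feature -> state
        self.to_vocab = nn.Linear(HIDDEN, VOCAB)  # state -> word logits

    def forward(self, img_feat, tau=1.0):
        h = torch.tanh(self.img_to_h(img_feat))
        # Toy: reuse the same logits for every word position.
        logits = self.to_vocab(h).unsqueeze(1).expand(-1, MSG_LEN, -1)
        # hard=True gives one-hot samples in the forward pass but
        # soft (differentiable) gradients in the backward pass.
        return F.gumbel_softmax(logits, tau=tau, hard=True)  # (B, L, V)

class Listener(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.embed = nn.Linear(VOCAB, EMBED, bias=False)  # one-hot -> embedding
        self.cls = nn.Linear(EMBED, n_classes)

    def forward(self, message):
        return self.cls(self.embed(message).mean(dim=1))  # pool over words

speaker, listener = Speaker(), Listener()
opt = torch.optim.Adam(list(speaker.parameters()) + list(listener.parameters()), lr=1e-3)

img_feat = torch.randn(8, 128)         # toy image features
target = torch.randint(0, 10, (8,))    # toy downstream-task labels
loss = F.cross_entropy(listener(speaker(img_feat)), target)
loss.backward()                        # gradients reach the speaker
opt.step()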

The Even/Odd Synchronizer: A Fast, All-Digital Periodic Synchronizer

We describe an all-digital synchronizer that moves multi-bit signals between two periodic clock domains with an average delay of slightly more than half a cycle and an arbitrarily small probability of synchronization failure. The synchronizer operates by measuring the relative frequency of the two periodic clocks and using this frequency measurement, along with phase detection, to compute a phase estimate. The phase estimate is computed with interval arithmetic to account for measurement uncertainty.
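As a rough illustration of the phase-estimation idea, the toy Python sketch below tracks the transmit-clock phase as an interval of a cycle, widens it each receive-clock cycle by the uncertainty of the measured frequency ratio, and samples directly only when the whole interval stays clear of the receive clock edge. All numbers and the keep-out window are hypothetical assumptions, not values from the paper.

# Toy sketch of interval-arithmetic phase tracking; all constants are
# hypothetical and chosen only to show the interval widening over time.
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def shift(self, other: "Interval") -> "Interval":
        # Interval addition, then wrap the lower bound into [0, 1) cycles.
        lo, hi = self.lo + other.lo, self.hi + other.hi
        width = hi - lo
        if width >= 1.0:                  # uncertainty covers a full cycle
            return Interval(0.0, 1.0)
        lo -= int(lo)                     # wrap; hi stays consistent with lo
        return Interval(lo, lo + width)

def safe_to_sample(phase: Interval, keepout: float = 0.1) -> bool:
    # Safe only if the whole interval avoids the keep-out window around
    # the receive clock edge (phase 0, equivalently phase 1).
    return phase.lo > keepout and phase.hi < 1.0 - keepout

# Hypothetical numbers: initial phase from a phase detector, and a
# per-cycle phase slip measured to within +/-0.001 of a cycle.
phase = Interval(0.30, 0.32)
ratio = Interval(0.249, 0.251)
for cycle in range(6):
    phase = phase.shift(ratio)
    print(cycle, round(phase.lo, 3), round(phase.hi, 3), safe_to_sample(phase))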

NRMVS: Non-Rigid Multi-view Stereo

Multi-view Stereo (MVS) is a common solution in photogrammetry applications for the dense reconstruction of a static scene from images. The static-scene assumption, however, limits the general applicability of MVS algorithms, as many day-to-day scenes undergo non-rigid motion, e.g., clothes, faces, or human bodies. In this paper, we open up a new challenging direction: dense 3D reconstruction of scenes with non-rigid changes, observed from a small number of images sparsely captured from different views with a single monocular camera, which we call the non-rigid multi-view stereo (NRMVS) problem.

Hand Pose Estimation via Latent 2.5D Heatmap Regression

Estimating the 3D pose of a hand is an essential part of human-computer interaction. Estimating 3D pose using depth or multi-view sensors has become easier with recent advances in computer vision; however, regressing pose from a single RGB image is much less straightforward. The main difficulty arises from the fact that 3D pose requires some form of depth estimates, which are ambiguous given only an RGB image. In this paper we propose a new method for 3D hand pose estimation from a monocular image through a novel 2.5D pose representation.
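On one reading of the 2.5D representation, each joint carries 2D image coordinates plus a depth relative to a root joint, and the 3D pose follows by back-projection once the absolute root depth is fixed. The sketch below shows that lifting step under simplifying assumptions (metric relative depths and a known root depth); the paper itself additionally recovers the scale from the representation.

# A minimal sketch of lifting a 2.5D hand pose (2D keypoints plus
# root-relative depths) to 3D by back-projection. Assumptions mine:
# depths are already metric and the absolute root depth is known.
import numpy as np

def lift_25d_to_3d(uv, z_rel, z_root, K):
    """uv: (N, 2) pixel keypoints; z_rel: (N,) depth of each joint
    relative to the root joint; z_root: absolute root depth; K: 3x3
    camera intrinsics. Returns (N, 3) camera-space joints."""
    z = z_root + z_rel                         # absolute depth per joint
    uv1 = np.concatenate([uv, np.ones((uv.shape[0], 1))], axis=1)
    rays = (np.linalg.inv(K) @ uv1.T).T        # unit-depth back-projection
    return rays * z[:, None]                   # scale each ray by its depth

# Toy example with hypothetical numbers.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
uv = np.array([[320.0, 240.0], [350.0, 230.0]])   # root + one fingertip
z_rel = np.array([0.0, 0.03])                      # metres relative to root
print(lift_25d_to_3d(uv, z_rel, z_root=0.45, K=K))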

Importance Estimation for Neural Network Pruning

Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with the smallest scores. We describe two variations of our method, using first- and second-order Taylor expansions to approximate a filter's contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections.
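The first-order variant of such a criterion is commonly computed as the squared product of each weight and its gradient, summed over a filter's parameters and accumulated over a few mini-batches. The sketch below follows that reading; the model, data, and any subsequent pruning threshold are hypothetical placeholders.

# Sketch of a first-order Taylor importance score: rank each conv
# filter by the accumulated sum of (weight * gradient)^2 over its
# parameters, then prune the lowest-scoring filters.
import torch
import torch.nn as nn

def taylor_importance(model, data_loader, loss_fn, n_batches=10):
    scores = {m: torch.zeros(m.out_channels) for m in model.modules()
              if isinstance(m, nn.Conv2d)}
    model.zero_grad()
    for i, (x, y) in enumerate(data_loader):
        if i >= n_batches:
            break
        loss_fn(model(x), y).backward()
        for m in scores:
            w, g = m.weight.detach(), m.weight.grad.detach()
            # Per-filter contribution: sum of squared (weight * grad)
            # over the filter's input channels and kernel positions.
            scores[m] += (w * g).pow(2).sum(dim=(1, 2, 3))
        model.zero_grad()
    return scores

# Tiny synthetic example so the sketch runs end to end.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
loader = [(torch.randn(4, 3, 16, 16), torch.randint(0, 10, (4,))) for _ in range(3)]
scores = taylor_importance(model, loader, nn.CrossEntropyLoss())
print({type(m).__name__: s for m, s in scores.items()})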

Analyzing and Improving the Image Quality of StyleGAN

The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent vectors to images.
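The redesigned normalization in the published StyleGAN2 work is usually described as weight demodulation: the style scales the convolution weights per input channel, and each output filter is then rescaled to unit norm, replacing explicit normalization of the feature maps. The sketch below is a simplified version of that operation with illustrative shapes; the full generator also handles bias, noise, and up/downsampling, which are omitted here.

# Simplified sketch of modulated convolution with weight demodulation.
import torch
import torch.nn.functional as F

def modulated_conv2d(x, weight, style, eps=1e-8):
    """x: (B, Cin, H, W); weight: (Cout, Cin, k, k); style: (B, Cin)."""
    B, Cin, H, W = x.shape
    Cout = weight.shape[0]
    # Modulate: scale weights by the per-sample, per-input-channel style.
    w = weight[None] * style[:, None, :, None, None]        # (B, Cout, Cin, k, k)
    # Demodulate: normalize each output filter to unit L2 norm.
    demod = torch.rsqrt(w.pow(2).sum(dim=(2, 3, 4)) + eps)  # (B, Cout)
    w = w * demod[:, :, None, None, None]
    # Grouped convolution applies a different weight tensor per sample.
    x = x.reshape(1, B * Cin, H, W)
    w = w.reshape(B * Cout, Cin, *weight.shape[2:])
    out = F.conv2d(x, w, padding=weight.shape[-1] // 2, groups=B)
    return out.reshape(B, Cout, H, W)

# Toy usage with hypothetical sizes.
x = torch.randn(2, 8, 16, 16)
weight = torch.randn(16, 8, 3, 3)
style = torch.randn(2, 8).abs() + 0.5
print(modulated_conv2d(x, weight, style).shape)   # torch.Size([2, 16, 16, 16])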

Content-Consistent Generation of Realistic Eyes with Style

Accurately labeled real-world training data can be scarce, and hence recent works adapt, modify, or generate images to boost target datasets. However, retaining relevant details from the input data in the generated images is challenging, and failure to do so can be critical to performance on the final task. In this work, we synthesize person-specific eye images that satisfy a given semantic segmentation mask (content), while following the style of a specified person from only a few reference images.