Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference

Analog/mixed-signal (AMS) computation can be more energy efficient than digital approaches for deep learning inference, but incurs an accuracy penalty from precision loss. Prior AMS approaches focus on small networks and datasets, which can maintain accuracy even with 2-bit precision. We analyze the applicability of AMS approaches to larger networks by proposing a generic AMS error model, implementing it in an existing training framework, and investigating its effect on ImageNet classification with ResNet-50.
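
As a rough illustration of what an error model of this kind might look like, the sketch below perturbs an ideal dot product with zero-mean Gaussian noise. The abstract does not specify the model, so everything here is an assumption: the function name `ams_dot`, the `enob` (effective number of bits) and `full_scale` parameters, and the choice to set the noise standard deviation to one LSB divided by sqrt(12), the standard deviation of uniform quantization error over one LSB.

```python
import math
import random


def ams_dot(x, w, enob, full_scale, rng):
    """Ideal dot product plus additive Gaussian noise (assumed error model).

    enob       -- hypothetical effective number of bits of the analog result
    full_scale -- hypothetical full-scale output range of the analog unit
    rng        -- a random.Random instance, passed in for reproducibility
    """
    ideal = sum(xi * wi for xi, wi in zip(x, w))
    # One least-significant bit at the assumed resolution.
    lsb = full_scale / (2 ** enob)
    # Assumption: noise std equals that of uniform quantization error.
    sigma = lsb / math.sqrt(12.0)
    return ideal + rng.gauss(0.0, sigma)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.5, -0.25, 0.125]
    w = [1.0, 2.0, -4.0]
    for bits in (2, 4, 8):
        print(bits, ams_dot(x, w, bits, 2.0, rng))
```

Lowering `enob` widens the noise, which is one simple way to study how accuracy degrades as effective precision drops.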

PRIMAL: Power Inference using Machine Learning

This paper introduces PRIMAL, a novel learning-based framework that enables fast and accurate power estimation for ASIC designs. PRIMAL trains machine learning (ML) models with design verification testbenches for characterizing the power of reusable circuit building blocks. The trained models can then be used to generate detailed power profiles of the same blocks under different workloads. We evaluate the performance of several established ML models on this task, including ridge regression, gradient tree boosting, multi-layer perceptron, and convolutional neural network (CNN).

High Performance Graph Convolutional Networks with Applications in Testability Analysis

Applications of deep learning to electronic design automation (EDA) have recently begun to emerge, although they have mainly been limited to processing of regular structured data such as images. However, many EDA problems require processing irregular structures, and it can be non-trivial to manually extract important features in such cases. In this paper, a high performance graph convolutional network (GCN) model is proposed for the purpose of processing irregular graph representations of logic circuits. A GCN classifier is first trained to predict observation point candidates in a netlist.

Near-Eye Display and Tracking Technologies for Virtual and Augmented Reality

Virtual and augmented reality (VR/AR) are expected to revolutionise entertainment, healthcare, communication and the manufacturing industries among many others. Near-eye displays are an enabling vessel for VR/AR applications, which have to tackle many challenges related to ergonomics, comfort, visual quality and natural interaction. These challenges are related to the core elements of near-eye display hardware and tracking technologies.

Simple Environment Map Filtering Using Ray Cones and Ray Differentials

We describe simple methods for filtering environment maps using ray cones and ray differentials in a ray tracing engine.

Texture Level of Detail Strategies for Real-Time Ray Tracing

Unlike rasterization, where one can rely on pixel quad partial derivatives, an alternative approach must be taken for filtered texturing during ray tracing. We describe two methods for computing texture level of detail for ray tracing. The first approach uses ray differentials, which is a general solution that gives high-quality results. It is rather expensive in terms of computations and ray storage, however. The second method builds on ray cone tracing and uses a single trilinear lookup, a small amount of ray storage, and fewer computations than ray differentials.

Precision Improvements for Ray/Sphere Intersection

The traditional quadratic formula is often presented as the way to compute the intersection of a ray with a sphere. While mathematically correct, this factorization can be numerically unstable when using floating-point arithmetic. We give two little-known reformulations and show how each can improve precision.
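
To make the instability concrete: when b is large relative to the discriminant, one of the two roots in the textbook formula subtracts nearly equal numbers. The sketch below uses a widely known remedy, computing one root from an intermediate q whose two terms share a sign and the other root from c/q. It is not necessarily either of the reformulations this paper presents, and it does not address precision loss in the discriminant itself. The function name `ray_sphere` and its tuple-based interface are assumptions made for illustration.

```python
import math


def ray_sphere(origin, direction, center, radius):
    """Return (t0, t1) with t0 <= t1 for the ray's supporting line,
    or None if the line misses the sphere."""
    f = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(fi * di for fi, di in zip(f, direction))
    c = sum(fi * fi for fi in f) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    # b and the signed square root have the same sign, so adding them
    # never cancels; the textbook form -b +/- sqrt(disc) can.
    q = -0.5 * (b + math.copysign(math.sqrt(disc), b))
    if q == 0.0:
        # Only possible when b == 0 and disc == 0, i.e. a double root at 0.
        return (0.0, 0.0)
    t0, t1 = q / a, c / q
    return (min(t0, t1), max(t0, t1))


if __name__ == "__main__":
    print(ray_sphere((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), 1.0))
```

The returned parameters describe the full line; a caller would still discard values outside the ray's valid interval.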

What is a Ray?

We define a ray, show how to use ray intervals, and demonstrate how to specify a ray using DirectX Raytracing (DXR).
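
A ray is commonly written as P(t) = O + t d, with an interval [t_min, t_max] restricting which values of t count as hits. The sketch below expresses that idea in Python rather than HLSL; its four fields loosely mirror the Origin, TMin, Direction and TMax members of DXR's `RayDesc`. The class name `Ray`, its methods, and the inclusive interval test are assumptions of this sketch, since APIs differ on whether the endpoints count.

```python
from dataclasses import dataclass


@dataclass
class Ray:
    origin: tuple
    direction: tuple
    t_min: float = 0.0
    t_max: float = float("inf")

    def at(self, t):
        """Point on the ray at parameter t: origin + t * direction."""
        return tuple(o + t * d for o, d in zip(self.origin, self.direction))

    def contains(self, t):
        """True if t lies in the ray interval (inclusive in this sketch)."""
        return self.t_min <= t <= self.t_max


if __name__ == "__main__":
    # A small positive t_min is a common way to avoid self-intersection.
    ray = Ray((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), t_min=0.001, t_max=10.0)
    print(ray.at(2.0), ray.contains(2.0), ray.contains(0.0))
```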

Temporally Dense Ray Tracing

We present a technique for real-time ray tracing with the goal of reaching 240 frames per second or more. The core idea is to trade spatial resolution for faster temporal updates in such a way that the display and human visual system aid in integrating high-quality images. We use a combination of frameless and interleaved rendering concepts together with ideas from temporal antialiasing algorithms and novel building blocks, the major one being adaptive selection of pixel orderings within tiles, which reduces spatiotemporal aliasing significantly.

A Fine-Grained GALS SoC with Pausible Adaptive Clocking in 16 nm FinFET

Modern SoCs suffer from power supply noise that can require significant additional timing margin, reducing performance and energy efficiency. Globally asynchronous, locally synchronous (GALS) systems can mitigate the impact of power supply noise, as well as simplify system design by removing the need for global timing closure. This work presents a 4 mm² distributed accelerator engine with 19 independent clock domains implemented in a 16 nm process.