A 0.297-pJ/bit 50.4-Gb/s/wire Inverter-Based Short-Reach Simultaneous Bidirectional Transceiver for Die-to-Die Interface in 5nm CMOS

This paper presents a clock-forwarded, Inverter-based Short-Reach Simultaneous Bidirectional (ISR-SBD) PHY targeted at die-to-die communication over a silicon interposer or similar high-density interconnect. Fabricated in a 5nm standard CMOS process, the ISR-SBD PHY demonstrates 50.4Gb/s/wire (25.2Gb/s in each direction) at 0.297pJ/bit from a 0.75V supply over a 1.2mm on-chip channel.
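The core principle behind simultaneous bidirectional signaling can be sketched as follows. This is a hedged, idealized illustration, not the paper's circuit: both dies drive the same wire at once, and each receiver subtracts a replica of its own transmitted level to recover the far-end bit. A lossless wire and perfect replica are assumed; a real PHY must also handle channel loss, reflections, and noise.

```python
def sbd_recover(local_bits, remote_bits, swing=1.0):
    """Recover the far-end bit stream on a shared wire driven from both ends.

    Each receiver knows its own transmitted contribution and subtracts a
    replica of it from the superposed wire voltage (idealized model).
    """
    recovered = []
    for near, far in zip(local_bits, remote_bits):
        wire = swing * near + swing * far   # superposition on the shared wire
        replica = swing * near              # locally known contribution
        recovered.append(1 if (wire - replica) > swing / 2 else 0)
    return recovered

# Example: local die sends 1,0,1,0 while the remote die sends 0,0,1,1;
# the local receiver should recover the remote stream.
rx = sbd_recover([1, 0, 1, 0], [0, 0, 1, 1])
```

Because both directions share one wire, the per-wire data rate doubles (here, 25.2Gb/s each way for 50.4Gb/s/wire aggregate) at the cost of the replica-subtraction circuitry.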

LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update

Representing deep neural networks (DNNs) in low precision is a promising approach to enable efficient acceleration and memory reduction. Previous methods that train DNNs in low precision typically keep a copy of the weights in high precision during the weight updates. Directly training with low-precision weights leads to accuracy degradation due to complex interactions between the low-precision number system and the learning algorithm.
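A key pairing hinted at in the title is that multiplicative updates compose naturally with a logarithmic number system (LNS): multiplying a weight's magnitude becomes an addition on its stored exponent. The sketch below illustrates that idea under simplifying assumptions; the sign-based update rule and the fixed-point log grid are illustrative stand-ins, not the exact LNS-Madam algorithm.

```python
import numpy as np

def quantize_log(x, frac_bits=3):
    """Round a log2-magnitude to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 2 ** frac_bits
    return np.round(x * scale) / scale

def lns_multiplicative_step(sign, log_mag, grad, lr=0.01, frac_bits=3):
    """One multiplicative update on an LNS-stored weight.

    Multiplicative rule |w| <- |w| * 2^(-lr * sign(w) * sign(g)) becomes a
    plain addition on the stored log2 magnitude, so no high-precision weight
    copy is needed (simplified illustration).
    """
    delta = -lr * sign * np.sign(grad)
    return quantize_log(log_mag + delta, frac_bits)

# Toy example: weight w = +2^1 = 2.0; a positive gradient shrinks it.
sign, log_mag = 1.0, 1.0
new_log = lns_multiplicative_step(sign, log_mag, grad=0.5, lr=0.25, frac_bits=3)
w_new = sign * 2.0 ** new_log
```

The point of the sketch is that the quantized quantity (the exponent) is exactly what the update touches, which is why this pairing avoids the high-precision shadow copy that additive updates usually require.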

An Adversarial Active Sampling-based Data Augmentation Framework for Manufacturable Chip Design

Lithography modeling is a crucial step in chip design, ensuring that a design mask is manufacturable. It requires rigorous simulation of optical and chemical models that is computationally expensive. Recent developments in machine learning have provided alternative solutions that replace the time-consuming lithography simulations with deep neural networks. However, a considerable accuracy drop still impedes industrial adoption. Most importantly, the quality and quantity of the training dataset directly affect model performance.

TAG: Learning Circuit Spatial Embedding from Layouts

Analog and mixed-signal (AMS) circuit design still relies on human expertise. Machine learning has been assisting circuit design automation by replacing human experience with artificial intelligence. This paper presents TAG, a new paradigm for learning circuit representations from layouts that leverages Text, self-Attention, and Graphs. The embedding network learns spatial information without manual labeling. We introduce text embedding and a self-attention mechanism into AMS circuit learning.

TransSizer: A Novel Transformer-Based Fast Gate Sizer

Gate sizing is a fundamental netlist optimization move, and researchers have used supervised learning-based models to build gate sizers. Recently, Reinforcement Learning (RL) has been tried for gate sizing (and other EDA optimization problems), but RL-based approaches are very runtime-intensive. In this work, we explore a novel Transformer-based gate sizer, TransSizer, which directly generates optimized gate sizes given a placed, unoptimized netlist. TransSizer is trained on datasets obtained from real tapeout-quality industrial designs in a foundry 5nm technology node.

Why are Graph Neural Networks Effective for EDA Problems?

In this paper, we discuss the source of the effectiveness of Graph Neural Networks (GNNs) in EDA, particularly in the VLSI design automation domain. We argue that the effectiveness comes from the fact that GNNs implicitly embed the prior knowledge and inductive biases associated with given VLSI tasks, which is one of the three approaches to making a learning algorithm physics-informed. These inductive biases differ from those commonly used in GNNs designed for other structured data, such as social networks and citation networks.

Photonic Circuits for Accelerated Computing Systems

GPU-based accelerated computing is powering the AI revolution. These systems include processors and switches that push thermal power density limits while demanding large I/O bandwidth. To continue scaling, very dense integration of ultra-efficient optical transceivers is needed to alleviate current inefficiencies in off-package signalling.

Merlin Nimier-David

Merlin is a senior research scientist at NVIDIA. His research focuses on differentiable physically based rendering, including how to efficiently and accurately compute gradients through rendering algorithms. These gradients can then be leveraged in a variety of inverse tasks, such as recovering materials and lighting from photographs. He contributed to the development of the Mitsuba differentiable renderer.

HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression

Transformers have attained superior performance in natural language processing and computer vision. However, their self-attention and feedforward layers are overparameterized, limiting inference speed and energy efficiency. Tensor decomposition is a promising technique to reduce parameter redundancy by leveraging tensor algebraic properties to express the parameters in a factorized form. Prior efforts used manual or heuristic factorization settings without hardware-aware customization, resulting in poor hardware efficiency and large performance degradation.
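The basic compression mechanism the abstract refers to can be sketched with the simplest member of the tensor-decomposition family, a rank-r matrix factorization of one weight matrix. This is a hedged illustration only: the plain SVD and the fixed rank below are illustrative choices, not HEAT's hardware-aware automatic search over factorization settings.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Factor W (d_out x d_in) into A (d_out x rank) @ B (rank x d_in).

    Uses truncated SVD, absorbing the singular values into the left factor.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]

# Toy example: compress a 64x64 layer to rank 8.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
A, B = low_rank_factorize(W, rank=8)
params_before = W.size           # 64 * 64 = 4096
params_after = A.size + B.size   # 64*8 + 8*64 = 1024, a 4x reduction
```

The point of hardware-aware methods is that the best rank (and decomposition shape) per layer depends on both the accuracy loss and how efficiently the resulting factor shapes map onto the target hardware, which is what a manual or heuristic setting misses.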

Display Size and Targeting Performance: Small Hurts, Large May Help

Which display size helps gamers win? Recommendations from the research and PC gaming communities are contradictory. We find that as display size grows, targeting performance improves. When size increases from 13" to 26", targeting time drops by over 3%. Further size increases, from 26" through 39", 52", and 65", bring more modest improvements, with targeting time dropping a further 1%. While such improvements may not be meaningful for novice gamers, they are extremely important to skilled and competitive players.