Yoshi Nishi

Yoshi Nishi joined NVIDIA Research in 2020 after more than nine years as a member of the NVIDIA mixed-signal I/O design group. Since 2013 he has led one of the design teams and delivered the TX and RX macros for NVLink 1 and 2 and the PLL macro for NVLink 3. Prior to NVIDIA, he was chief architect of a 10 Gbps burst-mode CDR for 10G-EPON applications at K-micro, which was first in the market in 2009, and chief designer of the 50 Gbps InP HEMT logic family at NTT, whose chips were the first 50 Gbps chips in the market in 2001. He received the B.S. and M.S.

Accelerating Chip Design with Machine Learning

Recent advancements in machine learning provide an opportunity to transform chip design workflows. We review recent research applying techniques such as deep convolutional neural networks and graph-based neural networks in the areas of automatic design space exploration, power analysis, VLSI physical design, and analog design. We also present a future vision of an AI-assisted automated chip design workflow to aid designer productivity and automate optimization tasks.

DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement

Placement for very-large-scale integrated (VLSI) circuits is one of the most important steps for design closure. We propose a novel GPU-accelerated placement framework DREAMPlace, by casting the analytical placement problem equivalently to training a neural network. Implemented on top of a widely-adopted deep learning toolkit PyTorch, with customized key kernels for wirelength and density computations, DREAMPlace can achieve around 40× speedup in global placement without quality degradation compared to the state-of-the-art multi-threaded placer RePlAce.
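
The idea of treating placement like neural network training can be illustrated with a minimal sketch: cell coordinates play the role of trainable parameters, a smooth wirelength estimate plays the role of the loss, and plain gradient descent updates the coordinates. The sketch below is not DREAMPlace code. It is one-dimensional, uses only the Python standard library, and leaves out the density term entirely. The log-sum-exp smoothing, the function names, and the parameters gamma, lr, and steps are all assumptions chosen for illustration.

```python
import math

def lse_wirelength(xs, gamma):
    """Smooth log-sum-exp approximation of max(xs) - min(xs) for one net."""
    hi, lo = max(xs), min(xs)
    smooth_max = hi + gamma * math.log(sum(math.exp((x - hi) / gamma) for x in xs))
    smooth_min = lo - gamma * math.log(sum(math.exp(-(x - lo) / gamma) for x in xs))
    return smooth_max - smooth_min

def lse_gradient(xs, gamma):
    """Gradient of lse_wirelength with respect to each pin coordinate."""
    hi, lo = max(xs), min(xs)
    pos = [math.exp((x - hi) / gamma) for x in xs]
    neg = [math.exp(-(x - lo) / gamma) for x in xs]
    sum_pos, sum_neg = sum(pos), sum(neg)
    return [p / sum_pos - n / sum_neg for p, n in zip(pos, neg)]

def place(nets, fixed, movable, gamma=2.0, lr=0.5, steps=500):
    """Gradient descent on total smooth wirelength (hypothetical helper).

    nets:    list of nets, each a list of cell names
    fixed:   name -> coordinate for cells that must not move
    movable: name -> initial coordinate for cells to be placed
    """
    pos = dict(movable)
    for _ in range(steps):
        grad = {name: 0.0 for name in pos}
        for net in nets:
            xs = [pos[n] if n in pos else fixed[n] for n in net]
            for name, g in zip(net, lse_gradient(xs, gamma)):
                if name in grad:  # only movable cells receive updates
                    grad[name] += g
        for name in pos:
            pos[name] -= lr * grad[name]
    return pos

if __name__ == "__main__":
    # Cell "c" is connected to two fixed cells and starts far outside them.
    result = place([["a", "c"], ["c", "b"]], {"a": 0.0, "b": 10.0}, {"c": 20.0})
    print(result)
```

In this toy case the movable cell is pulled between its two fixed neighbors. A real analytical placer also needs a density penalty to spread cells apart, which this sketch omits.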

GRANNITE: Graph Neural Network Inference for Transferable Power Estimation

This paper introduces GRANNITE, a GPU-accelerated novel graph neural network (GNN) model for fast, accurate, and transferable vector-based average power estimation. During training, GRANNITE learns how to propagate average toggle rates through combinational logic: a netlist is represented as a graph, register states and unit inputs from RTL simulation are used as features, and combinational gate toggle rates are used as labels. A trained GNN model can then infer average toggle rates on a new workload of interest or new netlists from RTL simulation results in a few seconds.
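
The data flow described above can be sketched as follows: the netlist is a directed graph, toggle rates are known at registers and unit inputs, and rates for combinational gates are computed in topological order from their fan-in. This is only an illustration of the graph traversal, not the GRANNITE model. In particular, mean_update is a hypothetical stand-in for the learned per-gate function, and the names fanin and source_rates are assumptions. It requires Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

def mean_update(fanin_rates):
    """Placeholder for a learned update: average of fan-in toggle rates."""
    return min(1.0, sum(fanin_rates) / len(fanin_rates))

def propagate_toggle_rates(fanin, source_rates, update=mean_update):
    """Propagate toggle rates from sources through combinational gates.

    fanin:        gate name -> list of driver names (gates, registers, inputs)
    source_rates: known toggle rates for registers and unit inputs
    update:       function mapping a list of fan-in rates to a gate rate
    """
    rates = dict(source_rates)
    # static_order() yields every node after all of its drivers.
    for node in TopologicalSorter(fanin).static_order():
        if node not in rates:
            rates[node] = update([rates[d] for d in fanin[node]])
    return rates

if __name__ == "__main__":
    fanin = {"g1": ["r0", "in0"], "g2": ["g1", "r1"]}
    sources = {"r0": 0.2, "in0": 0.4, "r1": 0.1}
    print(propagate_toggle_rates(fanin, sources))
```

In a trained model, the combinational gate toggle rates used as labels would supervise the update function. Here the function is fixed so that the example stays self-contained.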

ParaGraph: Layout Parasitics and Device Parameter Prediction using Graph Neural Networks

Layout-dependent parasitics and device parameters significantly impact integrated circuit performance and are often the cause of slow convergences between schematic and layout designs. Circuit designers typically estimate parasitics from past experience, resulting in variability between designers and the potential for inaccuracies. In this paper, we present ParaGraph: a graph neural network model to predict net parasitics and device parameters by converting circuit schematics into graphs and leveraging key modeling techniques based on GraphSage, Relation GCN and Graph Attention Networks.
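
Converting a schematic into a graph can be sketched as below. Devices and nets both become nodes, and each device terminal becomes an edge labeled with its terminal name, so that a relation-aware model could treat gate, drain, and source connections differently. The abstract does not specify the graph construction ParaGraph uses, so the input format, the function names, and the node and edge attributes here are assumptions made for illustration.

```python
from collections import defaultdict

def schematic_to_graph(devices):
    """Build a device/net graph from a flat device list (hypothetical format).

    devices: list of (name, device_type, {terminal: net}) tuples
    Returns (nodes, edges) where edges are (device, net, terminal) triples.
    """
    nodes = {}
    edges = []
    for name, dev_type, terminals in devices:
        nodes[name] = {"kind": "device", "type": dev_type}
        for terminal, net in terminals.items():
            nodes.setdefault(net, {"kind": "net"})
            edges.append((name, net, terminal))
    return nodes, edges

def net_fanout(edges):
    """Count how many device terminals attach to each net."""
    counts = defaultdict(int)
    for _, net, _ in edges:
        counts[net] += 1
    return dict(counts)

if __name__ == "__main__":
    # A CMOS inverter: one PMOS and one NMOS sharing input and output nets.
    inverter = [
        ("M1", "pmos", {"d": "out", "g": "in", "s": "vdd"}),
        ("M2", "nmos", {"d": "out", "g": "in", "s": "vss"}),
    ]
    nodes, edges = schematic_to_graph(inverter)
    print(nodes)
    print(net_fanout(edges))
```

A simple feature such as net fanout is the kind of quantity that could be attached to net nodes before prediction. The choice of features is an assumption here, not a detail taken from the paper.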

A 0.32–128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm

Custom accelerators improve the energy efficiency, area efficiency, and performance of deep neural network (DNN) inference. This article presents a scalable DNN accelerator consisting of 36 chips connected in a mesh network on a multi-chip-module (MCM) using ground-referenced signaling (GRS). While previous accelerators fabricated on a single monolithic chip are optimal for specific network sizes, the proposed architecture enables flexible scaling for efficient inference on a wide range of DNNs, from mobile to data center domains.

ABCDPlace: Accelerated Batch-based Concurrent Detailed Placement on Multi-threaded CPUs and GPUs

Placement is an important step in modern very-large-scale integrated (VLSI) designs. Detailed placement is a placement refining procedure intensively called throughout the design flow, thus its efficiency has a vital impact on design closure. However, since most detailed placement techniques are inherently greedy and sequential, they are generally difficult to parallelize. In this work, we present a concurrent detailed placement framework, ABCDPlace, exploiting multithreading and GPU acceleration.