Global Context Vision Transformers

We propose the global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization for computer vision. Our method leverages global context self-attention modules, in conjunction with standard local self-attention, to effectively and efficiently model both long- and short-range spatial interactions, without the need for expensive operations such as computing attention masks or shifting local windows. In addition, we address the lack of inductive bias in ViTs and propose to leverage modified fused inverted-residual blocks in our architecture.
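
As a rough illustration of the global-context idea, the following minimal PyTorch sketch (our own simplification, not the authors' implementation) shows window attention in which the local queries can be swapped for a shared, pre-pooled global query; the module name, tensor shapes, and the `global_q` input are assumptions.

```python
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Multi-head attention over window-partitioned tokens. When `global_q`
    is given, local queries are replaced by a shared global query, which is
    the essence of the global context self-attention sketched here."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, global_q=None):
        # x: (num_windows * batch, tokens_per_window, dim)
        B_, N, C = x.shape
        qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B_, heads, N, head_dim)
        if global_q is not None:
            # Broadcast the shared global query to every window, so each
            # window is queried with the same image-level context.
            q = global_q.expand(B_, -1, -1, -1)
        attn = (q * self.scale) @ k.transpose(-2, -1)
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B_, N, C)
        return self.proj(out)

attn = WindowAttention(dim=96, num_heads=3)
x = torch.randn(32, 49, 96)     # 32 windows of 7x7 tokens
gq = torch.randn(1, 3, 49, 32)  # assumed pre-pooled global query: (1, heads, N, head_dim)
local_out, global_out = attn(x), attn(x, global_q=gq)
```

Because every window is queried with the same pooled context, long-range interactions are modeled without attention masks or shifted windows, which is the efficiency point made above.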

FasterViT: Fast Vision Transformers with Hierarchical Attention

We design a new family of hybrid CNN-ViT neural networks, named FasterViT, with a focus on high image throughput for computer vision (CV) applications. FasterViT combines the benefits of fast local representation learning in CNNs and the global modeling properties of ViTs. Our newly introduced Hierarchical Attention (HAT) approach decomposes global self-attention, with its quadratic complexity, into multi-level attention with reduced computational cost. We benefit from efficient window-based self-attention, in which each window has access to dedicated carrier tokens that participate in local and global representation learning.
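
Below is a minimal sketch of the hierarchical-attention decomposition under our own assumptions: one carrier token per window, stock `nn.TransformerEncoderLayer` blocks standing in for the paper's attention layers, and invented names throughout.

```python
import torch
import torch.nn as nn

def hierarchical_attention(windows, carriers, local_attn, global_attn):
    # windows:  (B * num_win, win_tokens, dim) -- window-partitioned local tokens
    # carriers: (B, num_win, dim)              -- one summary token per window
    B, num_win, dim = carriers.shape
    # Global step: carrier tokens attend to each other. num_win is small,
    # so this term is cheap compared with full-image self-attention.
    carriers = global_attn(carriers)
    # Local step: prepend each window's carrier to its own tokens, so
    # windows exchange information only through the carriers.
    c = carriers.reshape(B * num_win, 1, dim)
    x = local_attn(torch.cat([c, windows], dim=1))
    return x[:, 1:], x[:, :1].reshape(B, num_win, dim)

dim, heads = 64, 4
local_attn = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
global_attn = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
windows = torch.randn(8 * 16, 49, dim)   # 8 images, 16 windows of 7x7 tokens
carriers = torch.randn(8, 16, dim)
win_out, car_out = hierarchical_attention(windows, carriers, local_attn, global_attn)
```

In this decomposition the quadratic all-token attention is replaced by window-local terms plus a small carrier-to-carrier term, which is where the reduced computational cost comes from.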

CircuitOps: An ML Infrastructure Enabling Generative AI for VLSI Circuit Optimization

We develop CircuitOps, an ML infrastructure that streamlines dataset generation and model inference for various generative-AI (GAI)-based circuit optimization tasks. To address the absence of a shared Intermediate Representation (IR), steep EDA learning curves, and AI-unfriendly data structures, we propose solutions that enable efficient data handling.
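
To make the "AI-friendly data structure" point concrete, here is a hypothetical sketch (not the CircuitOps API) of a netlist exposed as a labeled property graph that an ML pipeline can query directly; the node names, properties, and the `networkx` representation are all invented for illustration.

```python
import networkx as nx

g = nx.DiGraph()
# Nodes are cells carrying properties an optimization model might consume.
g.add_node("U1", cell_type="NAND2_X1", slack=-0.12, load=0.004)
g.add_node("U2", cell_type="INV_X2", slack=0.031, load=0.002)
# Edges are driver-to-sink connections with net-level properties.
g.add_edge("U1", "U2", net="n42", wire_cap=0.0013)

# Dataset generation becomes a graph query instead of driving an EDA tool,
# e.g. collecting all timing-critical cells:
critical = [n for n, d in g.nodes(data=True) if d["slack"] < 0]
print(critical)  # ['U1']
```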

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which leverages the rich linguistic knowledge and powerful reasoning ability of LLMs to improve recognition results. The latest work proposes a GER benchmark with the HyPoradise dataset to learn the mapping from ASR N-best hypotheses to the ground-truth transcription by efficient LLM finetuning, which shows great effectiveness but lacks specificity on noise-robust ASR.
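
As an illustration of the GER setup, here is a minimal sketch of turning an N-best list into a finetuning example; the prompt template and field names are our assumptions, not the benchmark's exact format.

```python
def make_ger_example(nbest, transcript):
    """Pack an ASR N-best list and its reference transcript into a
    prompt/target pair for LLM finetuning (assumed template)."""
    hyps = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    prompt = (
        "Below are the N-best hypotheses from a speech recognizer. "
        "Report the most likely true transcription.\n"
        f"{hyps}\nTranscription:"
    )
    return {"prompt": prompt, "target": transcript}

example = make_ger_example(
    ["i scream for ice cream", "eye scream for ice cream", "i scream four ice cream"],
    "i scream for ice cream",
)
print(example["prompt"])
```

Finetuning then minimizes the standard language-modeling loss on the target transcription given such a prompt, so the LLM learns the hypotheses-to-transcription mapping directly.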

A 0.190-pJ/bit 25.2-Gb/s/wire Inverter-Based AC-Coupled Transceiver for Short-Reach Die-to-Die Interfaces in 5-nm CMOS

This paper presents an Inverter-based Short-Reach AC-Coupled Toggle (ISR-ACT) transceiver targeted at short-reach die-to-die communication over a silicon interposer or similar high-density interconnect. The ISR-ACT transmitter sends NRZ data through a small on-chip capacitor into the line. The receiver amplifies the low-swing pulses using a first-stage TIA to fully toggle the second-stage output, while positive feedback to the input pad maintains the DC level on the line.
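
Since the title pins down both the energy per bit and the per-wire data rate, the average per-wire power follows as a quick back-of-envelope check:

```latex
\begin{align*}
P_{\text{wire}} &= E_{\text{bit}} \times R_{\text{data}} \\
                &= 0.190\,\text{pJ/bit} \times 25.2\,\text{Gb/s}
                 \approx 4.79\,\text{mW}
\end{align*}
```

Each wire therefore dissipates under 5 mW at full rate, which is what makes such links attractive for dense die-to-die interfaces.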

Omer Shapira

Omer Shapira joined NVIDIA Research in 2023. His research interests are in Computer Graphics, Virtual and Extended Reality (XR), Human-Computer Interaction, and Haptics, with a current focus on AI-Mediated Human Interaction.
From 2016, he worked in NVIDIA's Simulation Technology group, leading the Omniverse XR team and contributing to Isaac, DriveSim, and NVIDIA's core XR technology and research.

Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization

Training deep networks with limited labeled data while achieving a strong generalization ability is key in the quest to reduce human annotation efforts. This is the goal of semi-supervised learning, which exploits more widely available unlabeled data to complement small labeled data sets. In this paper, we propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
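
A minimal sketch of the joint image-and-label generative idea, under our own assumptions about the architecture (the paper's generator is far more sophisticated); the shapes, class count, and inversion step are illustrative only.

```python
import torch
import torch.nn as nn

class JointGenerator(nn.Module):
    """Maps a latent code to an (image, label) pair, so that inverting a
    real image into latent space also yields its segmentation."""
    def __init__(self, z_dim=128, channels=3, classes=10, size=32):
        super().__init__()
        self.size = size
        self.shared = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, 64 * size * size), nn.ReLU(),
        )
        self.to_img = nn.Conv2d(64, channels, 1)  # image branch
        self.to_seg = nn.Conv2d(64, classes, 1)   # label branch

    def forward(self, z):
        h = self.shared(z).view(-1, 64, self.size, self.size)
        return self.to_img(h), self.to_seg(h)

# Segmenting a real image amounts to optimizing z so the image branch
# reconstructs it, then reading the label branch.
G = JointGenerator()
z = torch.randn(1, 128, requires_grad=True)
target = torch.randn(1, 3, 32, 32)       # stand-in for a real image
img, seg = G(z)
loss = ((img - target) ** 2).mean()      # one step of latent optimization
loss.backward()                          # gradients flow into z
label = seg.argmax(dim=1)                # predicted per-pixel classes
```

Because images and labels share a single latent code, unlabeled images can supervise the shared representation, which is the intuition behind both the semi-supervised gains and the out-of-domain generalization named in the title.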