FasterViT: Fast Vision Transformers with Hierarchical Attention

We design a new family of hybrid CNN-ViT neural networks, named FasterViT, with a focus on high image throughput for computer vision (CV) applications. FasterViT combines the benefits of fast local representation learning in CNNs with the global modeling properties of ViTs. Our newly introduced Hierarchical Attention (HAT) approach decomposes global self-attention, which has quadratic complexity, into multi-level attention with reduced computational cost, building on efficient window-based self-attention.
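As a rough illustration of the window-based self-attention that HAT builds on, the sketch below partitions a feature map into non-overlapping windows and computes attention within each window; the tensor shapes, module choices, and names are hypothetical and do not reproduce the FasterViT implementation, and the multi-level part of HAT is omitted.

```python
# Minimal sketch: window-partitioned self-attention (hypothetical shapes and
# modules, not the FasterViT reference code). Attention is restricted to each
# non-overlapping window, so cost depends on window size instead of image size.
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    def __init__(self, dim: int, window: int, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, height, width, channels); height and width divisible by window
        b, h, w, c = x.shape
        ws = self.window
        # Partition the feature map into (num_windows, window*window, channels) groups.
        xw = x.view(b, h // ws, ws, w // ws, ws, c)
        xw = xw.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, c)
        # Self-attention within each window only.
        out, _ = self.attn(xw, xw, xw)
        # Undo the partitioning to recover the feature-map layout.
        out = out.reshape(b, h // ws, w // ws, ws, ws, c)
        out = out.permute(0, 1, 3, 2, 4, 5).reshape(b, h, w, c)
        return out

attn = WindowAttention(dim=64, window=8)
x = torch.randn(2, 16, 16, 64)
print(attn(x).shape)  # torch.Size([2, 16, 16, 64])
```

Because attention is confined to fixed-size windows, the cost grows linearly with the number of windows rather than quadratically with the number of tokens in the whole image.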

CircuitOps: An ML Infrastructure Enabling Generative AI for VLSI Circuit Optimization

We develop CircuitOps, an ML infrastructure that streamlines dataset generation and model inference for a range of generative AI (GAI)-based circuit optimization tasks. To address the absence of a shared Intermediate Representation (IR), steep EDA learning curves, and AI-unfriendly data structures, we propose solutions that enable efficient data handling.

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which leverages the rich linguistic knowledge and powerful reasoning ability of LLMs to improve recognition results. The latest work proposes a GER benchmark built on the HyPoradise dataset that learns the mapping from ASR N-best hypotheses to the ground-truth transcription through efficient LLM finetuning; it is highly effective but does not specifically address noise-robust ASR.
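As a rough sketch of the GER setup described above, the N-best hypotheses can be packed into a single prompt whose target during finetuning is the ground-truth transcription. The prompt template and the example hypotheses below are hypothetical and are not taken from the HyPoradise benchmark.

```python
# Rough illustration of generative error correction (GER): pack the ASR N-best
# list into one prompt and let a (finetuned) LLM predict the true transcription.
# The template and the example hypotheses are hypothetical, not the HyPoradise setup.

def build_ger_prompt(nbest: list[str]) -> str:
    hypotheses = "\n".join(f"{i + 1}. {hyp}" for i, hyp in enumerate(nbest))
    return (
        "Below are the N-best hypotheses from a speech recognizer.\n"
        f"{hypotheses}\n"
        "Report the most likely true transcription:"
    )

nbest = [
    "the whether is nice today",
    "the weather is nice to day",
    "the weather his nice today",
]
print(build_ger_prompt(nbest))
# During finetuning, the target paired with this prompt would be the
# ground-truth transcription, e.g. "the weather is nice today".
```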

A 0.190-pJ/bit 25.2-Gb/s/wire Inverter-Based AC-Coupled Transceiver for Short-Reach Die-to-Die Interfaces in 5-nm CMOS

This paper presents an inverter-based short-reach AC-coupled toggle (ISR-ACT) transceiver targeted at short-reach die-to-die communication over a silicon interposer or similar high-density interconnect. The ISR-ACT transmitter drives NRZ data onto the line through a small on-chip capacitor. The receiver amplifies the resulting low-swing pulses with a first-stage transimpedance amplifier (TIA) to fully toggle the second-stage output, while positive feedback to the input pad maintains the DC level on the line.
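As a quick sanity check on the headline figures (simple arithmetic, not a number quoted from the paper), the stated energy efficiency and data rate imply the following per-wire power:

```latex
% Per-wire power implied by the headline energy efficiency and data rate.
P_{\text{wire}} = E_{\text{bit}} \cdot R
                = 0.190~\text{pJ/bit} \times 25.2~\text{Gb/s}
                \approx 4.8~\text{mW}
```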

Omer Shapira

Omer Shapira joined NVIDIA Research in 2023. His research interests are in Computer Graphics, Virtual and Extended Reality (XR), Human-Computer Interaction, and Haptics, with a current focus on AI-Mediated Human Interaction.
From 2016 until then, he worked in NVIDIA's Simulation Technology group, leading the Omniverse XR group and contributing to Isaac, Drivesim, and NVIDIA's core XR technology and research.

Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization

Training deep networks with limited labeled data while achieving strong generalization is key to reducing human annotation effort. This is the goal of semi-supervised learning, which exploits more widely available unlabeled data to complement small labeled data sets. In this paper, we propose a novel framework for discriminative pixel-level tasks that uses a generative model of both images and labels.
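A minimal sketch of the kind of joint generative model referred to above (the architecture, layer sizes, and head names are hypothetical and do not reproduce the paper's model): a single latent code is decoded into an image and an aligned label map, so synthesized pairs can complement the small labeled set.

```python
# Minimal sketch: one latent code is decoded into an image and an aligned label
# map (hypothetical architecture, not the paper's model).
import torch
import torch.nn as nn

class JointGenerator(nn.Module):
    def __init__(self, latent_dim: int = 128, num_classes: int = 5, size: int = 32):
        super().__init__()
        self.size, self.num_classes = size, num_classes
        self.backbone = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
        )
        # Two heads share one latent: an RGB image and per-pixel class logits.
        self.image_head = nn.Linear(512, 3 * size * size)
        self.label_head = nn.Linear(512, num_classes * size * size)

    def forward(self, z: torch.Tensor):
        h = self.backbone(z)
        image = torch.tanh(self.image_head(h)).view(-1, 3, self.size, self.size)
        logits = self.label_head(h).view(-1, self.num_classes, self.size, self.size)
        return image, logits  # an aligned (image, label) pair from the same z

z = torch.randn(4, 128)
image, logits = JointGenerator()(z)
print(image.shape, logits.shape)  # (4, 3, 32, 32) (4, 5, 32, 32)
```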

ATISS: Autoregressive Transformers for Indoor Scene Synthesis

The ability to synthesize realistic and diverse indoor furniture layouts, automatically or based on partial input, unlocks many applications, from better interactive 3D tools to data synthesis for training and simulation. In this paper, we present ATISS, a novel autoregressive transformer architecture for creating diverse and plausible synthetic indoor environments given only the room type and its floor plan. In contrast to prior work, which poses scene synthesis as sequence generation, our model generates rooms as unordered sets of objects.
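A rough sketch of generating a room as an unordered set of objects, in the spirit described above (the object encoding, stop rule, and network sizes are hypothetical and do not reproduce the ATISS architecture): the objects placed so far condition the prediction of the next object's attributes, and generation stops once a stop probability is exceeded.

```python
# Rough sketch: autoregressive generation of an unordered object set
# (hypothetical object encoding and stop rule, not the ATISS architecture).
import torch
import torch.nn as nn

class NextObjectModel(nn.Module):
    def __init__(self, obj_dim: int = 8, model_dim: int = 64):
        super().__init__()
        self.embed = nn.Linear(obj_dim, model_dim)
        self.query = nn.Parameter(torch.randn(1, 1, model_dim))
        layer = nn.TransformerEncoderLayer(model_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.next_obj = nn.Linear(model_dim, obj_dim)  # attributes of the next object
        self.stop = nn.Linear(model_dim, 1)            # probability of stopping

    def forward(self, objects: torch.Tensor):
        # objects: (batch, num_objects, obj_dim); no positional encoding, so the
        # already-placed objects are treated as an unordered set.
        tokens = torch.cat(
            [self.query.expand(objects.size(0), -1, -1), self.embed(objects)], dim=1
        )
        h = self.encoder(tokens)[:, 0]  # read out at the query token
        return self.next_obj(h), torch.sigmoid(self.stop(h))

model = NextObjectModel().eval()
scene = torch.zeros(1, 0, 8)  # start from an empty room
with torch.no_grad():
    for _ in range(10):
        attrs, p_stop = model(scene)
        if p_stop.item() > 0.5:  # untrained model, so the threshold is illustrative
            break
        scene = torch.cat([scene, attrs.unsqueeze(1)], dim=1)
print(scene.shape)  # (1, num_generated_objects, 8)
```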

EditGAN: High-Precision Semantic Image Editing

Generative adversarial networks (GANs) have recently found applications in image editing. However, most GAN-based image editing methods require large-scale datasets with semantic segmentation annotations for training, offer only high-level control, or merely interpolate between different images. Here, we propose EditGAN, a novel method for high-quality, high-precision semantic image editing that allows users to edit images by modifying their highly detailed part segmentation masks, e.g., drawing a new mask for the headlight of a car.
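A conceptual sketch of mask-driven editing via latent optimization (the stub generator, loss terms, and weights are hypothetical and do not reproduce EditGAN's actual procedure): starting from the latent code of an image, gradient steps pull the generated segmentation toward the user-edited mask while keeping the image close to the original.

```python
# Conceptual sketch: edit an image by optimizing its latent code so the generated
# segmentation matches a user-edited mask (stub generator and loss weights are
# hypothetical, not EditGAN's procedure).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StubJointGenerator(nn.Module):
    """Stand-in for a GAN that renders an image and a part segmentation from a latent."""
    def __init__(self, latent_dim: int = 64, num_parts: int = 4, size: int = 16):
        super().__init__()
        self.size, self.num_parts = size, num_parts
        self.image_head = nn.Linear(latent_dim, 3 * size * size)
        self.seg_head = nn.Linear(latent_dim, num_parts * size * size)

    def forward(self, z):
        img = torch.tanh(self.image_head(z)).view(-1, 3, self.size, self.size)
        seg = self.seg_head(z).view(-1, self.num_parts, self.size, self.size)
        return img, seg

gen = StubJointGenerator().requires_grad_(False)
z0 = torch.randn(1, 64)                 # latent code of the original image
img0, seg0 = gen(z0)
edited_mask = seg0.argmax(1).clone()
edited_mask[:, :4, :4] = 2              # the user "paints" part 2 into a corner

z = z0.clone().requires_grad_(True)
opt = torch.optim.Adam([z], lr=0.05)
for _ in range(100):
    img, seg = gen(z)
    seg_loss = F.cross_entropy(seg, edited_mask)  # match the edited part mask
    rgb_loss = F.mse_loss(img, img0)              # stay close to the original image
    loss = seg_loss + 10.0 * rgb_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```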