GazeNeRF: 3D-Aware Gaze Redirection with Neural Radiance Fields

We propose GazeNeRF, a 3D-aware method for the task of gaze redirection. Existing gaze redirection methods operate on 2D images and struggle to generate 3D-consistent results. Instead, we build on the intuition that the face region and the eyeballs are separate 3D structures that move in a coordinated yet independent fashion. Our method leverages recent advancements in conditional image-based neural radiance fields and proposes a two-branch architecture that predicts volumetric features for the face and eye regions separately.
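The two-branch composition can be caricatured in a few lines of numpy: two independent fields (face and eyes) each predict a density and a feature at a 3D point, the eye branch sees points rotated into the target gaze frame so the eyeballs move independently of the face, and the outputs are blended by density. The toy MLPs, random weights, and blending rule below are illustrative stand-ins under that intuition, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(hidden=16, out=5):
    """Random weights for a toy 2-layer MLP (stand-in for one NeRF branch)."""
    return {"w1": rng.normal(size=(3, hidden)) * 0.5, "b1": np.zeros(hidden),
            "w2": rng.normal(size=(hidden, out)) * 0.5, "b2": np.zeros(out)}

def branch(params, p):
    """Map a 3D point to (density, 4-d volumetric feature)."""
    h = np.tanh(p @ params["w1"] + params["b1"])
    out = h @ params["w2"] + params["b2"]
    return np.log1p(np.exp(out[0])), out[1:]   # softplus keeps density >= 0

face, eyes = init_params(), init_params()

def query(p, gaze_R):
    """Compose face and eye fields at point p for a target gaze rotation."""
    sf, ff = branch(face, p)
    se, fe = branch(eyes, gaze_R @ p)            # eyes rotate; the face does not
    sigma = sf + se                              # densities add
    feat = (sf * ff + se * fe) / (sigma + 1e-8)  # density-weighted feature blend
    return sigma, feat

sigma, feat = query(np.array([0.1, 0.0, 0.5]), np.eye(3))
```

Rendering would then alpha-composite `(sigma, feat)` samples along each camera ray as in a standard NeRF.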

Zero-shot Pose Transfer for Unrigged Stylized 3D Characters

Transferring the pose of a reference avatar to stylized 3D characters of various shapes is a fundamental task in computer graphics. Existing methods either require the stylized characters to be rigged, or they use the stylized character in the desired pose as ground truth during training. We present a zero-shot approach that requires only widely available deformed non-stylized avatars during training, and deforms stylized characters of significantly different shapes at inference.

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. Text-to-image diffusion models have the remarkable ability to generate high-quality images with diverse open-vocabulary language descriptions. This demonstrates that their internal representation space is highly correlated with open concepts in the real world. Text-image discriminative models like CLIP, on the other hand, are good at classifying images into open-vocabulary labels.

GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation

We present the Group Propagation Vision Transformer (GPViT): a novel non-hierarchical (i.e. non-pyramidal) transformer model designed for general visual recognition with high-resolution features. High-resolution features (or tokens) are a natural fit for tasks that involve perceiving fine-grained details such as detection and segmentation, but exchanging global information between these features is expensive in memory and computation because self-attention scales quadratically with the number of tokens. We provide a highly efficient alternative, the Group Propagation Block (GP Block), to exchange global information.
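One way to see why grouping is cheap: instead of N×N self-attention over all high-resolution tokens, a small set of M learnable group tokens gathers information from the N tokens, exchanges it among themselves, and broadcasts it back, for O(NM + M²) cost rather than O(N²). The sketch below is a single-head, projection-free caricature of that flow, not the actual GP Block (which uses learned projections, multiple heads, and an MLP mixer over the group tokens):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attn(q, kv, scale):
    """Single-head attention: rows of q attend over rows of kv."""
    return softmax(q @ kv.T / scale) @ kv

def gp_block(tokens, group_tokens):
    d = tokens.shape[-1]
    s = np.sqrt(d)
    # 1) grouping: M group tokens summarize the N image tokens, cost O(M*N)
    groups = cross_attn(group_tokens, tokens, s)
    # 2) global exchange among only the M group tokens, cost O(M^2)
    groups = groups + cross_attn(groups, groups, s)
    # 3) ungrouping: every token reads the updated groups back, cost O(N*M)
    return tokens + cross_attn(tokens, groups, s)

rng = np.random.default_rng(0)
N, M, d = 1024, 16, 32                    # many tokens, few groups
out = gp_block(rng.normal(size=(N, d)), rng.normal(size=(M, d)))
```

With N = 1024 and M = 16, the attention maps here are 16×1024, 16×16, and 1024×16, versus a single 1024×1024 map for full self-attention.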

Less is More: Rendering for Esports

Computer graphics has improved from early wireframes to ray tracing and physically based rendering. Yet nearly 20 years ago, George Lucas stated that “the real leap has been made,” and today, esports players turn off many of the rendering techniques that took SIGGRAPH so long to develop, because they don't help them win. Is it time for SIGGRAPH to reconsider its research goals? This workshop will discuss this question, including alternatives to photorealism, trading off temporal and visual accuracy, and trading off realism with gameplay and fairness.

Mouse Sensitivity in First-person Targeting Tasks

Mouse sensitivity in first-person targeting tasks is a highly debated issue. Recommendations within a single game can vary by a factor of 10 or more and are an active topic of experimentation in both competitive and recreational esports communities. Inspired by work in pointer-based gain optimization and extending our previous results from the first user study focused on mouse sensitivity in first-person targeting tasks [1], we describe a range of optimal mouse sensitivities wherein players perform statistically significantly better in task completion time and throughput.

VaPr: Variable-Precision Tensors to Accelerate Robot Motion Planning

High-dimensional motion generation requires numerical precision for smooth, collision-free solutions. Typically, double-precision or single-precision floating-point (FP) formats are used for accurate results. Using these formats for large tensors strains device memory bandwidth and inflates the memory footprint, limiting their applicability to the low-power edge devices needed for mobile robots. Uniformly applying reduced precision can be advantageous, but severely degrades solution quality.
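The bandwidth/footprint tension is easy to quantify: halving the floating-point width halves a tensor's bytes, but rounding error grows. The snippet below measures that trade-off with numpy and sketches a hypothetical per-tensor policy (downcast to fp16 only when the round-trip error stays under a tolerance); the actual precision-selection criterion in VaPr differs:

```python
import numpy as np

rng = np.random.default_rng(0)
traj = rng.normal(size=(64, 7))            # e.g. 64 waypoints of a 7-DoF arm

# Footprint vs. accuracy at each precision
for dtype in (np.float64, np.float32, np.float16):
    t = traj.astype(dtype)
    err = np.abs(t.astype(np.float64) - traj).max()
    print(f"{np.dtype(dtype).name:8s} {t.nbytes:5d} bytes, max abs error {err:.1e}")

def variable_precision(tensors, tol=1e-3):
    """Hypothetical policy: fp16 where the round-trip error is tolerable,
    fp32 elsewhere (a stand-in for VaPr's actual selection criterion)."""
    out = []
    for t in tensors:
        low = t.astype(np.float16)
        out.append(low if np.abs(low.astype(np.float64) - t).max() < tol
                   else t.astype(np.float32))
    return out

mixed = variable_precision([traj, traj * 1e-4])
```

A 64×7 float64 trajectory occupies 3,584 bytes; the same tensor in fp16 occupies 896, a 4× reduction in both footprint and bandwidth per transfer.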

A distributed, decoupled system for losslessly streaming dynamic light probes to thin clients

We present a networked, high-performance graphics system that combines dynamic, high-quality, ray traced global illumination computed on a server with direct illumination and primary visibility computed on a client. This approach provides many of the image quality benefits of real-time ray tracing on low-power and legacy hardware, while maintaining a low latency response and mobile form factor.

Ryo Hachiuma

Ryo Hachiuma is a Research Scientist at NVIDIA Research Taiwan, working on Multi-Modal AI. He received his Ph.D. degree from Keio University, advised by Prof. Hideo Saito. Before joining NVIDIA Research, he was a computer vision engineer at Konica Minolta, Inc. in Japan, working on human action recognition. His research interests are mainly in human activity analysis from multi-sensory data (e.g., audio-visual, audio-visual-language).


Ed Schmerling

Ed Schmerling is a Research Scientist in the Autonomous Vehicle Research Group at NVIDIA. His main research interests are in the modeling and development of intelligent data-driven agents through advances in generative modeling, uncertainty quantification, and optimal control with applications in simulation, behavior planning, and safety assurance. Prior to joining NVIDIA, he served as the Associate Director of the Autonomous Systems Laboratory at Stanford University, and has previously worked as an AV researcher at Waymo. He received a Ph.D.