Do Action Video Game Players Search Faster Than Non-Players?

Studies have shown that action video game players have enhanced visual abilities in various domains, such as multiple object tracking, the size of the useful field of view, and visual search speed and accuracy. These improvements have been attributed either to a general advantage in "learning to learn" or to domain-specific enhancements driven by the "common demands" shared between specific games and experimental tasks. To investigate these two theories, we conducted six experiments examining whether and how players and non-players differ in various aspects of visual search.

Is Less More? Rendering for Esports

Computer graphics research has long prioritized image quality over frame rate. Yet demand for an alternative is growing, with many esports players turning off visual effects to improve frame rates. Is it time for graphics researchers to reconsider their goals? A workshop at the 2023 SIGGRAPH Conference explored this question. Three researchers gave provocative presentations, each of which was then discussed by dozens of research and industry attendees.

2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision

We present a Multimodal Interlaced Transformer (MIT) that jointly considers 2D and 3D data for weakly supervised point cloud segmentation. Studies have shown that 2D and 3D features are complementary for point cloud segmentation. However, existing methods require extra 2D annotations to achieve 2D-3D information fusion. Given the high annotation cost of point clouds, effective 2D-3D feature fusion under weak supervision is in great demand.
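The abstract does not spell out the fusion mechanism, but the interlacing idea can be illustrated with a minimal cross-attention sketch. Everything below, including the block name `InterlacedFusionBlock`, the feature dimensions, and the use of standard multi-head attention, is an illustrative assumption rather than MIT's actual design:

```python
import torch
import torch.nn as nn

class InterlacedFusionBlock(nn.Module):
    """Conceptual sketch: alternate attention between 3D point features and
    2D image features so each modality is refined by the other."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        # 3D tokens attend to 2D tokens, then 2D tokens attend to 3D tokens.
        self.attn_3d_from_2d = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_2d_from_3d = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm3d = nn.LayerNorm(dim)
        self.norm2d = nn.LayerNorm(dim)

    def forward(self, feats_3d, feats_2d):
        # feats_3d: (B, N_points, dim) point-cloud token features
        # feats_2d: (B, N_pixels, dim) image token features
        f3, _ = self.attn_3d_from_2d(query=feats_3d, key=feats_2d, value=feats_2d)
        feats_3d = self.norm3d(feats_3d + f3)
        f2, _ = self.attn_2d_from_3d(query=feats_2d, key=feats_3d, value=feats_3d)
        feats_2d = self.norm2d(feats_2d + f2)
        return feats_3d, feats_2d

# Toy usage: fuse features for 1024 points and a 32x32 image feature map.
block = InterlacedFusionBlock()
p_out, i_out = block(torch.randn(2, 1024, 256), torch.randn(2, 32 * 32, 256))
```

Under weak (scene-level) supervision, such interlaced blocks would let image cues sharpen point features without requiring any per-pixel or per-point labels, which is the complementarity the abstract refers to.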

LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt. Amortized methods like ATT3D optimize multiple prompts simultaneously to improve efficiency, enabling fast text-to-3D synthesis. However, ATT3D cannot capture high-frequency geometry and texture details and struggles to scale to large prompt sets, so it generalizes poorly. We introduce LATTE3D, addressing these limitations to achieve fast, high-quality generation on a significantly larger prompt set.
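Amortization here means training one text-conditioned generator across many prompts rather than running a separate optimization per prompt, so inference becomes a single forward pass. The toy loop below sketches that contrast; the tiny `TextTo3D` network, the random stand-in "text embeddings", and the placeholder loss are all assumptions for illustration, not LATTE3D's pipeline:

```python
import torch
import torch.nn as nn

# Stand-in generator: maps a text embedding to a flat "3D representation".
# A real system would instead output, e.g., NeRF or mesh parameters.
class TextTo3D(nn.Module):
    def __init__(self, text_dim: int = 64, shape_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim, 256), nn.ReLU(), nn.Linear(256, shape_dim)
        )

    def forward(self, text_emb):
        return self.net(text_emb)

def placeholder_loss(shape, text_emb):
    # Stand-in for a render-and-score objective (e.g., score distillation).
    return shape.pow(2).mean() + 0.0 * text_emb.sum()

prompts = torch.randn(1000, 64)  # pretend embeddings for 1000 prompts
model = TextTo3D()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Amortized training: each step samples a batch of prompts and updates ONE
# shared model, so optimization cost is spread across the whole prompt set.
for step in range(100):
    batch = prompts[torch.randint(0, len(prompts), (32,))]
    loss = placeholder_loss(model(batch), batch)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Per-prompt optimization (the slow baseline) would instead re-run a loop like
# this from scratch for every individual prompt.
```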

Fred Yang

My name is Fu-En Yang (Fred). I am a Research Scientist at NVIDIA Research Taiwan. My research interests include transfer learning, large vision-language models (LVLMs), multimodal understanding and reasoning, VLM agents, and video modeling. I received my Ph.D. from National Taiwan University (NTU) in 2023 under the supervision of Prof. Yu-Chiang Frank Wang, and was also a research intern at NVIDIA Research Taiwan.

Jae-Hyun Jung

Jae-Hyun Jung joined NVIDIA Research in March 2024, exploring the interaction between human perception and computational systems. His recent research interests include applied human perception and the modeling of walking and driving behavior, visual perception and optical devices in AR/VR, and human-computer interaction (HCI) applications.

Generating images of rare concepts using pre-trained diffusion models

Text-to-image diffusion models can synthesize high-quality images, but they have various limitations. Here we highlight a common failure mode of these models: difficulty generating uncommon concepts and structured concepts such as hand palms. We show that this limitation stems in part from the long-tail nature of their training data: web-crawled datasets are strongly imbalanced, causing models to under-represent concepts from the tail of the distribution. We characterize the effect of imbalanced training data on text-to-image models and offer a remedy.

Equivariant Architectures for Learning in Deep Weight Spaces

Designing machine learning architectures for processing neural networks in their raw weight matrix form is a newly introduced research direction. Unfortunately, the unique symmetry structure of deep weight spaces makes this design very challenging. If successful, such architectures would be capable of performing a wide range of intriguing tasks, from adapting a pre-trained network to a new domain to editing objects represented as functions (implicit neural representations, or INRs, and NeRFs). As a first step towards this goal, we present here a novel network architecture for learning in deep weight spaces.
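The "unique symmetry structure" refers, among other things, to the fact that permuting the hidden neurons of an MLP, while permuting its weight matrices accordingly, leaves the network function unchanged, so an architecture that consumes raw weights should respect that symmetry. The small numerical check below demonstrates this invariance for an assumed one-hidden-layer MLP; it illustrates the symmetry only, not the paper's equivariant architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# One-hidden-layer MLP: f(x) = W2 @ relu(W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Permute the 16 hidden neurons: reorder rows of W1 and b1, and columns of W2.
perm = rng.permutation(16)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=8)
out_original = mlp(x, W1, b1, W2, b2)
out_permuted = mlp(x, W1p, b1p, W2p, b2)

# The outputs match: many distinct weight configurations encode the same
# function, and equivariant architectures are built to respect exactly this.
assert np.allclose(out_original, out_permuted)
```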