Yunsheng Bai

Yunsheng Bai earned his PhD in Computer Science from the University of California, Los Angeles (UCLA) in June 2023, under the guidance of Professors Yizhou Sun and Wei Wang, and in collaboration with Professor Jason Cong's team. His research focuses on developing new approaches for graph-related tasks, with particular interest in graph neural networks and large language models applied to graph-level problems such as graph similarity, graph matching, and HLS design modeling.

Scott Reed

I am a principal research scientist at NVIDIA Research on the Generalist Embodied Agent Research (GEAR) team. My research goal is to build generalist agents that can help humans in the real (including physical) world. Previously, I worked on control and generative models at Google DeepMind, which I joined in 2016. I completed my PhD under Professor Honglak Lee at the University of Michigan in 2016.

Rachel Luo

Rachel Luo is a Research Scientist in the Autonomous Vehicles Group at NVIDIA Research. She works on improving the safety and reliability of autonomous systems. Her research interests lie at the intersection of machine learning, computer vision, and robotics, and include topics in uncertainty quantification, distribution shift, and foundation models. Prior to joining NVIDIA, Rachel completed her PhD and MS in Electrical Engineering at Stanford University, and her BS in Electrical Engineering and Computer Science at MIT.

Yongxin Chen

I am a research scientist at NVIDIA. I also hold the position of Associate Professor at the Georgia Institute of Technology. I obtained my PhD from the University of Minnesota in 2016. My research interests encompass machine learning, control theory, optimal transport, optimization, Markov chain Monte Carlo, and robotics.

Graph Metanetworks for Processing Diverse Neural Architectures

Neural networks efficiently encode learned information within their parameters. Consequently, many tasks can be unified by treating neural networks themselves as input data. When doing so, recent studies have demonstrated the importance of accounting for the symmetries and geometry of parameter spaces. However, those works developed architectures tailored to specific network types, such as MLPs and CNNs without normalization layers, and generalizing such architectures to other kinds of networks can be challenging.
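To make the construction concrete, here is a minimal sketch (not the paper's implementation; the node/edge feature layout and the helper name are illustrative assumptions) of the kind of graph a metanetwork could operate on: each neuron of an MLP becomes a node, each weight an edge whose feature is the weight value, and each bias a node feature.

```python
import torch
import torch.nn as nn

def mlp_to_param_graph(mlp: nn.Sequential):
    """Encode an MLP's parameters as a graph: one node per neuron,
    one edge per weight (edge feature = weight value), and each bias
    stored as the node feature of its destination neuron."""
    linears = [m for m in mlp if isinstance(m, nn.Linear)]
    sizes = [linears[0].in_features] + [m.out_features for m in linears]
    offsets = [0]
    for s in sizes[:-1]:
        offsets.append(offsets[-1] + s)

    src, dst, edge_attr = [], [], []
    node_attr = torch.zeros(sum(sizes), 1)
    for layer, lin in enumerate(linears):
        out_dim, in_dim = lin.weight.shape
        for j in range(out_dim):
            node_attr[offsets[layer + 1] + j, 0] = float(lin.bias[j])
            for i in range(in_dim):
                src.append(offsets[layer] + i)
                dst.append(offsets[layer + 1] + j)
                edge_attr.append(float(lin.weight[j, i]))
    edge_index = torch.tensor([src, dst])  # shape (2, num_edges)
    return edge_index, torch.tensor(edge_attr).unsqueeze(-1), node_attr

mlp = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
edge_index, edge_attr, node_attr = mlp_to_param_graph(mlp)
```

A permutation-equivariant GNN applied to this graph respects the hidden-neuron permutation symmetries of the parameter space by construction, which is what motivates the graph view.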

ATT3D: Amortized Text-To-3D Object Synthesis

Text-to-3D modeling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields. DreamFusion recently achieved high-quality results but requires a lengthy, per-prompt optimization to create 3D objects. To address this, we amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead of separately. With this, we share computation across a prompt set, training in less time than per-prompt optimization.
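To illustrate what amortization means here, the toy sketch below (an assumption-laden stand-in, not ATT3D's implementation) trains one prompt-conditioned field over a whole prompt set instead of optimizing a separate model per prompt. The embedding table, the tiny MLP field, and the placeholder objective are all illustrative; ATT3D pairs a text encoder and a NeRF with DreamFusion's score distillation sampling (SDS) loss on rendered images.

```python
import torch
import torch.nn as nn

class PromptConditionedField(nn.Module):
    """Toy stand-in for a text-conditioned NeRF: maps a prompt
    embedding plus 3D points to (density, rgb)."""
    def __init__(self, num_prompts: int, emb_dim: int = 32):
        super().__init__()
        self.prompt_emb = nn.Embedding(num_prompts, emb_dim)  # stand-in for a text encoder
        self.net = nn.Sequential(nn.Linear(emb_dim + 3, 64), nn.ReLU(), nn.Linear(64, 4))

    def forward(self, prompt_ids, points):
        e = self.prompt_emb(prompt_ids)  # (B, emb_dim)
        h = torch.cat([e.unsqueeze(1).expand(-1, points.shape[1], -1), points], dim=-1)
        out = self.net(h)
        return out[..., :1], out[..., 1:]  # density, rgb

prompts = ["a hamburger", "a blue teapot", "a wooden chair"]
field = PromptConditionedField(num_prompts=len(prompts))
opt = torch.optim.Adam(field.parameters(), lr=1e-3)

for step in range(100):
    ids = torch.randint(len(prompts), (8,))  # sample a batch of prompts
    pts = torch.rand(8, 256, 3) * 2 - 1      # sample 3D points to evaluate
    density, rgb = field(ids, pts)
    # Placeholder objective: ATT3D instead renders images from the field
    # and applies the SDS loss from a frozen text-to-image diffusion model.
    loss = density.square().mean() + rgb.square().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because all prompts share one set of weights, gradient updates from one prompt benefit the others, which is where the training-time savings over per-prompt optimization come from.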

Do Action Video Game Players Search Faster Than Non-Players?

Studies have shown that action video game players have enhanced visual abilities in various domains, such as multiple object tracking, size of the useful field of view, and visual search speed and accuracy. These improvements have been attributed either to a general advantage in “learning to learn” abilities, or to domain-specific enhancements of the “common demands” shared between specific games and experimental tasks. To investigate these two theories, we conducted six experiments examining whether and how players and non-players differ in various aspects of visual search.

Is Less More? Rendering for Esports

Computer graphics research has long prioritized image quality over frame rate. Yet demand for an alternative is growing, with many esports players turning off visual effects to improve frame rates. Is it time for graphics researchers to reconsider their goals? A workshop at the 2023 SIGGRAPH Conference explored this question. Three researchers made provocative presentations, each of which was then discussed by dozens of research and industry attendees.

2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision

We present a Multimodal Interlaced Transformer (MIT) that jointly considers 2D and 3D data for weakly supervised point cloud segmentation. Prior studies have shown that 2D and 3D features are complementary for point cloud segmentation. However, existing methods require extra 2D annotations to achieve 2D-3D information fusion. Given the high annotation cost of point clouds, effective 2D and 3D feature fusion based on weakly supervised learning is in great demand.
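As a rough sketch of what interlaced 2D-3D fusion can look like (illustrative only; the layer count, dimensions, and block structure are assumptions, and the paper's MIT achieves its fusion without extra 2D annotations), alternating cross-attention lets 3D point tokens query 2D pixel tokens and vice versa:

```python
import torch
import torch.nn as nn

class InterlacedBlock(nn.Module):
    """One fusion step: 3D tokens attend to 2D tokens, then 2D to 3D."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn_3d_to_2d = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_2d_to_3d = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_3d = nn.LayerNorm(dim)
        self.norm_2d = nn.LayerNorm(dim)

    def forward(self, feats_3d, feats_2d):
        upd_3d, _ = self.attn_3d_to_2d(feats_3d, feats_2d, feats_2d)
        feats_3d = self.norm_3d(feats_3d + upd_3d)  # residual + norm
        upd_2d, _ = self.attn_2d_to_3d(feats_2d, feats_3d, feats_3d)
        feats_2d = self.norm_2d(feats_2d + upd_2d)
        return feats_3d, feats_2d

blocks = nn.ModuleList([InterlacedBlock() for _ in range(2)])
feats_3d = torch.randn(1, 1024, 64)  # tokens from a 3D point backbone
feats_2d = torch.randn(1, 196, 64)   # tokens from a 2D image backbone
for blk in blocks:
    feats_3d, feats_2d = blk(feats_3d, feats_2d)
# feats_3d now carries fused 2D-3D context for per-point predictions.
```

The interlacing order in this sketch (3D queries first, then 2D) is a design choice; the key point is that each modality repeatedly refines the other.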

LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt. Amortized methods like ATT3D optimize multiple prompts simultaneously to improve efficiency, enabling fast text-to-3D synthesis. However, ATT3D cannot capture high-frequency geometry and texture details and struggles to scale to large prompt sets, so it generalizes poorly. We introduce Latte3D, addressing these limitations to achieve fast, high-quality generation on a significantly larger prompt set.