Rowland O'Flaherty

Rowland's interests within robotics lie at the intersection of control theory, machine learning, and optimization. Rowland came to NVIDIA from the Silicon Valley follow-me drone startup world, where he developed algorithms for planning and control and improved the dynamical models of the robotic vehicles to a high degree of fidelity. Before joining the startup world, Rowland earned B.S. and M.S. degrees in ECE from UCSB and a Ph.D. in Robotics from Georgia Tech.

Guillermo Marcus

Guillermo Marcus is a senior software engineer at NVIDIA working at the intersection of graphics, communications, and machine learning. He is interested in systems engineering, software engineering, and applied scientific computing, with a focus on accelerated computing and application-specific accelerators. Before joining NVIDIA in 2013, he was a research scientist at the University of Heidelberg. He received the PRACE Award at ISC 2011. He has a Ph.D. (magna cum laude) from Heidelberg University in Germany, an M.Sc.

Fabio Ramos

My research is focused on modelling and understanding uncertainty for prediction and decision making tasks, and includes Bayesian statistics, data fusion, anomaly detection, and reinforcement learning. Over the last ten years I have applied these techniques to robotics, mining and exploration, environment monitoring, and neuroscience.

Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments

Affordance modeling plays an important role in visual understanding. In this paper, we aim to predict affordances of 3D indoor scenes, specifically what human poses are afforded by a given indoor environment, such as sitting on a chair or standing on the floor. In order to predict valid affordances and learn possible 3D human poses in indoor scenes, we need to understand the semantic and geometric structure of a scene as well as its potential interactions with a human. To learn such a model, a large-scale dataset of 3D indoor affordances is required.

SIDOD: A Synthetic Image Dataset for 3D Object Pose Recognition with Distractors

We present a new image dataset generated by the NVIDIA Deep Learning Data Synthesizer intended for use in object detection, pose estimation, and tracking applications. This dataset contains 144k stereo image pairs generated from 18 camera viewpoints of three photorealistic virtual environments with up to 10 objects (chosen randomly from the 21 object models of the YCB dataset) and flying distractors. Object and camera pose, scene lighting, and quantity of objects and distractors were randomized. Each provided view includes RGB, depth, segmentation, and surface normal images.

Thomas Müller

Thomas is a principal research scientist at NVIDIA working at the intersection of machine learning and (inverse) light transport simulation. His research has won multiple best paper awards, was featured in TIME's best inventions of 2022, and is used in movie production, commercial 3D reconstruction, and gaming.

Context-aware Captions from Context-agnostic Supervision

We introduce an inference technique to produce discriminative context-aware image captions (captions that describe differences between images or visual concepts) using only generic context-agnostic training data (captions that describe a concept or an image in isolation). For example, given images and captions of "siamese cat" and "tiger cat", we generate language that describes the "siamese cat" in a way that distinguishes it from "tiger cat".
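The abstract does not spell out the inference procedure. One simple way to get discriminative captions out of a context-agnostic captioner, assumed here for illustration and not necessarily the paper's exact method, is to choose each next word by how likely it is for the target concept relative to the distractor. The sketch uses greedy decoding; `discriminative_greedy`, the weight `lam`, and the toy next-word distributions are all hypothetical.

```python
import math


def _score(word, p_t, p_d, lam):
    # lam * log p_t(w) + (1 - lam) * log(p_t(w) / p_d(w)).
    # lam = 1 ignores the distractor entirely; smaller lam favors words that
    # are likely for the target but unlikely for the distractor.
    return lam * math.log(p_t[word]) + (1 - lam) * (math.log(p_t[word]) - math.log(p_d[word]))


def discriminative_greedy(p_target, p_distractor, lam=0.5, length=3):
    """Greedy word-by-word decoding that trades off fitting the target
    against distinguishing it from the distractor.

    p_target / p_distractor: hypothetical callables mapping a prefix (tuple of
    words) to a dict {word: probability} from a context-agnostic captioner.
    Every candidate word for the target must also have a nonzero probability
    under the distractor.
    """
    prefix = ()
    for _ in range(length):
        p_t = p_target(prefix)
        p_d = p_distractor(prefix)
        best = max(p_t, key=lambda w: _score(w, p_t, p_d, lam))
        prefix += (best,)
    return prefix


# Toy, hand-made next-word distributions (hypothetical), indexed by position.
TARGET = [{"a": 0.9, "the": 0.1},
          {"cat": 0.6, "cream": 0.4},
          {"cat": 0.99, "kitten": 0.01}]
DISTRACTOR = [{"a": 0.9, "the": 0.1},
              {"cat": 0.6, "striped": 0.39, "cream": 0.01},
              {"cat": 0.99, "kitten": 0.01}]


def target_model(prefix):
    return TARGET[len(prefix)]


def distractor_model(prefix):
    return DISTRACTOR[len(prefix)]


print(discriminative_greedy(target_model, distractor_model, lam=0.5))  # → ('a', 'cream', 'cat')
```

With `lam=1.0` the distractor is ignored and the toy decoder picks the generic "cat" at the second position instead of the distinguishing "cream".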

Learning From Noisy Large-Scale Datasets With Minimal Supervision

We present an approach to effectively use millions of images with noisy annotations in conjunction with a small subset of cleanly annotated images to learn powerful image representations. One common way to combine clean and noisy data is to first pre-train a network using the large noisy dataset and then fine-tune it with the clean dataset. We show this approach does not fully leverage the information contained in the clean set.

Group online adaptive learning

Sharing information among multiple learning agents can accelerate learning. This could be particularly useful when learners operate in continuously changing environments, because one learner can draw on another learner's previous experience to adapt to its own new environment. Such group-adaptive learning has numerous applications, from predicting financial time series, through content recommendation systems, to visual understanding for adaptive autonomous agents. Here we address the problem in the context of online adaptive learning.

Efficient Generation of Points that Satisfy Two-Dimensional Elementary Intervals

Precomputing high-quality sample points has been shown to be a useful technique for Monte Carlo integration in rendering; doing so allows the properties of the points to be optimized without the performance constraints of generating samples during rendering. A particularly useful property to incorporate is stratification across elementary intervals, which has been shown to reduce error in Monte Carlo integration. This is a key property of the recently introduced progressive multi-jittered, pmj02 and pmj02bn points [Christensen et al.
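For a set of 2^m points in the unit square, stratification across base-2 elementary intervals means that for every split m = a + b, each box [j/2^a, (j+1)/2^a) × [k/2^b, (k+1)/2^b) of area 1/2^m contains exactly one point. The Python sketch checks this property using integer coordinates. To produce test points it uses the classic 2D Hammersley construction, which is a simple stand-in and not the pmj02/pmj02bn generation algorithm mentioned above; the function names are hypothetical.

```python
def bit_reverse(i, m):
    """Reverse the low m bits of i (the base-2 radical inverse, scaled by 2**m)."""
    r = 0
    for _ in range(m):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def hammersley_net(m):
    """2D Hammersley set with 2**m points as integer coordinates in [0, 2**m).

    Point i stands for (i / 2**m, radical_inverse_2(i)).
    """
    return [(i, bit_reverse(i, m)) for i in range(1 << m)]


def is_stratified_02(points, m):
    """True if every base-2 elementary interval of area 1/2**m holds exactly one point.

    points: integer pairs (x, y) with 0 <= x, y < 2**m, i.e. the real point
    (x / 2**m, y / 2**m).
    """
    n = 1 << m
    if len(points) != n:
        return False
    for a in range(m + 1):
        b = m - a
        # x >> (m - a) == floor((x / 2**m) * 2**a), the column index among 2**a columns;
        # y >> (m - b) is the row index among 2**b rows.
        cells = {(x >> (m - a), y >> (m - b)) for x, y in points}
        # n points and n boxes: all boxes distinct <=> one point per box.
        if len(cells) != n:
            return False
    return True


print(is_stratified_02(hammersley_net(4), 4))  # → True
```

Scaling coordinates to integers keeps the box test exact: shifting right by m − a bits is floor division by 2^(m−a), so no floating-point rounding can move a point across a box boundary.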