Dieter Fox

Dieter Fox is Senior Director of Robotics Research at Nvidia. His research is in robotics, with strong connections to artificial intelligence, computer vision, and machine learning.  He is currently on partial leave from the University of Washington, where he is a Professor in the Paul G. Allen School of Computer Science & Engineering. At UW, he also heads the UW Robotics and State Estimation Lab. From 2009 to 2011, he was Director of the Intel Research Labs Seattle. Dieter obtained his Ph.D. from the University of Bonn, Germany.  He has published more than 200 technical papers and is the co-author of the textbook "Probabilistic Robotics." He is a Fellow of the IEEE and the AAAI, and he received several best paper awards at major robotics, AI, and computer vision conferences. He was an editor of the IEEE Transactions on Robotics, program co-chair of the 2008 AAAI Conference on Artificial Intelligence, and program chair of the 2013 Robotics: Science and Systems conference.



Near-Eye Varifocal Augmented Reality Display using See-Through Screens

We present a new optical design for see-through near-eye displays that is simple, compact, varifocal, and provides a wide field of view with clear peripheral vision and a large eyebox. Key to this effort is a novel see-through rear-projection screen. We project an image onto the see-through screen along an off-axis path, which is then relayed to the user’s eyes through an on-axis partially reflective magnifying surface. Converting the off-axis path to a compact on-axis imaging path simplifies the optical design.

Latency Requirements for Foveated Rendering in Virtual Reality

Foveated rendering is a performance optimization based on the well-known degradation of peripheral visual acuity. It reduces computational costs by showing a high-quality image in the user’s central (foveal) vision and a lower quality image in the periphery. Foveated rendering is a promising optimization for Virtual Reality (VR) graphics, and generally requires accurate and low-latency eye tracking to ensure correctness even when a user makes large, fast eye movements such as saccades.
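The latency requirement follows from a simple worst-case argument: during one end-to-end latency window, a saccade can carry the gaze some distance, so the high-quality foveal region must be padded by that worst-case displacement. The sketch below illustrates this back-of-the-envelope calculation; the function name and all numeric values (foveal radius, latency, saccade speed) are illustrative assumptions, not figures from the paper.

```python
# Hypothetical sketch: how much must the high-quality (foveal) region grow
# to tolerate eye-tracking latency? During the latency window the gaze may
# move by up to (saccade speed x latency), so the foveal radius must be
# padded by that worst-case displacement.

def padded_foveal_radius(base_radius_deg: float,
                         latency_ms: float,
                         saccade_speed_deg_per_s: float = 500.0) -> float:
    """Foveal radius (degrees) enlarged to cover worst-case gaze motion
    during one end-to-end latency window."""
    worst_case_drift_deg = saccade_speed_deg_per_s * (latency_ms / 1000.0)
    return base_radius_deg + worst_case_drift_deg

# Example: a 5-degree foveal region with 20 ms end-to-end latency
print(padded_foveal_radius(5.0, 20.0))  # 5 + 500 * 0.02 = 15.0
```

The padding grows linearly with latency, which is why lower-latency tracking directly translates into a smaller high-quality region and larger rendering savings.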

Ben Keller

Ben joined the ASIC & VLSI research group at NVIDIA in 2017 after an internship with the group three years earlier. His research interests include clocking and synchronization, fine-grained adaptive voltage scaling, and improved RTL and VLSI flows for design effort reduction.

Ben received his M.S. and Ph.D. degrees in Electrical Engineering and Computer Sciences from the University of California, Berkeley, in 2015 and 2017, respectively.  He completed his B.S. in Engineering at Harvey Mudd College in 2010.


Deep 360 Pilot: Learning a Deep Agent for Piloting through 360 Sports Videos

Watching a 360 sports video requires a viewer to continuously select a viewing angle, either through a sequence of mouse clicks or head movements. To relieve the viewer of this "360 piloting" task, we propose "deep 360 pilot" -- a deep-learning-based agent for piloting through 360 sports videos automatically. At each frame, the agent observes a panoramic image and has knowledge of previously selected viewing angles. The task of the agent is to shift the current viewing angle (i.e., action) to the next preferred one (i.e., goal).
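The action interface described above can be sketched as a per-frame angle update with wraparound on the 360-degree panorama. This is only the update step, not the paper's learned policy (which predicts the preferred target angle); the function name and step limit are illustrative assumptions.

```python
# Minimal sketch of "shift the current viewing angle toward the next
# preferred one": a clamped per-frame step that takes the shorter way
# around the panorama. The target angle would come from the learned agent.

def shift_view(current_deg: float, target_deg: float,
               max_step_deg: float = 10.0) -> float:
    """Move the viewing angle toward target, at most max_step_deg per
    frame, choosing the shorter direction around 360 degrees."""
    diff = (target_deg - current_deg + 180.0) % 360.0 - 180.0  # signed shortest difference
    step = max(-max_step_deg, min(max_step_deg, diff))
    return (current_deg + step) % 360.0

angle = 350.0
for _ in range(3):                 # target at 20 deg: wraps past 0
    angle = shift_view(angle, 20.0)
print(angle)  # 350 -> 0 -> 10 -> 20
```

Clamping the per-frame step keeps the virtual camera motion smooth, which matters for viewing comfort in panoramic video.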

Tactics of Adversarial Attack on Deep Reinforcement Learning Agents

We introduce two tactics for attacking agents trained by deep reinforcement learning algorithms with adversarial examples. In the strategically-timed attack, the adversary minimizes the agent's reward by attacking the agent at only a small subset of time steps in an episode; limiting the attack activity to this subset helps prevent detection of the attack by the agent. We propose a novel method to determine when an adversarial example should be crafted and applied. In the enchanting attack, the adversary lures the agent to a designated target state.
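One plausible "when to attack" criterion for the strategically-timed attack is to perturb only at steps where the policy strongly prefers one action, since an attack there does the most damage while keeping the attack rate low. The sketch below illustrates this idea with a preference-gap test; the threshold and example policy outputs are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Sketch of a timing criterion for the strategically-timed attack: craft an
# adversarial example only when the gap between the most and least preferred
# action probabilities is large, i.e. when the agent is committed to a
# specific action and a perturbation is most damaging.

def should_attack(action_probs: np.ndarray, threshold: float = 0.8) -> bool:
    """Attack when the policy's action-preference gap exceeds a threshold."""
    preference_gap = float(action_probs.max() - action_probs.min())
    return preference_gap > threshold

confident = np.array([0.95, 0.03, 0.02])   # agent committed to action 0
uncertain = np.array([0.40, 0.35, 0.25])   # agent nearly indifferent
print(should_attack(confident), should_attack(uncertain))  # True False
```

Skipping the indifferent steps is what keeps the total number of perturbed frames, and hence the attack's detectability, small.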

Unsupervised Image-to-Image Translation Networks

Unsupervised image-to-image translation aims at learning a joint distribution of images in different domains by using images from the marginal distributions in individual domains. Since there exists an infinite set of joint distributions that can yield the given marginal distributions, one could infer nothing about the joint distribution from the marginal distributions without additional assumptions. To address the problem, we make a shared-latent space assumption and propose an unsupervised image-to-image translation framework based on Coupled GANs. We compare the proposed framework with competing approaches and present high-quality image translation results on various challenging unsupervised image translation tasks, including street scene image translation, animal image translation, and face image translation. We also apply the proposed framework to domain adaptation and achieve state-of-the-art performance on benchmark datasets.
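The shared-latent space assumption says that corresponding images in the two domains are generated from the same latent code, so translation amounts to encoding in one domain and decoding in the other. The toy linear sketch below illustrates this wiring; the linear maps are illustrative stand-ins for the coupled GAN encoders and generators in the paper.

```python
import numpy as np

# Toy linear illustration of the shared-latent space assumption:
# corresponding images x1, x2 are generated from the same latent code z,
# so translating domain 1 -> domain 2 is decode-after-encode (G2 o E1).

rng = np.random.default_rng(0)
A1 = rng.normal(size=(4, 4))       # domain-1 "generator": x1 = A1 @ z
A2 = rng.normal(size=(4, 4))       # domain-2 "generator": x2 = A2 @ z

E1 = np.linalg.inv(A1)             # domain-1 "encoder" recovers z
G2 = A2                            # domain-2 "decoder"

z = rng.normal(size=4)             # shared latent code
x1, x2 = A1 @ z, A2 @ z            # corresponding images in each domain

x1_to_2 = G2 @ (E1 @ x1)           # translate domain 1 -> domain 2
print(np.allclose(x1_to_2, x2))    # True: translation recovers x2
```

In the linear toy the encoder is an exact inverse, so translation is perfect; the learning problem in the paper is to approximate this structure with neural networks trained only on unpaired samples from each marginal.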

Multiframe Scene Flow with Piecewise Rigid Motion

We introduce a novel multiframe scene flow approach that jointly optimizes the consistency of the patch appearances and their local rigid motions from RGB-D image sequences. In contrast to competing methods, we take advantage of an over-segmentation of the reference frame and robust optimization techniques. We formulate scene flow recovery as a global non-linear least squares problem which is iteratively solved by a damped Gauss-Newton approach. As a result, we obtain a qualitatively new level of accuracy in RGB-D based scene flow estimation which can potentially run in real-time.
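The damped Gauss-Newton iteration named above solves a non-linear least squares problem by repeatedly linearizing the residual and solving damped normal equations. The sketch below shows the core iteration on a toy curve-fitting problem; the residual (fitting y = exp(a*x) + b) is an illustrative stand-in, not the paper's scene-flow energy.

```python
import numpy as np

# Damped Gauss-Newton on a toy non-linear least squares problem: at each
# step solve (J^T J + lambda * I) dx = -J^T r and update the parameters.

def damped_gauss_newton(residual, jacobian, x0, damping=1e-3, iters=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)
        J = jacobian(x)
        H = J.T @ J + damping * np.eye(x.size)   # damped normal equations
        x = x + np.linalg.solve(H, -J.T @ r)
    return x

# Fit y = exp(a*x) + b to noise-free data with true (a, b) = (0.5, 2.0)
xs = np.linspace(0.0, 2.0, 20)
ys = np.exp(0.5 * xs) + 2.0

def residual(p):
    a, b = p
    return np.exp(a * xs) + b - ys

def jacobian(p):
    a, b = p
    return np.column_stack([xs * np.exp(a * xs), np.ones_like(xs)])

print(damped_gauss_newton(residual, jacobian, np.array([0.2, 1.0])))
# converges near [0.5, 2.0]
```

The damping term keeps the linear solve well-conditioned when J^T J is near-singular, which is why damped variants are preferred for large, unevenly constrained problems like per-patch rigid motion estimation.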

Light Fields for Near-eye Displays

The most important requirement for any near-eye display to succeed is a comfortable visual experience. This requirement has many boxes to check: high resolution, a wide field of view, light weight, a small form factor, and support for focus cues. Like 3D TVs and movies, near-eye displays also need to solve the vergence-accommodation conflict. In current Virtual Reality (VR) displays, the user's eyes focus on a fixed focal plane, while disparity in the pre-processed content drives the eyes to verge and creates a 3D sensation.

The Light Field Stereoscope

Over the last few years, virtual reality has re-emerged as a technology that is now feasible at low cost via inexpensive cellphone components. In particular, advances in high-resolution microdisplays, low-latency orientation trackers, and modern GPUs facilitate extremely immersive experiences. To enable comfortable long-term experiences and widespread user acceptance, however, the vergence-accommodation conflict inherent to all stereoscopic displays will have to be solved.

