Cascaded Scene Flow Prediction using Semantic Segmentation

Given two consecutive frames from a pair of stereo cameras, 3D scene flow methods simultaneously estimate the 3D geometry and motion of the observed scene. Many existing approaches use superpixels for regularization, but may predict inconsistent shapes and motions inside rigidly moving objects. We instead assume that scenes consist of foreground objects rigidly moving in front of a static background, and use semantic cues to produce pixel-accurate scene flow estimates.

Semantic Video CNNs through Representation Warping

In this work, we propose a technique to convert CNN models for semantic segmentation of static images into CNNs for video data. We describe a warping method that can be used to augment existing architectures with very little extra computational cost. This module is called NetWarp and we demonstrate its use for a range of network architectures. The main design principle is to use optical flow of adjacent frames for warping internal network representations across time.
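The core operation the abstract describes is backward-warping a previous frame's feature map into the current frame using optical flow. A minimal numpy sketch of that idea (function names and bilinear details are our illustration, not the paper's exact NetWarp formulation):

```python
import numpy as np

def warp_features(feat_prev, flow):
    """Backward-warp a feature map from frame t-1 into frame t using
    per-pixel optical flow. Illustrative sketch only; NetWarp's actual
    module and flow handling may differ.

    feat_prev: (H, W, C) features from the previous frame
    flow:      (H, W, 2) flow mapping current-frame pixels back to t-1
    """
    H, W, C = feat_prev.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float32)
    # Source coordinates in the previous frame, clipped to the image.
    sx = np.clip(xs + flow[..., 0], 0, W - 1)
    sy = np.clip(ys + flow[..., 1], 0, H - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = (sx - x0)[..., None], (sy - y0)[..., None]
    # Bilinear interpolation of the four neighboring feature vectors.
    top = feat_prev[y0, x0] * (1 - wx) + feat_prev[y0, x1] * wx
    bot = feat_prev[y1, x0] * (1 - wx) + feat_prev[y1, x1] * wx
    return top * (1 - wy) + bot * wy

# Sanity check: zero flow reproduces the previous features exactly.
f = np.random.rand(4, 5, 3).astype(np.float32)
warped = warp_features(f, np.zeros((4, 5, 2), np.float32))
assert np.allclose(warped, f)
```

Because the warp is differentiable, it can be inserted between layers of an existing segmentation CNN and trained end to end.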

Tensor Contractions with Extended BLAS Kernels on CPU and GPU

Tensor contractions constitute a key computational ingredient of numerical multi-linear algebra. However, as the order and dimension of tensors grow, the time and space complexities of tensor-based computations grow quickly. In this paper, we propose and evaluate new BLAS-like primitives that are capable of performing a wide range of tensor contractions on CPU and GPU efficiently. We begin by focusing on single-index contractions involving all the possible configurations of second-order and third-order tensors.
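The key observation behind such primitives is that a single-index contraction can often be mapped onto one GEMM over a reshaped operand. A sketch with numpy (the paper's actual kernels operate in place on strided data rather than via explicit reshapes):

```python
import numpy as np

# Single-index contraction C[i,j,k] = sum_l A[i,j,l] * B[l,k],
# computed as one GEMM by folding the free indices (i, j) into a
# single row dimension of A.
I, J, L, K = 3, 4, 5, 6
A = np.random.rand(I, J, L)
B = np.random.rand(L, K)

C_gemm = (A.reshape(I * J, L) @ B).reshape(I, J, K)

# Reference: the same contraction written out with einsum.
C_ref = np.einsum('ijl,lk->ijk', A, B)
assert np.allclose(C_gemm, C_ref)
```

Covering all configurations of second- and third-order tensors amounts to enumerating which index is contracted and how the remaining indices map onto the GEMM dimensions.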

Low Communication FMM-Accelerated FFT on GPUs

Communication-avoiding algorithms have been a subject of growing interest in the last decade due to the growth of distributed memory systems and the disproportionate growth of computational throughput relative to communication bandwidth. For distributed 1D FFTs, communication costs quickly dominate execution time, as all industry-standard implementations perform three all-to-all transpositions of the data. In this work, we reformulate an existing algorithm that employs the Fast Multipole Method to reduce the communication requirements to approximately a single all-to-all transpose.
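A back-of-envelope model makes the saving concrete (our illustrative cost model, not the paper's analysis): each all-to-all transpose of N points over P processes moves roughly N/P · (P−1)/P elements per process, so cutting three transposes to one cuts the dominant communication volume by about 3×.

```python
# Rough per-process communication volume for a distributed 1D FFT of
# N points on P processes (illustrative model only).
def all_to_all_volume(N, P):
    return (N // P) * (P - 1) / P

N, P = 1 << 24, 64
standard = 3 * all_to_all_volume(N, P)   # three all-to-all transposes
fmm_based = 1 * all_to_all_volume(N, P)  # approximately one transpose
assert standard == 3 * fmm_based
```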

Toward Low-Flying Autonomous MAV Trail Navigation using Deep Neural Networks for Environmental Awareness

We present a micro aerial vehicle (MAV) system, built with inexpensive off-the-shelf hardware, for autonomously following trails in unstructured, outdoor environments such as forests. The system introduces a deep neural network (DNN) called TrailNet for estimating the view orientation and lateral offset of the MAV with respect to the trail center. The DNN-based controller achieves stable flight without oscillations by avoiding overconfident behavior through a loss function that includes both label smoothing and entropy reward.
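The two loss ingredients the abstract names, label smoothing and an entropy reward, can be sketched as follows (a hedged illustration; the `smooth` and `beta` values and function shape are ours, not TrailNet's exact training objective):

```python
import numpy as np

def trailnet_style_loss(logits, target, smooth=0.1, beta=0.1):
    """Cross-entropy over smoothed labels minus an entropy reward.
    Illustrative sketch of the two mechanisms named in the abstract
    for discouraging overconfident predictions."""
    n = logits.shape[0]
    # Softmax probabilities (shifted for numerical stability).
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # Label smoothing: move `smooth` mass from the target to all classes.
    y = np.full(n, smooth / n)
    y[target] += 1.0 - smooth
    ce = -np.sum(y * np.log(p))
    # Rewarding entropy penalizes overly peaked (overconfident) outputs.
    entropy = -np.sum(p * np.log(p))
    return ce - beta * entropy

# A confidently wrong prediction costs far more than a correct one.
logits = np.array([4.0, 0.0, 0.0])
loss_correct = trailnet_style_loss(logits, target=0)
loss_wrong = trailnet_style_loss(logits, target=1)
assert loss_correct < loss_wrong
```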

Understanding Error Propagation in Deep Learning Neural Network (DNN) Accelerators and Applications

Deep learning neural networks (DNNs) have been successful in solving a wide range of machine learning problems. Specialized hardware accelerators have been proposed to accelerate the execution of DNN algorithms for high-performance and energy efficiency. Recently, they have been deployed in data centers (potentially for business-critical or industrial applications) and safety-critical systems such as self-driving cars. Soft errors caused by high-energy particles have been increasing in hardware systems, and these can lead to catastrophic failures in DNN systems.
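A single-event upset can be mimicked by flipping one bit of a stored weight; which bit is hit largely determines whether the error is benign or catastrophic. A minimal fault-injection sketch (the paper's fault model and methodology are more elaborate):

```python
import numpy as np

def flip_bit(x, bit):
    """Flip one bit of a float32 value, mimicking a soft error in a
    stored DNN weight. Illustrative sketch only."""
    as_int = np.float32(x).view(np.uint32)
    return (as_int ^ np.uint32(1 << bit)).view(np.float32)

w = np.float32(0.5)
# A low-order mantissa bit barely perturbs the weight...
low = flip_bit(w, 2)
# ...while a high exponent bit changes it by many orders of magnitude,
# the kind of corruption that can cascade into a misclassification.
high = flip_bit(w, 30)
assert abs(float(low) - 0.5) < 1e-5
assert float(high) > 1e30
```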

Dieter Fox

Dieter Fox is Senior Director of Robotics Research at Nvidia. His research is in robotics, with strong connections to artificial intelligence, computer vision, and machine learning. He is currently on partial leave from the University of Washington, where he is a Professor in the Paul G. Allen School of Computer Science & Engineering. At UW, he also heads the UW Robotics and State Estimation Lab. From 2009 to 2011, he was Director of the Intel Research Labs Seattle. Dieter obtained his Ph.D. from the University of Bonn, Germany. He has published more than 200 technical papers and is co-author of the textbook "Probabilistic Robotics." He is a Fellow of the IEEE and the AAAI, and he has received several best paper awards at major robotics, AI, and computer vision conferences. He was an editor of the IEEE Transactions on Robotics, program co-chair of the 2008 AAAI Conference on Artificial Intelligence, and program chair of the 2013 Robotics: Science and Systems conference.




Near-Eye Varifocal Augmented Reality Display using See-Through Screens

We present a new optical design for see-through near-eye displays that is simple, compact, varifocal, and provides a wide field of view with clear peripheral vision and large eyebox. Key to this effort is a novel see-through rear-projection screen. We project an image to the see-through screen using an off-axis path, which is then relayed to the user’s eyes through an on-axis partially-reflective magnifying surface. Converting the off-axis path to a compact on-axis imaging path simplifies the optical design.

Latency Requirements for Foveated Rendering in Virtual Reality

Foveated rendering is a performance optimization based on the well-known degradation of peripheral visual acuity. It reduces computational costs by showing a high-quality image in the user’s central (foveal) vision and a lower quality image in the periphery. Foveated rendering is a promising optimization for Virtual Reality (VR) graphics, and generally requires accurate and low-latency eye tracking to ensure correctness even when a user makes large, fast eye movements such as saccades.

