Toward Low-Flying Autonomous MAV Trail Navigation using Deep Neural Networks for Environmental Awareness

We present a micro aerial vehicle (MAV) system, built with inexpensive off-the-shelf hardware, for autonomously following trails in unstructured, outdoor environments such as forests. The system introduces a deep neural network (DNN) called TrailNet for estimating the view orientation and lateral offset of the MAV with respect to the trail center. The DNN-based controller achieves stable flight without oscillations by avoiding overconfident behavior through a loss function that includes both label smoothing and entropy reward.
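As a rough illustration of how label smoothing and an entropy reward can be combined in one classification loss, here is a minimal sketch in plain Python. The function names, the smoothing factor `epsilon`, and the weight `entropy_weight` are assumptions made for this example; the abstract does not give TrailNet's exact loss or its coefficients.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]


def smoothed_targets(true_class, num_classes, epsilon=0.1):
    """Label smoothing: move epsilon of the one-hot mass onto a uniform prior."""
    off = epsilon / num_classes
    return [(1.0 - epsilon + off) if i == true_class else off
            for i in range(num_classes)]


def smoothed_entropy_loss(logits, true_class, epsilon=0.1, entropy_weight=0.01):
    """Cross-entropy against smoothed labels, minus a weighted entropy reward.

    epsilon and entropy_weight are illustrative values, not taken from the paper.
    """
    log_probs = log_softmax(logits)
    probs = [math.exp(lp) for lp in log_probs]
    targets = smoothed_targets(true_class, len(logits), epsilon)
    cross_entropy = -sum(t * lp for t, lp in zip(targets, log_probs))
    entropy = -sum(p * lp for p, lp in zip(probs, log_probs))
    # Subtracting the entropy rewards less peaked (less overconfident) outputs.
    return cross_entropy - entropy_weight * entropy
```

With these settings, an extremely confident correct prediction such as `smoothed_entropy_loss([20.0, 0.0, 0.0], 0)` scores a higher loss than a moderately confident one such as `smoothed_entropy_loss([3.0, 0.0, 0.0], 0)`, which is the kind of pressure against overconfidence the abstract describes.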

Understanding Error Propagation in Deep Learning Neural Network (DNN) Accelerators and Applications

Deep learning neural networks (DNNs) have been successful in solving a wide range of machine learning problems. Specialized hardware accelerators have been proposed to accelerate the execution of DNN algorithms for high performance and energy efficiency. Recently, they have been deployed in data centers (potentially for business-critical or industrial applications) and safety-critical systems such as self-driving cars. Soft errors caused by high-energy particles have been increasing in hardware systems, and these can lead to catastrophic failures in DNN systems.
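To make the effect of a soft error concrete, the sketch below flips a single bit in the IEEE 754 single-precision encoding of a value, as a stand-in for a corrupted weight or activation. The single-bit-flip fault model and the helper name `flip_bit_float32` are assumptions for illustration; the abstract does not specify the fault model used in the study.

```python
import struct


def flip_bit_float32(value, bit):
    """Return `value` (stored as float32) with one bit of its encoding flipped.

    bit 0 is the least significant mantissa bit, bits 23-30 are the exponent,
    and bit 31 is the sign. This is a hypothetical fault-injection helper.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # Flip the chosen bit and reinterpret the result as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped
```

Which bit is hit matters a great deal: flipping the lowest mantissa bit of `1.0` changes it by roughly one part in ten million, while flipping bit 30 (the top exponent bit) of `1.0` yields infinity.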

Dieter Fox

Dieter Fox is Senior Director of Robotics Research at Nvidia. His research is in robotics, with strong connections to artificial intelligence, computer vision, and machine learning. He is currently on partial leave from the University of Washington, where he is a Professor in the Paul G. Allen School of Computer Science & Engineering. At UW, he also heads the UW Robotics and State Estimation Lab. From 2009 to 2011, he was Director of the Intel Research Labs Seattle. Dieter obtained his Ph.D. from the University of Bonn, Germany. He has published more than 200 technical papers and is the co-author of the textbook "Probabilistic Robotics." He is a Fellow of the IEEE and the AAAI, and he has received several best paper awards at major robotics, AI, and computer vision conferences. He was an editor of the IEEE Transactions on Robotics, program co-chair of the 2008 AAAI Conference on Artificial Intelligence, and program chair of the 2013 Robotics: Science and Systems conference.




Near-Eye Varifocal Augmented Reality Display using See-Through Screens

We present a new optical design for see-through near-eye displays that is simple, compact, varifocal, and provides a wide field of view with clear peripheral vision and large eyebox. Key to this effort is a novel see-through rear-projection screen. We project an image onto the see-through screen along an off-axis path; the image is then relayed to the user’s eyes through an on-axis partially-reflective magnifying surface. Converting the off-axis path to a compact on-axis imaging path simplifies the optical design.

Latency Requirements for Foveated Rendering in Virtual Reality

Foveated rendering is a performance optimization based on the well-known degradation of peripheral visual acuity. It reduces computational costs by showing a high-quality image in the user’s central (foveal) vision and a lower quality image in the periphery. Foveated rendering is a promising optimization for Virtual Reality (VR) graphics, and generally requires accurate and low-latency eye tracking to ensure correctness even when a user makes large, fast eye movements such as saccades.

Ben Keller

Ben joined the ASIC & VLSI research group at NVIDIA in 2017 after an internship with the group three years earlier. His research interests include clocking and synchronization, fine-grained adaptive voltage scaling, and improved RTL and VLSI flows for design effort reduction.

Ben received his M.S. and Ph.D. degrees in Electrical Engineering and Computer Sciences from the University of California, Berkeley, in 2015 and 2017, respectively.  He completed his B.S. in Engineering at Harvey Mudd College in 2010.


Deep 360 Pilot: Learning a Deep Agent for Piloting through 360 Sports Videos

Watching a 360 sports video requires a viewer to continuously select a viewing angle, either through a sequence of mouse clicks or head movements. To relieve the viewer from this "360 piloting" task, we propose "deep 360 pilot" -- a deep learning-based agent for piloting through 360 sports videos automatically. At each frame, the agent observes a panoramic image and knows the previously selected viewing angles. The task of the agent is to shift the current viewing angle (i.e., action) to the next preferred one (i.e., goal).

Tactics of Adversarial Attack on Deep Reinforcement Learning Agents

We introduce two tactics for attacking agents trained by deep reinforcement learning algorithms using adversarial examples. In the strategically-timed attack, the adversary aims to minimize the agent's reward by attacking the agent at only a small subset of time steps in an episode. Limiting the attack activity to this subset helps prevent detection of the attack by the agent. We propose a novel method to determine when an adversarial example should be crafted and applied. In the enchanting attack, the adversary aims to lure the agent to a designated target state.
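The idea of attacking at only a small subset of time steps can be sketched as a timing rule. The version below is a hypothetical heuristic chosen for illustration, not the method proposed in the paper, which the abstract does not describe: it attacks only when the agent's policy strongly prefers one action over another. The names `should_attack` and `select_attack_steps`, the `threshold`, and the `budget` are all assumptions.

```python
def should_attack(action_probs, threshold=0.5):
    """Hypothetical timing rule: attack when the policy is strongly opinionated.

    The gap between the most and least preferred action is used as a proxy
    for how much a perturbation at this step could change the outcome.
    """
    return max(action_probs) - min(action_probs) > threshold


def select_attack_steps(policy_outputs, threshold=0.5, budget=None):
    """Return the time steps at which the adversary would attack.

    policy_outputs is a list of per-step action probability lists.
    budget, if given, caps the number of attacked steps (earliest first).
    """
    steps = [t for t, probs in enumerate(policy_outputs)
             if should_attack(probs, threshold)]
    return steps if budget is None else steps[:budget]
```

Under this rule, steps where the policy is close to indifferent between actions are left untouched, so the attack activity is concentrated on a few steps of the episode.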

Unsupervised Image-to-Image Translation Networks

Unsupervised image-to-image translation aims at learning a joint distribution of images in different domains by using images from the marginal distributions in individual domains. Since there exists an infinite set of joint distributions that can arrive at the given marginal distributions, one could infer nothing about the joint distribution from the marginal distributions without additional assumptions. To address the problem, we make a shared-latent space assumption and propose an unsupervised image-to-image translation framework based on Coupled GANs. We compare the proposed framework with competing approaches and present high-quality image translation results on various challenging unsupervised image translation tasks, including street scene image translation, animal image translation, and face image translation. We also apply the proposed framework to domain adaptation and achieve state-of-the-art performance on benchmark datasets.

