Geometry-Aware Learning of Maps for Camera Localization

Maps are a key component in image-based camera localization and visual SLAM systems: they are used to establish geometric constraints between images, correct drift in relative pose estimation, and relocalize cameras after lost tracking. The exact definitions of maps, however, are often application-specific and hand-crafted for different scenarios (e.g., 3D landmarks, lines, planes, bags of visual words). We propose to represent maps as a deep neural net called MapNet, which enables learning a data-driven map representation.
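The abstract leaves the network's inputs and outputs unspecified; below is a minimal sketch of one way a learned map representation can be realized, as a CNN that regresses camera pose from an image plus a constraint tying the relative pose between image pairs to geometry. The class name PoseRegressor, the ResNet-18 backbone, and the 3-parameter log-quaternion rotation are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (assumption): a learned "map" realized as a CNN that
# regresses camera pose from an image. Names and backbone are illustrative.
import torch
import torch.nn as nn
import torchvision.models as models

class PoseRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()          # reuse only the conv features
        self.backbone = backbone
        self.fc_xyz = nn.Linear(512, 3)      # translation
        self.fc_logq = nn.Linear(512, 3)     # log-quaternion rotation

    def forward(self, img):
        feat = self.backbone(img)
        return torch.cat([self.fc_xyz(feat), self.fc_logq(feat)], dim=1)

# Geometric constraint between two images: penalize the error of the
# predicted relative translation in addition to the absolute poses.
def relative_translation_loss(pose_i, pose_j, gt_rel_t):
    pred_rel_t = pose_j[:, :3] - pose_i[:, :3]
    return torch.norm(pred_rel_t - gt_rel_t, dim=1).mean()
```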

Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks

Popular deep learning frameworks require users to fine-tune their memory usage so that the training data of a deep neural network (DNN) fits within the GPU physical memory. Prior work tries to address this restriction by virtualizing the memory usage of DNNs, enabling both CPU and GPU memory to be utilized for memory allocations. Despite its merits, virtualizing memory can incur significant performance overheads when the time needed to copy data back and forth from CPU memory is higher than the latency to perform DNN computations.
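The abstract states the bottleneck; the title points to the remedy, compressing sparse (mostly zero) activations so that less data has to cross the CPU-GPU link. A minimal sketch of that idea using zero-value compression in NumPy follows; the bitmask format and the compress/decompress helpers are illustrative, not the paper's DMA engine.

```python
# Sketch (assumption): zero-value compression of ReLU activations before a
# CPU<->GPU copy, so less data crosses the slow link. Illustrative only.
import numpy as np

def compress(act):
    """Store only the nonzero values plus a bitmask of their positions."""
    mask = act != 0
    return np.packbits(mask), act[mask], act.shape

def decompress(packed_mask, values, shape):
    mask = np.unpackbits(packed_mask)[: np.prod(shape)].astype(bool)
    out = np.zeros(np.prod(shape), dtype=values.dtype)
    out[mask] = values
    return out.reshape(shape)

# ReLU output is roughly half zeros for zero-mean inputs.
act = np.maximum(np.random.randn(4, 1024).astype(np.float32), 0)
packed_mask, values, shape = compress(act)
ratio = (packed_mask.nbytes + values.nbytes) / act.nbytes
print(f"compressed to {ratio:.2f}x of original size")
assert np.array_equal(decompress(packed_mask, values, shape), act)
```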

Reducing Data Transfer Energy by Exploiting Similarity within a Data Transaction

Modern highly parallel GPU systems require high-bandwidth DRAM I/O interfaces that can consume a significant amount of energy. This energy increases in proportion to the number of '1' values in the data transactions due to the asymmetric energy consumption of Pseudo Open Drain (POD) I/O interfaces in contemporary Graphics DDR SDRAMs. In this work, we describe a technique to save energy by reducing the energy-expensive '1' values in the DRAM interface. We observe that multiple data elements within a single cache line/sector are often similar to one another.
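A small worked example of that observation: when similar neighboring values in a sector are XOR-encoded against their predecessor, the words that actually cross the bus carry far fewer '1' bits. The delta scheme shown here is illustrative; the paper's exact encoding may differ.

```python
# Sketch: similar neighboring values XORed against each other produce words
# with far fewer '1' bits, which is what POD I/O pays energy for.
import numpy as np

def ones_count(words):
    """Total number of '1' bits in an array of uint32 words."""
    return int(np.unpackbits(words.view(np.uint8)).sum())

# A 32-byte sector of similar 32-bit values (e.g., neighboring pixels).
sector = np.array([0x40490FDB, 0x40490FDC, 0x40490FDE, 0x40490FD9,
                   0x40490FE1, 0x40490FDB, 0x40490FD7, 0x40490FDF],
                  dtype=np.uint32)
raw_ones = ones_count(sector)

# Delta-encode: send the first word as-is, then the XOR of each word with
# its predecessor; the receiver can reverse this exactly.
encoded = sector.copy()
encoded[1:] = sector[1:] ^ sector[:-1]
encoded_ones = ones_count(encoded)

print(f"'1' bits raw: {raw_ones}, delta-encoded: {encoded_ones}")
```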

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume

We present a compact but effective CNN model for optical flow, called PWC-Net. PWC-Net has been designed according to simple and well-established principles: pyramidal processing, warping, and the use of a cost volume. Cast in a learnable feature pyramid, PWC-Net uses the current optical flow estimate to warp the CNN features of the second image. It then uses the warped features and features of the first image to construct the cost volume, which is processed by a CNN to estimate the optical flow. PWC-Net is 17 times smaller in size and easier to train than the recent FlowNet2 model.
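A minimal sketch of the two operations named in the abstract, backward-warping the second image's features by the current flow estimate and correlating them with the first image's features to form a cost volume. This is written in PyTorch; the search range and the mean-over-channels normalization are illustrative choices, not the paper's exact settings.

```python
# Sketch of warping + cost volume: warp feature map 2 by the current flow,
# then correlate with feature map 1 over a small search window.
import torch
import torch.nn.functional as F

def warp(feat2, flow):
    """Backward-warp feat2 (B,C,H,W) by flow (B,2,H,W), flow given in pixels."""
    B, _, H, W = feat2.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat2.device)   # (2,H,W)
    coords = grid.unsqueeze(0) + flow                              # sample locations
    # normalize to [-1, 1] for grid_sample
    coords_x = 2.0 * coords[:, 0] / max(W - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(H - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=3)           # (B,H,W,2)
    return F.grid_sample(feat2, grid_norm, align_corners=True)

def cost_volume(feat1, feat2_warped, max_disp=4):
    """Correlate feat1 with shifted copies of the warped feat2."""
    B, C, H, W = feat1.shape
    costs = []
    padded = F.pad(feat2_warped, (max_disp,) * 4)
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = padded[:, :, dy:dy + H, dx:dx + W]
            costs.append((feat1 * shifted).mean(dim=1, keepdim=True))
    return torch.cat(costs, dim=1)   # (B, (2*max_disp+1)^2, H, W)
```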

Matthijs Van keirsbilck

Matthijs is a Senior Research Scientist at NVIDIA, working on foundations of neural networks and machine learning.
He has broad research interests and is currently working on efficient neural network architecture design, leveraging structural sparsity and quantization.
He is also interested in understanding the training dynamics of current models.
He has a background in Electrical Engineering, having studied at KU Leuven (Belgium) and TU Munich (Germany).

Integral Equations and Machine Learning

Because both light transport simulation and reinforcement learning are governed by the same Fredholm integral equation of the second kind, machine learning techniques can be used for efficient photorealistic image synthesis: light transport paths are guided by an approximate solution to the integral equation that is learned during rendering.
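For reference, the rendering equation used in light transport simulation has exactly this structure: the unknown radiance appears both outside and inside the integral, which is the defining property of a Fredholm integral equation of the second kind.

```latex
% Generic Fredholm integral equation of the second kind: the unknown u
% appears on both sides.
u(x) = f(x) + \int K(x, y)\, u(y)\, \mathrm{d}y
% The rendering equation has the same form, with outgoing radiance L_o as
% the unknown:
L_o(x, \omega) = L_e(x, \omega)
  + \int_{\Omega} f_r(x, \omega_i, \omega)\, L_i(x, \omega_i)\,
    (n \cdot \omega_i)\, \mathrm{d}\omega_i
```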

Beyond the Socket: NUMA-Aware GPUs

GPUs achieve high throughput and power efficiency by employing many small single instruction multiple thread (SIMT) cores. To minimize scheduling logic and performance variance, they utilize a uniform memory system and leverage strong data parallelism exposed via the programming model. With Moore's law slowing, GPUs are likely to embrace multi-socket designs, where transistors are more readily available, in order to continue scaling performance, which largely depends on SIMT core count.

Zhiding Yu

I am a principal research scientist and research lead in the Learning and Perception Research Group at NVIDIA Research. Before joining NVIDIA, I obtained my Ph.D. in ECE from Carnegie Mellon University in 2017 and my M.Phil. in ECE from The Hong Kong University of Science and Technology in 2012.

Ben Boudaoud

Ben joined NVIDIA in January 2018 as research staff in the New Experiences Research group. Prior to joining NVIDIA, he worked on ultra-low-power circuit and system design for medical products, including wearable and implantable cardiac monitors. He received his MS from the University of Virginia in 2014, where his work focused on the development and deployment of wearable 6- and 9-DoF motion-sensing platforms for clinical applications.