Parallel Complexity of Forward and Backward Propagation

We show that the forward and backward propagation can be formulated as a solution of lower and upper triangular systems of equations. For standard feedforward (FNNs) and recurrent neural networks (RNNs) the triangular systems are always block bi-diagonal, while for a general computation graph (directed acyclic graph) they can have a more complex triangular sparsity pattern. We discuss direct and iterative parallel algorithms that can be used for their solution and interpreted as different ways of performing model parallelism. Also, we show that for FNNs and RNNs with k layers and t time steps the backward propagation can be performed in parallel in O(log k) and O(log k log t) steps, respectively. Finally, we outline the generalization of this technique using Jacobians that potentially allows us to handle arbitrary layers.

AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks

Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer more parallelism and hence better computational efficiency. We have developed a new training approach that, rather than statically choosing a single batch size for all epochs, adaptively increases the batch size during the training process. Our method delivers the convergence rate of small batch sizes while achieving performance similar to large batch sizes.

Sim-to-Real Transfer of Accurate Grasping with Eye-In-Hand Observations and Continuous Control

In the context of deep learning for robotics, we show effective method of training a real robot to grasp a tiny sphere (1:37cm of diameter), with an original combination of system design choices. We decompose the end-to-end system into a vision module and a closed-loop controller module. The two modules use target object segmentation as their common interface. The vision module extracts information from the robot end-effector camera, in the form of a binary segmentation mask of the target.

On Nearest Neighbors in Non Local Means Denoising

To denoise a reference patch, the Non-Local-Means denoising filter processes a set of neighbor patches. Few Nearest Neighbors (NN) are used to limit the computational burden of the algorithm. Here we show analytically that the NN approach introduces a bias in the denoised patch, and we propose a different neighbors’ collection criterion, named Statistical NN (SNN), to alleviate this issue. Our approach outperforms the traditional one in case of both white and colored noise: fewer SNNs generate images of higher quality, at a lower computational cost.

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). Conditional GANs have enabled a variety of applications, but the results are often limited to low-resolution and still far from realistic. In this work, we generate 2048x1024 visually appealing results with a novel adversarial loss, as well as new multi-scale generator and discriminator architectures. Furthermore, we extend our framework to interactive visual manipulation with two additional features.

Learning Binary Residual Representations for Domain-specific Video Streaming

We study domain-specific video streaming. Specifically, we target a streaming setting where the videos to be streamed from a server to a client are all in the same domain and they have to be compressed to a small size for low-latency transmission. Several popular video streaming services, such as the video game streaming services of GeForce Now and Twitch, fall in this category.

Parallel Jaccard and Related Graph Clustering Techniques

In this paper we propose to generalize Jaccard and related measures, often used as similarity coefficients between two sets. We define Jaccard, Dice-Sorensen and Tversky edge weights on a graph and generalize them to account for vertex weights. We develop an efficient parallel algorithm for computing Jaccard edge and PageRank vertex weights. We highlight that the weights computation can obtain more than 10x speedup on the GPU versus CPU on large realistic data sets.

Learning Affinity via Spatial Propagation Networks

In this paper, we propose spatial propagation networks for learning the affinity matrix for vision tasks. We show that by constructing a row/column linear propagation model, the spatially varying transformation matrix exactly constitutes an affinity matrix that models dense, global pairwise relationships of an image.

Benjamin Eckart

Ben Eckart is a Senior Research Scientist working on topics in 3D Computer Vision. He received his Ph.D. in Robotics at Carnegie Mellon University in 2017, as well as an M.S. in Electrical Engineering, B.S. in Computer Science, and B.S. in Computer Engineering. He was an NVIDIA Graduate Fellow in 2014. His research explores methods to represent and operate on 3D point cloud data and its applications to robotics, computer vision, augmented reality, and autonomous driving.  

Progressive Growing of GANs for Improved Quality, Stability, and Variation

We train generative adversarial networks in a progressive fashion, enabling us to generate high-resolution images with high quality.