SCOPS: Self-Supervised Co-Part Segmentation

Parts provide a good intermediate representation of objects that is robust to camera, pose, and appearance variations. Existing work on part segmentation is dominated by supervised approaches that rely on large amounts of manual annotations and cannot generalize to unseen object categories. We propose a self-supervised deep learning approach for part segmentation, where we devise several loss functions that aid in predicting part segments that are geometrically concentrated, robust to object variations, and semantically consistent across different object instances.
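The abstract does not spell out the losses. To illustrate what "geometrically concentrated" can mean, the sketch below assumes each part has a soft assignment map over pixel coordinates. For each part, it penalizes the assignment-weighted squared distance of pixels from that part's centroid. The function name `concentration_loss` and the per-part normalization are hypothetical choices, not a definitive statement of the paper's formulation.

```python
def concentration_loss(part_maps):
    """Hypothetical concentration loss: for each part, the mass-weighted mean
    squared distance from each pixel to that part's centroid, summed over parts.
    Each part map is a 2D list of non-negative assignment weights."""
    loss = 0.0
    for part_map in part_maps:
        cells = [(i, j, v) for i, row in enumerate(part_map) for j, v in enumerate(row)]
        total = sum(v for _, _, v in cells)
        if total == 0:
            continue  # an empty part has no centroid; skip it
        # Centroid of the part in (row, column) coordinates
        cy = sum(i * v for i, _, v in cells) / total
        cx = sum(j * v for _, j, v in cells) / total
        # Weighted spread around the centroid: small for compact parts
        spread = sum(v * ((i - cy) ** 2 + (j - cx) ** 2) for i, j, v in cells)
        loss += spread / total
    return loss
```

A compact 2x2 blob in the middle of a 4x4 map scores 0.5, while the same mass split across the four corners scores 4.5. Minimizing such a term therefore pushes each predicted part toward a single contiguous region.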

Latency of 30 ms Benefits First Person Targeting Tasks More Than Refresh Rate Above 60 Hz

In competitive sports, human performance makes the difference between who wins and loses. In some competitive video games (esports), response time is an essential factor of human performance. When the athlete's equipment (computer, input and output device) responds with lower latency, it provides a measurable advantage. In this study, we isolate latency and refresh rate by artificially increasing latency when operating at high refresh rates.
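Refresh rate and latency are linked. The time between refreshes is 1000/f milliseconds, so moving from 60 Hz to 360 Hz shortens each frame from about 16.7 ms to about 2.8 ms, and end-to-end latency typically falls along with it. The sketch below computes frame periods and the artificial delay needed to raise a configuration to a chosen target latency. It assumes that end-to-end latency has already been measured for each configuration. The function names and the 15 ms / 45 ms figures are hypothetical, not the study's measurements.

```python
def frame_period_ms(refresh_hz: float) -> float:
    """Time between display refreshes, in milliseconds."""
    return 1000.0 / refresh_hz


def delay_to_match_ms(measured_latency_ms: float, target_latency_ms: float) -> float:
    """Artificial delay to add so a configuration reaches the target latency.

    Adding delay can only increase latency, so the target must not be below
    the latency already measured for that configuration.
    """
    if target_latency_ms < measured_latency_ms:
        raise ValueError("target latency is below the measured latency")
    return target_latency_ms - measured_latency_ms


# Hypothetical example: a high-refresh setup measured at 15 ms, raised to 45 ms
print(round(frame_period_ms(60), 2), round(frame_period_ms(360), 2))
print(delay_to_match_ms(15.0, 45.0))
```

Because delay is only ever added, the refresh rate can be held high while latency is varied on its own.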

Improved Precision and Recall Metric for Assessing Generative Models

The ability to evaluate the performance of a computational model is a vital requirement for driving algorithm research. This is often particularly difficult for generative models such as generative adversarial networks (GANs), which model a data manifold that is specified only indirectly by a finite set of training examples. In the common case of image data, the samples live in a high-dimensional embedding space with little structure to help assess either the overall quality of samples or the coverage of the underlying manifold.
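The paragraph stops at the problem statement. As we understand it, the metric the paper proposes estimates each manifold as a union of hyperspheres centered on the samples. Each radius is the distance to the k-th nearest neighbor within the same set. Precision is the fraction of generated samples inside the real manifold, and recall is the fraction of real samples inside the generated manifold. The pure-Python sketch below makes simplifying assumptions: it uses raw 2D points in place of deep image embeddings and brute-force distances. The function names are illustrative.

```python
import math


def kth_nn_radii(points, k):
    """Distance from each point to its k-th nearest neighbor in the same set,
    excluding the point itself."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def coverage(queries, reference, k):
    """Fraction of query points inside at least one reference hypersphere."""
    radii = kth_nn_radii(reference, k)
    inside = sum(
        1
        for q in queries
        if any(math.dist(q, r) <= rad for r, rad in zip(reference, radii))
    )
    return inside / len(queries)


def precision_recall(real, generated, k=3):
    precision = coverage(generated, real, k)  # generated samples on the real manifold
    recall = coverage(real, generated, k)     # real samples on the generated manifold
    return precision, recall


real = [(0, 0), (1, 0), (0, 1), (1, 1)]
generated = [(0, 0), (0.1, 0), (0, 0.1)]  # collapsed onto one corner
print(precision_recall(real, generated, k=1))
```

In this example every generated point lies near real data, giving a precision of 1.0. Only one of the four real points is covered, so recall is 0.25. The two numbers separate sample quality from coverage.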

High-Quality Self-Supervised Deep Image Denoising

We describe a novel method for training high-quality image denoising models based on unorganized collections of corrupted images. The training does not need access to clean reference images, or explicit pairs of corrupted images, and can thus be applied in situations where such data is unacceptably expensive or impossible to acquire. We build on a recent technique that removes the need for reference data by employing networks with a "blind spot" in the receptive field, and significantly improve two key aspects: image quality and training efficiency.
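The blind-spot idea can be shown with a fixed toy filter in place of a trained network. Each pixel is predicted only from its neighbors and never from its own noisy value, so the noisy image itself can serve as the training target. The 3x3 neighborhood averaging and the function names below are illustrative assumptions. They do not reproduce the paper's architecture or its improvements.

```python
def blind_spot_mean(image):
    """Predict each pixel as the mean of its in-bounds 3x3 neighbors,
    excluding the pixel itself (the 'blind spot')."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            neighbors = [
                image[y][x]
                for y in range(max(0, i - 1), min(h, i + 2))
                for x in range(max(0, j - 1), min(w, j + 2))
                if (y, x) != (i, j)
            ]
            out[i][j] = sum(neighbors) / len(neighbors)
    return out


def self_supervised_loss(noisy):
    """Mean squared error between blind-spot predictions and the noisy input.
    Each prediction excludes its own target pixel, so a model of this form
    cannot reach zero loss by copying the input."""
    pred = blind_spot_mean(noisy)
    h, w = len(noisy), len(noisy[0])
    return sum((pred[i][j] - noisy[i][j]) ** 2 for i in range(h) for j in range(w)) / (h * w)
```

Consider an isolated bright spike such as `[[0, 0, 0], [0, 9, 0], [0, 0, 0]]`. The prediction at the center is 0.0, because the filter never consults the spike's own value.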

PAMTRI: Pose-Aware Multi-Task Learning for Vehicle Re-Identification Using Highly Randomized Synthetic Data

In comparison with person re-identification (ReID), which has been widely studied in the research community, vehicle ReID has received less attention. Vehicle ReID is challenging due to 1) high intra-class variability (caused by the dependency of shape and appearance on viewpoint), and 2) small inter-class variability (caused by the similarity in shape and appearance between vehicles produced by different manufacturers). To address these challenges, we propose a Pose-Aware Multi-Task Re-Identification (PAMTRI) framework. This approach includes two innovations compared with previous methods.

Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture

Package-level integration using multi-chip-modules (MCMs) is a promising approach for building large-scale systems. Compared to a large monolithic die, an MCM combines many smaller chiplets into a larger system, substantially reducing fabrication and design costs. Current MCMs typically contain only a handful of coarse-grained large chiplets due to the high area, performance, and energy overheads associated with inter-chiplet communication.
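The Poisson yield model illustrates the fabrication-cost argument. It is a standard textbook approximation, not taken from the paper: a die of area A on a process with defect density D0 is defect-free with probability exp(-A * D0). The sketch compares the expected wafer area needed for one working 400 mm² system built monolithically versus from four 100 mm² chiplets. Several assumptions apply. The areas and defect density are hypothetical, chiplets are assumed to be tested before packaging, and the inter-chiplet communication overheads described above are ignored.

```python
import math


def poisson_yield(area_mm2, defects_per_mm2):
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-area_mm2 * defects_per_mm2)


def silicon_per_good_system(total_area_mm2, num_chiplets, defects_per_mm2):
    """Expected wafer area (mm^2) consumed per working system built from
    equal-size chiplets, assuming each chiplet is tested before assembly
    (packaging and interconnect overheads are ignored)."""
    chiplet_area = total_area_mm2 / num_chiplets
    y = poisson_yield(chiplet_area, defects_per_mm2)
    return num_chiplets * chiplet_area / y


# Hypothetical defect density of 0.1 per cm^2 = 0.001 per mm^2
print(round(silicon_per_good_system(400, 1, 0.001), 1))  # monolithic
print(round(silicon_per_good_system(400, 4, 0.001), 1))  # four chiplets
```

Under these assumptions, the monolithic die needs about 597 mm² of wafer per working system and the four-chiplet MCM about 442 mm². Smaller dies are less likely to contain a defect.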

Image Inpainting for Irregular Holes Using Partial Convolutions

Existing deep-learning-based image inpainting methods apply a standard convolutional network to the corrupted image, using convolutional filter responses conditioned on both the valid pixels and the substitute values in the masked holes (typically the mean value). This often leads to artifacts such as color discrepancy and blurriness. Post-processing is usually used to reduce such artifacts, but it is expensive and may fail. We propose the use of partial convolutions, where the convolution is masked and renormalized to be conditioned on only valid pixels.
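The masking and renormalization can be sketched for a single channel. In the partial-convolution formulation as we understand it, each window's response is computed only over valid pixels and scaled by (window size) / (number of valid pixels). An output location is marked valid if its window contained at least one valid pixel. This minimal pure-Python sketch assumes stride 1 and no padding. The function name and argument layout are illustrative.

```python
def partial_conv2d(image, mask, weights, bias=0.0):
    """Single-channel partial convolution (stride 1, no padding).

    image, mask: H x W lists; mask is 1 for valid pixels and 0 for holes.
    weights: kh x kw list. Returns (output, updated_mask).
    """
    h, w = len(image), len(image[0])
    kh, kw = len(weights), len(weights[0])
    window_size = kh * kw  # number of positions in each window
    out_h, out_w = h - kh + 1, w - kw + 1
    output = [[0.0] * out_w for _ in range(out_h)]
    new_mask = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            valid = 0
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    m = mask[i + di][j + dj]
                    valid += m
                    # Masked input: hole pixels contribute nothing
                    acc += weights[di][dj] * image[i + di][j + dj] * m
            if valid > 0:
                # Renormalize so the response depends only on valid pixels
                output[i][j] = acc * window_size / valid + bias
                new_mask[i][j] = 1
            # Otherwise the output stays 0 and the location remains a hole
    return output, new_mask
```

Consider a 3x3 window of 4s whose center is a hole holding a placeholder value of 100, with uniform weights of 1/9. The sketch returns 4.0 and marks the output valid. A plain convolution over the same window would return about 14.67 because it reads the placeholder.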

Yuke Zhu

Yuke Zhu received his master’s and Ph.D. degrees from Stanford. His Ph.D. thesis centers on closing the perception-action loop to make robot intelligence more generalized and applicable to less-controlled environments. His research lies at the intersection of robotics, machine learning, and computer vision. He develops computational methods of perception and control that give rise to intelligent robot behaviors. Through his work, he aspires to teach robots to understand and interact with the visual world around them.

Haggai Maron

Haggai Maron joined Nvidia Research in October 2019 as a Research Scientist. His main fields of interest are machine learning and shape analysis. In particular, he studies how to apply deep learning to irregular domains (e.g., sets, graphs, point clouds, and surfaces) by leveraging their symmetry structure. 

Haggai completed his MSc and Ph.D. at the Department of Computer Science and Applied Mathematics at the Weizmann Institute of Science, and his BSc in Mathematics and Computer Science from the Hebrew University of Jerusalem.