Models Matter, So Does Training: An Empirical Study of CNNs for Optical Flow Estimation

We investigate two crucial and closely related aspects of CNNs for optical flow estimation: models and training. First, we design a compact but effective CNN model, called PWC-Net, according to simple and well-established principles: pyramidal processing, warping, and cost volume processing. PWC-Net is 17 times smaller in size, 2 times faster in inference, and 11% more accurate on Sintel final than the recent FlowNet2 model. It is the winning entry in the optical flow competition of the Robust Vision Challenge. Next, we experimentally analyze the sources of our performance gains.
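
For intuition, the following is a minimal PyTorch sketch of the warping and cost-volume steps the abstract names; tensor layouts, the search range, and normalization details are illustrative assumptions, not the paper's exact implementation:

    import torch
    import torch.nn.functional as F

    def warp(feat2, flow):
        # Backward-warp second-frame features with the current flow estimate.
        n, _, h, w = feat2.shape
        ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                                torch.arange(w, dtype=torch.float32), indexing="ij")
        coords = torch.stack((xs, ys)).to(feat2.device).unsqueeze(0) + flow  # (n,2,h,w)
        gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0  # normalize x to [-1, 1]
        gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0  # normalize y to [-1, 1]
        grid = torch.stack((gx, gy), dim=-1)           # (n, h, w, 2)
        return F.grid_sample(feat2, grid, align_corners=True)

    def cost_volume(feat1, feat2_warped, max_disp=4):
        # Correlate frame-1 features with warped frame-2 features over a
        # (2*max_disp+1)^2 window; output channels index the displacements.
        n, c, h, w = feat1.shape
        padded = F.pad(feat2_warped, [max_disp] * 4)
        costs = []
        for dy in range(2 * max_disp + 1):
            for dx in range(2 * max_disp + 1):
                shifted = padded[:, :, dy:dy + h, dx:dx + w]
                costs.append((feat1 * shifted).mean(dim=1, keepdim=True))
        return torch.cat(costs, dim=1)

At each pyramid level, the second frame's features are warped by the upsampled flow from the coarser level, a cost volume is built over a small search range, and a decoder refines the flow from it.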

Unsupervised Stylish Image Description Generation via Domain Layer Norm

Most of the existing works on image description focus on generating expressive descriptions. The few works dedicated to generating stylish (e.g., romantic, lyric) descriptions suffer from limited style variation and content digression. To address these limitations, we propose a controllable stylish image description generation model. It can learn to generate stylish image descriptions that are more closely related to image content, and it can be trained on an arbitrary monolingual corpus without collecting new pairs of images and stylish descriptions.
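
The "domain layer norm" of the title suggests a simple mechanism for controllable style: share the generation network across domains and switch styles by swapping small per-domain normalization parameters. A hedged sketch of that idea follows; the class name, shapes, and placement are assumptions, not the paper's exact architecture:

    import torch
    import torch.nn as nn

    class DomainLayerNorm(nn.Module):
        # One normalization shared by all styles, but with a separate
        # (gain, bias) pair per domain (e.g., factual vs. romantic), so the
        # style is selected at run time by indexing these small parameters.
        def __init__(self, dim, num_domains):
            super().__init__()
            self.gain = nn.Parameter(torch.ones(num_domains, dim))
            self.bias = nn.Parameter(torch.zeros(num_domains, dim))

        def forward(self, x, domain):
            mu = x.mean(dim=-1, keepdim=True)
            sigma = x.std(dim=-1, keepdim=True)
            return self.gain[domain] * (x - mu) / (sigma + 1e-5) + self.bias[domain]

Because only the per-domain gains and biases are style-specific, a new style can in principle be learned from a monolingual stylish corpus without paired image-description data.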

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology

This work presents a scalable deep neural network (DNN) inference accelerator consisting of 36 small chips connected in a mesh network on a multi-chip module (MCM). The accelerator enables flexible scaling for efficient inference on a wide range of DNNs, from the mobile to the data center domain. The test chip was implemented with a novel high-productivity VLSI methodology: it was designed entirely in C++ using high-level synthesis (HLS) tools and an agile VLSI design flow.
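
As a rough illustration of what scaling one DNN over a 36-chip mesh means, the sketch below tiles a single fully connected layer across a 6x6 grid so that each chip holds one weight slice and computes a partial product. This is a toy NumPy model of the partitioning idea, not the accelerator's actual dataflow, and all sizes are made up:

    import numpy as np

    MESH = 6                                             # 6x6 = 36 chips, as in the abstract
    W = np.random.randn(1152, 1728).astype(np.float32)   # layer weights (illustrative sizes)
    x = np.random.randn(1728).astype(np.float32)

    row_tiles = np.split(W, MESH, axis=0)                # partition output neurons
    tiles = [np.split(rt, MESH, axis=1) for rt in row_tiles]  # partition inputs
    x_tiles = np.split(x, MESH)

    # Chip (r, c) computes tiles[r][c] @ x_tiles[c]; chips in a row reduce over c.
    y = np.concatenate([
        sum(tiles[r][c] @ x_tiles[c] for c in range(MESH))
        for r in range(MESH)
    ])
    assert np.allclose(y, W @ x, rtol=1e-3, atol=1e-2)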

Reading Speed Decreases for Fast Readers Under Gaze-Contingent Rendering

Gaze-contingent rendering and display could help meet the increasing resolution and frame rate demands of modern displays while reducing the required latency, bandwidth, and power. However, it is still unclear how degradation of the peripheral image impacts behavior, particularly for the important task of reading. We examined changes in reading speed with different levels of peripheral degradation, varying the size of the text, foveal region, and sub-sampling kernel.
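
A toy version of the stimulus manipulation, for concreteness: keep full resolution inside a circular foveal window around the gaze point and show a coarsely sub-sampled image in the periphery. The study's actual rendering pipeline, window shapes, and kernels may differ:

    import numpy as np

    def gaze_contingent_degrade(img, gaze_xy, foveal_radius, kernel):
        # Take every kernel-th pixel and replicate it back up, so the
        # periphery is shown at reduced resolution.
        h, w = img.shape[:2]
        small = img[::kernel, ::kernel]
        degraded = np.repeat(np.repeat(small, kernel, axis=0), kernel, axis=1)[:h, :w]
        # Pixels outside the foveal radius around the gaze point are degraded.
        ys, xs = np.mgrid[0:h, 0:w]
        periphery = (xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2 > foveal_radius ** 2
        out = img.copy()
        out[periphery] = degraded[periphery]
        return out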

Legate NumPy: Accelerated and Distributed Array Computing

NumPy is a popular Python library for performing array-based numerical computations. The canonical implementation of NumPy used by most programmers runs on a single CPU core, with only a handful of operations parallelized across cores. This restriction to single-node, CPU-only execution limits both the size of the data that can be processed and the speed with which problems can be solved.
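
Legate NumPy's premise is that the same array program should scale out with no code changes beyond the import. A hedged usage example follows; the import path matches the Legate NumPy paper's description, but treat the module name as an assumption about that release:

    # Identical to a plain NumPy program except for this line:
    import legate.numpy as np   # instead of: import numpy as np

    x = np.ones((10000, 10000))
    y = np.ones((10000, 10000))
    print((x @ y).sum())        # same NumPy semantics, executed distributed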

Dynamic Diffuse Global Illumination with Ray-Traced Irradiance Fields

We show how to compute global illumination efficiently in scenes with dynamic objects and lighting. We extend classic irradiance probes to a compact encoding of the full irradiance field in a scene. First, we compute the dynamic irradiance field using an efficient GPU memory layout, geometric ray tracing, and appropriate sampling rates, without down-sampling or filtering prohibitively large spherical textures. Second, we devise a robust filtered irradiance query using a novel visibility-aware, moment-based interpolant.
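
The visibility-aware, moment-based interpolant can be read as a Chebyshev-style test of the kind used in variance shadow maps: each probe stores the mean and mean-squared ray-hit distance per direction, and shading points statistically behind geometry get their probe weight reduced, which suppresses light leaking through walls. A small hedged sketch of such a weight (constants and the exact formulation are illustrative):

    def probe_visibility_weight(dist, mean, mean_sq):
        # Chebyshev upper bound on the probability that the probe sees the
        # shading point, from the first two moments of ray-hit distance.
        variance = max(mean_sq - mean * mean, 1e-6)
        if dist <= mean:
            return 1.0           # on average, no occluder between probe and point
        d = dist - mean
        return variance / (variance + d * d)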

Extreme View Synthesis

We present Extreme View Synthesis, a solution for novel view extrapolation that works even when the number of input images is small (as few as two). In this context, occlusions and depth uncertainty are two of the most pressing issues, and they worsen as the degree of extrapolation increases. We follow the traditional paradigm of depth-based warping and refinement, with a few key improvements. First, we estimate a depth probability volume, rather than just a single depth value, for each pixel of the novel view.
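
The depth probability volume can be pictured as a softmax over plane-sweep matching costs: instead of committing to the argmin depth per pixel, the method keeps a distribution that downstream warping and refinement can exploit. A toy NumPy version, with temperature and cost conventions as illustrative assumptions:

    import numpy as np

    def depth_distribution(cost_volume, depths, temperature=1.0):
        # cost_volume: (D, H, W) matching costs over D depth hypotheses.
        logits = -cost_volume / temperature          # low cost -> high probability
        logits -= logits.max(axis=0, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=0, keepdims=True)
        expected_depth = (p * depths[:, None, None]).sum(axis=0)
        return p, expected_depth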

Few-Shot Adaptive Gaze Estimation

Inter-personal anatomical differences limit the accuracy of person-independent gaze estimation networks. Yet there is a need to lower gaze errors further to enable applications requiring higher quality. Further gains can be achieved by personalizing gaze networks, ideally with few calibration samples. However, over-parameterized neural networks are not amenable to learning from few examples as they can quickly over-fit.
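
One way to personalize without over-fitting, in the spirit of the abstract: freeze the large network and adapt only a tiny person-specific parameter vector on the calibration samples. The conditioning hook (person_dim, the network's second argument) and the training details below are assumptions for illustration, not the paper's exact method:

    import torch
    import torch.nn as nn

    def personalize(gaze_net, calib_imgs, calib_gaze, steps=200, lr=1e-2):
        for p in gaze_net.parameters():
            p.requires_grad_(False)              # big network stays frozen
        person = nn.Parameter(torch.zeros(gaze_net.person_dim))  # assumed attribute
        opt = torch.optim.Adam([person], lr=lr)
        for _ in range(steps):
            pred = gaze_net(calib_imgs, person)  # assumed person-conditioned forward
            loss = nn.functional.l1_loss(pred, calib_gaze)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return person

Because only a handful of parameters are free, a few calibration samples cannot over-fit the rest of the network.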