| Research

Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification

This paper presents a novel framework to combine multiple layers and modalities of deep neural networks for video classification. We first propose a multilayer strategy to simultaneously capture a variety of levels of abstraction and invariance in a network, where the convolutional and fully connected layers are effectively represented by the proposed feature aggregation methods. We further introduce a multimodal scheme that includes four highly complementary modalities to extract diverse static and dynamic cues at multiple temporal scales.

Read more about Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification

Deep G-Buffers for Stable Global Illumination Approximation

We introduce a new hardware-accelerated method for constructing Deep G-buffers that is 2x-8x faster than the previous depth peeling method and produces more stable results. We then build several high-performance shading algorithms atop our representation, including dynamic diffuse interreflection, ambient occlusion (AO), and mirror reflection effects.

Our construction method s order-independent, guarantees a minimum separation between layers, operates in a (small) bounded memory footprint, and does not require per-pixel sorting.

Read more about Deep G-Buffers for Stable Global Illumination Approximation

A Phenomenological Scattering Model for Order-Independent Transparency

Translucent objects such as fog, smoke, glass, ice, and liquids are pervasive in cinematic environments because they frame scenes in depth and create visually compelling shots. Unfortunately, they are hard to simulate in real-time and have thus previously been rendered poorly compared to opaque surfaces in games.

This paper introduces the first model for a real-time rasterization algorithm that can simultaneously approximate the following transparency phenomena: wavelength-varying ("colored") transmission, translucent colored shadows, caustics, partial coverage, diffusion, and refraction.

Read more about A Phenomenological Scattering Model for Order-Independent Transparency

Production-Level Facial Performance Capture Using Deep Convolutional Neural Networks

We present a real-time deep learning framework for video-based facial performance capture—the dense 3D tracking of an actor’s face given a monocular video. Our pipeline begins with accurately capturing a subject using a high-end production facial capture pipeline based on multi-view stereo tracking and artist-enhanced animations. With 5–10 minutes of captured footage, we train a convolutional neural network to produce high-quality output, including self-occluded regions, from a monocular video sequence of that subject.

Read more about Production-Level Facial Performance Capture Using Deep Convolutional Neural Networks

Kai-Hwa Yao

Read more about Kai-Hwa Yao

Stan Birchfield

Stan Birchfield is a Principal Research Scientist and Senior Research Manager, exploring the intersection of computer vision and robotics. Prior to joining NVIDIA, he was a tenured professor at Clemson University, where he led research in computer vision, visual tracking, mobile robotics, robotic manipulation, and the perception of highly deformable objects. He remains an adjunct faculty member at Clemson.

Read more about Stan Birchfield

Thomas Breuel

Read more about Thomas Breuel

Page Placement Strategies for GPUs within Heterogeneous Memory Systems

Systems from smartphones to supercomputers are increasingly heterogeneous, being composed of both CPUs and GPUs. To maximize cost and energy efficiency, these systems will increasingly use globally-addressable heterogeneous memory systems, making choices about memory page placement critical to performance. In this work we show that current page placement policies are not sufficient to maximize GPU performance in these heterogeneous memory systems. We propose two new page placement policies that improve GPU performance: one application agnostic and one using application profile information.

Read more about Page Placement Strategies for GPUs within Heterogeneous Memory Systems

Unlocking Bandwidth for GPUs in CC-NUMA systems

Historically, GPU-based HPC applications have had a substantial memory bandwidth advantage over CPU-based workloads due to using GDDR rather than DDR memory. However, past GPUs required a restricted programming model where application data was allocated up front and explicitly copied into GPU memory before launching a GPU kernel by the programmer. Recently, GPUs have eased this requirement and now can employ on-demand software page migration between CPU and GPU memory to obviate explicit copying.

Read more about Unlocking Bandwidth for GPUs in CC-NUMA systems

Flexible Software Profiling of GPU Architectures

To aid application characterization and architecture design space exploration, researchers and engineers have developed a wide range of tools for CPUs, including simulators, profilers, and binary instrumentation tools. With the advent of GPU computing, GPU manufacturers have developed similar tools leveraging hardware profiling and debugging hooks. To date, these tools are largely limited by the fixed menu of options provided by the tool developer and do not offer the user the flexibility to observe or act on events not in the menu.

Read more about Flexible Software Profiling of GPU Architectures

Subscribe to