Yaosheng Fu

Yaosheng Fu joined NVIDIA in September 2017 as a member of the architecture research team. His current interests include computer architecture, memory systems, and parallel computing. Yaosheng received his Ph.D. in Electrical Engineering from Princeton University, NJ, in 2017 and his B.S. in Electronic Engineering from Tsinghua University, China, in 2010.

Near-eye Light Field Holographic Rendering with Spherical Waves for Wide Field of View Interactive 3D Computer Graphics

Holograms offer high resolution and a large depth of field, allowing the eye to view a scene much as if looking through a virtual window. Unfortunately, computer-generated holography (CGH) does not yet deliver on this promise, owing to hardware limitations under plane-wave illumination and large computational cost. Light field displays have been popular because they can provide continuous focus cues. However, light field displays suffer from a trade-off between spatial and angular resolution, and they do not model diffraction.

Perceptually-Guided Foveation for Light Field Displays

A variety of applications such as virtual reality and immersive cinema require high image quality, low rendering latency, and consistent depth cues. 4D light field displays support focus accommodation, but are more costly to render than 2D images, resulting in higher latency. The human visual system can resolve higher spatial frequencies in the fovea than in the periphery. This property has been harnessed by recent 2D foveated rendering methods to reduce computation cost while maintaining perceptual quality.
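The acuity falloff described above is often approximated with a linear model in which the minimum angle of resolution (MAR) grows with eccentricity. The sketch below is a generic 2D foveation illustration, not this paper's perceptual model for light fields; the 1-arcminute foveal MAR and the 0.022 slope are assumed values, and all function names are hypothetical. It shows how quickly the sampling density a renderer needs drops away from the gaze point.

```python
# Illustrative acuity model; both constants are assumptions, not values from the paper.
MAR_FOVEA_DEG = 1.0 / 60.0  # assumed minimum angle of resolution at the fovea (1 arcminute)
MAR_SLOPE = 0.022           # assumed MAR increase per degree of eccentricity


def min_angle_of_resolution(eccentricity_deg: float) -> float:
    """Linear model: the smallest resolvable detail grows with eccentricity."""
    return MAR_FOVEA_DEG + MAR_SLOPE * eccentricity_deg


def max_resolvable_frequency(eccentricity_deg: float) -> float:
    """Highest resolvable spatial frequency in cycles/degree (one cycle spans two MARs)."""
    return 1.0 / (2.0 * min_angle_of_resolution(eccentricity_deg))


def relative_sample_density(eccentricity_deg: float) -> float:
    """2D sample density needed at this eccentricity, relative to the fovea.

    Samples per unit area scale with the square of the resolvable frequency.
    """
    ratio = max_resolvable_frequency(eccentricity_deg) / max_resolvable_frequency(0.0)
    return ratio * ratio


if __name__ == "__main__":
    for e in (0, 5, 10, 20, 40):
        print(f"{e:>2} deg: {max_resolvable_frequency(e):6.2f} cpd, "
              f"relative density {relative_sample_density(e):.4f}")
```

Because density falls roughly with the inverse square of eccentricity in this model, most of the savings come from the periphery; a light field renderer would additionally have to account for the angular dimension, which this sketch ignores.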

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have emerged as a fundamental technology for machine learning. High performance and extreme energy efficiency are critical for deployments of CNNs, especially on mobile platforms such as autonomous vehicles, cameras, and electronic personal assistants. This paper introduces the Sparse CNN (SCNN) accelerator architecture, which improves performance and energy efficiency by exploiting the zero-valued weights that stem from network pruning during training and zero-valued activations that arise from the common ReLU operator.
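To see why zeros in both operands pay off, consider a 1D "valid" correlation computed two ways. The sparse version below is loosely modeled on a Cartesian-product style of computation: every nonzero activation is multiplied by every nonzero weight, and each product is scattered to the output coordinate it contributes to. This is a single-tile software sketch with hypothetical function names, not the SCNN hardware dataflow; products that fall outside the valid output window are simply dropped here.

```python
from typing import List, Tuple


def relu(values: List[float]) -> List[float]:
    """ReLU zeroes negative activations, which is one source of sparsity."""
    return [max(0.0, v) for v in values]


def compress(values: List[float]) -> List[Tuple[int, float]]:
    """Keep only nonzero entries as (index, value) pairs."""
    return [(i, v) for i, v in enumerate(values) if v != 0.0]


def dense_correlate_1d(activations: List[float],
                       weights: List[float]) -> Tuple[List[float], int]:
    """Reference 'valid' correlation: out[i] = sum_k weights[k] * activations[i + k]."""
    n_out = len(activations) - len(weights) + 1
    out = [sum(weights[k] * activations[i + k] for k in range(len(weights)))
           for i in range(n_out)]
    return out, n_out * len(weights)  # result and number of multiplies performed


def sparse_correlate_1d(activations: List[float],
                        weights: List[float]) -> Tuple[List[float], int]:
    """Same result, computed from the Cartesian product of nonzero operands only."""
    n_out = len(activations) - len(weights) + 1
    out = [0.0] * n_out
    multiplies = 0
    nz_weights = compress(weights)
    for j, a in compress(activations):
        for k, w in nz_weights:
            multiplies += 1      # every nonzero pair is multiplied...
            i = j - k            # ...and scattered to the output it contributes to
            if 0 <= i < n_out:   # products outside the valid window are dropped
                out[i] += a * w
    return out, multiplies
```

With pruned weights and post-ReLU activations, the multiply count drops from roughly the product of the dense sizes to the product of the nonzero counts, while the outputs stay identical.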

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting

We introduce a novel method to obtain high-quality 3D reconstructions from consumer RGB-D sensors. Our core idea is to simultaneously optimize for geometry encoded in a signed distance field (SDF), textures from automatically-selected keyframes, and their camera poses along with material and scene lighting. To this end, we propose a joint surface reconstruction approach that is based on shape-from-shading (SfS) techniques and utilizes the estimation of spatially-varying spherical harmonics (SVSH) from subvolumes of the reconstructed scene. Through extensive examples and evaluations, we demonstrate…
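Under a spherical-harmonics lighting model, shading at a surface point is a dot product between lighting coefficients and SH basis functions evaluated at the surface normal. The sketch below uses the standard nine-term (bands 0 to 2) real SH basis and, as a simplifying assumption, gives each point the coefficients of the axis-aligned subvolume that contains it. Any interpolation between neighboring subvolumes, and the coupling with the SDF, texture, and camera-pose optimization, are not shown; `subvolume_key`, `shade_svsh`, and the dict-of-coefficients layout are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sh9_basis(normal: Vec3) -> Tuple[float, ...]:
    """Real spherical harmonics for bands 0-2, evaluated at a normalized direction."""
    length = math.sqrt(sum(c * c for c in normal))
    x, y, z = (c / length for c in normal)
    return (
        0.282095,
        0.488603 * y,
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    )


def shade(albedo: float, normal: Vec3, sh_coeffs: Sequence[float]) -> float:
    """Lambertian-style intensity: albedo times the SH-approximated shading."""
    return albedo * sum(c * b for c, b in zip(sh_coeffs, sh9_basis(normal)))


def subvolume_key(point: Vec3, subvolume_size: float) -> Tuple[int, ...]:
    """Integer index of the axis-aligned subvolume that contains a 3D point."""
    return tuple(math.floor(c / subvolume_size) for c in point)


def shade_svsh(albedo: float, point: Vec3, normal: Vec3,
               lighting: Dict[Tuple[int, ...], Sequence[float]],
               subvolume_size: float) -> float:
    """Spatially varying lighting: each subvolume stores its own nine SH coefficients."""
    return shade(albedo, normal, lighting[subvolume_key(point, subvolume_size)])
```

Two points with the same albedo and normal can therefore shade differently if they lie in subvolumes with different lighting, which is what lets a per-subvolume model capture local illumination effects that a single global SH environment would miss.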

An Efficient Denoising Algorithm for Global Illumination

We propose a hybrid ray-tracing/rasterization strategy for real-time rendering enabled by a fast new denoising method. We factor global illumination into direct light at rasterized primary surfaces and two indirect lighting terms, each estimated with one path-traced sample per pixel. Our factorization enables efficient (biased) reconstruction by denoising light without blurring materials. We demonstrate denoising in under 10 ms per 1280×720 frame, compare results against the leading offline denoising methods, and include a supplement with source code, video, and data.
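"Denoising light without blurring materials" can be illustrated with albedo demodulation: divide the noisy radiance by the surface albedo, filter what remains (an estimate of incident light), and multiply the albedo back in. The sketch below uses a plain box filter as a stand-in for the paper's denoiser, which it does not reproduce, and assumes every indirect term is demodulated by the same single-channel albedo. `box_blur`, `denoise_demodulated`, and `compose` are hypothetical names.

```python
from typing import List

Image = List[List[float]]  # single-channel image, indexed [row][column]


def box_blur(img: Image, radius: int) -> Image:
    """Average each pixel over its (2r+1)x(2r+1) neighborhood, clipped at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def denoise_demodulated(noisy: Image, albedo: Image,
                        radius: int = 1, eps: float = 1e-6) -> Image:
    """Filter incident light only, then re-apply albedo so texture stays sharp."""
    # Demodulate: dividing by albedo leaves an estimate of the noisy incident light.
    light = [[c / max(a, eps) for c, a in zip(cr, ar)] for cr, ar in zip(noisy, albedo)]
    light = box_blur(light, radius)
    # Remodulate with the unfiltered albedo, keeping material detail at full resolution.
    return [[l * a for l, a in zip(lr, ar)] for lr, ar in zip(light, albedo)]


def compose(direct: Image, albedo: Image, *noisy_indirect_terms: Image) -> Image:
    """Final image: direct light plus each indirect term after demodulated filtering."""
    out = [row[:] for row in direct]
    for term in noisy_indirect_terms:
        filtered = denoise_demodulated(term, albedo)
        for y, row in enumerate(filtered):
            for x, v in enumerate(row):
                out[y][x] += v
    return out
```

Filtering the color directly would average a checkerboard texture into gray; filtering only the demodulated light leaves the texture untouched when the lighting is smooth.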

Aggregate G-Buffer Anti-Aliasing in Unreal Engine 4

In recent years, variants of Temporal Anti-Aliasing (TAA) have become the techniques of choice for fast post-process anti-aliasing, approximating super-sampled AA amortized over multiple frames. While TAA generally improves quality substantially over previous post-process AA algorithms, it can also suffer from inherent artifacts, namely ghosting and flickering, in the presence of complex sub-pixel geometry and/or sub-pixel specular highlights. In this talk, we share our experience implementing Aggregate G-Buffer Anti-Aliasing (AGAA) in Unreal Engine 4.
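The amortization and the ghosting problem described above can both be seen in a minimal TAA resolve: each new (jittered) frame is blended into an accumulated history with an exponential moving average, and the history is clamped to the range of the current frame's 3×3 neighborhood to reject stale samples. This is a generic TAA sketch, not AGAA; it assumes a static camera, so motion-vector reprojection and jitter generation are omitted, and all names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # single-channel image, indexed [row][column]


def neighborhood_bounds(img: Image, x: int, y: int) -> Tuple[float, float]:
    """Min and max of the 3x3 neighborhood around (x, y), clipped at borders."""
    h, w = len(img), len(img[0])
    vals = [img[yy][xx]
            for yy in range(max(0, y - 1), min(h, y + 2))
            for xx in range(max(0, x - 1), min(w, x + 2))]
    return min(vals), max(vals)


def taa_resolve(history: Image, current: Image,
                alpha: float = 0.1, clamp: bool = True) -> Image:
    """Blend the current frame into the history (exponential moving average)."""
    h, w = len(current), len(current[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            hist = history[y][x]
            if clamp:
                lo, hi = neighborhood_bounds(current, x, y)
                # Reject history values the current frame's neighborhood cannot explain.
                hist = min(max(hist, lo), hi)
            out[y][x] = alpha * current[y][x] + (1.0 - alpha) * hist
    return out
```

Without the clamp, a bright object that has left a pixel lingers as a fading trail (ghosting). The clamp removes it, but when sub-pixel geometry or highlights make the neighborhood range jump from frame to frame, the clamped history jumps too, which is one way flickering arises.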

The SGGX microflake distribution

We introduce the Symmetric GGX (SGGX) distribution to represent spatially-varying properties of anisotropic microflake participating media. Our key theoretical insight is to represent a microflake distribution by the projected area of the microflakes. We use the projected area to parameterize the shape of an ellipsoid, from which we recover a distribution of normals.
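The projected-area idea can be written down directly. In the SGGX formulation, the distribution is stored as a symmetric positive-definite 3×3 matrix S; the projected area seen from direction ω is σ(ω) = sqrt(ωᵀ S ω), and the normal distribution of the corresponding ellipsoid is D(m) = 1 / (π sqrt(det S) (mᵀ S⁻¹ m)²). The NumPy sketch below evaluates both and builds S from an orthonormal frame and the projected areas along its axes; the function names are our own, and sampling and phase-function evaluation are not shown.

```python
import numpy as np


def sggx_matrix(frame: np.ndarray, sigmas) -> np.ndarray:
    """S = F diag(sigma^2) F^T, where the columns of F are an orthonormal frame
    and sigmas are the projected areas along those three axes."""
    return frame @ np.diag(np.square(sigmas)) @ frame.T


def _normalize(v) -> np.ndarray:
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def sggx_sigma(S: np.ndarray, omega) -> np.ndarray:
    """Projected area of the microflakes seen from direction(s) omega: sqrt(w^T S w)."""
    w = _normalize(omega)
    return np.sqrt(np.einsum("...i,ij,...j->...", w, S, w))


def sggx_D(S: np.ndarray, m) -> np.ndarray:
    """Normal distribution recovered from the ellipsoid:
    D(m) = 1 / (pi * sqrt(det S) * (m^T S^-1 m)^2). Accepts one or many directions."""
    m = _normalize(m)
    q = np.einsum("...i,ij,...j->...", m, np.linalg.inv(S), m)
    return 1.0 / (np.pi * np.sqrt(np.linalg.det(S)) * q * q)
```

For S equal to the identity, every direction sees σ = 1 and D = 1/π (an isotropic medium). The two quantities are consistent: integrating max(0, ω·m) D(m) over the sphere of normals reproduces σ(ω), which is what makes the compact matrix a valid microflake distribution.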

Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms

Variation has been shown to exist across the cells within a modern DRAM chip. Prior work has studied and exploited several forms of variation, such as manufacturing-process- or temperature-induced variation. We empirically demonstrate a new form of variation that exists within a real DRAM chip, induced by the design and placement of different components in the DRAM chip: different regions in DRAM, based on their relative distances from the peripheral structures, require different minimum access latencies for reliable operation.
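One way to exploit region-dependent latency is to profile each region, find the shortest latency that operates reliably there, and compare it with a single chip-wide setting. The sketch below assumes a hypothetical profiling format (region name mapped to tested latency in ns and whether all tests passed) and uses the worst region's minimum as the uniform baseline; real timing parameters, test patterns, and the paper's actual mechanisms are not modeled, and the function names are illustrative.

```python
from typing import Dict

# Hypothetical profile format: region -> {tested latency in ns: passed all tests?}
Profile = Dict[str, Dict[float, bool]]


def min_reliable_latency(results: Dict[float, bool]) -> float:
    """Smallest tested latency such that it and every longer tested latency passed."""
    reliable = None
    for latency in sorted(results, reverse=True):
        if not results[latency]:
            break
        reliable = latency
    if reliable is None:
        raise ValueError("no tested latency was reliable for this region")
    return reliable


def per_region_latency(profile: Profile) -> Dict[str, float]:
    """Minimum reliable latency for every profiled region."""
    return {region: min_reliable_latency(r) for region, r in profile.items()}


def savings_over_uniform(profile: Profile) -> Dict[str, float]:
    """Latency saved per region versus one worst-case setting applied chip-wide."""
    latencies = per_region_latency(profile)
    worst_case = max(latencies.values())
    return {region: worst_case - lat for region, lat in latencies.items()}
```

Scanning from the longest latency downward and stopping at the first failure guards against a lucky pass at a short latency below a failing one being mistaken for a reliable setting.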