Exploiting Budan-Fourier and Vincent’s Theorems for Ray Tracing 3D Bézier Curves

We present a new approach to finding ray–cubic Bézier curve intersections by leveraging recent results in polynomial root isolation. Compared with the state-of-the-art adaptive linearization, it improves performance by 5–50× while also improving accuracy by roughly 1000×. Our algorithm quickly eliminates parts of the curve for which the distance to the given ray is guaranteed to be greater than a model-specific threshold (the curve's maximum half-width).
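To make the root-counting machinery concrete, here is a minimal Python sketch of the Budan–Fourier bound on the number of real roots in an interval; the polynomial and interval below are illustrative, not the paper's distance formulation.

```python
# Minimal sketch (assumption: plain Python, not the paper's implementation).
# Budan-Fourier theorem: the number of real roots of p in (a, b] is at most
# V(a) - V(b), where V(x) counts sign variations in p(x), p'(x), ..., p^(n)(x),
# and the true count differs from this bound by an even number.

def derivative(coeffs):
    # coeffs[i] is the coefficient of x**i
    return [i * c for i, c in enumerate(coeffs)][1:] or [0.0]

def eval_poly(coeffs, x):
    return sum(c * x**i for i, c in enumerate(coeffs))

def sign_variations(values, eps=1e-12):
    signs = [v for v in values if abs(v) > eps]      # drop (near-)zero entries
    return sum(1 for s, t in zip(signs, signs[1:]) if s * t < 0)

def budan_fourier_bound(coeffs, a, b):
    seq, va, vb = list(coeffs), [], []
    while any(abs(c) > 0 for c in seq):
        va.append(eval_poly(seq, a))
        vb.append(eval_poly(seq, b))
        seq = derivative(seq)
    return sign_variations(va) - sign_variations(vb)

# Example: p(t) = t^3 - 2t^2 - t + 2 has roots -1, 1, 2; two of them lie in (0, 3].
print(budan_fourier_bound([2.0, -1.0, -2.0, 1.0], 0.0, 3.0))  # prints 2
```

An intersection routine can use such a bound to discard curve segments whose interval provably contains no root, recursing only where roots may remain.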

Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion

We present a machine learning technique for driving 3D facial animation by audio input in real time and with low latency. Our deep neural network learns a mapping from input waveforms to the 3D vertex coordinates of a face model, and simultaneously discovers a compact, latent code that disambiguates the variations in facial expression that cannot be explained by the audio alone. During inference, the latent code can be used as an intuitive control for the emotional state of the face puppet.
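As a rough structural illustration of the described idea, a joint mapping from audio features and a learned per-sample emotion code to vertex positions, here is a hypothetical PyTorch sketch; the class name, layer sizes, and feature dimensions are assumptions, not the paper's network.

```python
# Hypothetical sketch (assumption: PyTorch; sizes and the per-sample emotion
# embedding below are illustrative, not the paper's architecture).
import torch
import torch.nn as nn

class AudioToVertices(nn.Module):
    def __init__(self, audio_dim=256, emotion_dim=16,
                 num_vertices=5000, num_train_samples=100000):
        super().__init__()
        self.num_vertices = num_vertices
        # One learned emotion code per training sample; trained jointly with
        # the network, it absorbs variation the audio alone cannot explain.
        self.emotion = nn.Embedding(num_train_samples, emotion_dim)
        self.net = nn.Sequential(
            nn.Linear(audio_dim + emotion_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, num_vertices * 3),
        )

    def forward(self, audio_features, sample_idx):
        code = self.emotion(sample_idx)                     # (B, emotion_dim)
        x = torch.cat([audio_features, code], dim=-1)       # (B, audio+emotion)
        return self.net(x).view(-1, self.num_vertices, 3)   # per-vertex xyz
```

At inference time the embedding lookup would be replaced by a user-chosen or interpolated code, which is what makes the latent variable usable as an emotion control.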

Spatiotemporal Variance-Guided Filtering: Real-Time Reconstruction for Path-Traced Global Illumination

Physically based light transport is a long-standing goal for real-time computer graphics. We introduce a reconstruction algorithm that generates a temporally stable sequence of images from global illumination rendered with one path per pixel. To handle such noisy input, we use temporal accumulation to increase the effective sample count and spatiotemporal luminance variance estimates to drive a hierarchical, image-space wavelet filter. This hierarchy allows us to distinguish between noise and detail at multiple scales using local luminance variance.
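The variance-guided wavelet filter can be pictured as repeated edge-aware à-trous passes whose edge-stopping weights are normalized by the local noise level. The NumPy sketch below shows one simplified pass on a luminance image; it omits the depth and normal edge-stopping terms and the temporal accumulation of the full method.

```python
# Illustrative sketch (assumption: NumPy; omits depth/normal edge-stopping
# terms and temporal accumulation from the full pipeline).
import numpy as np

def atrous_pass(color, variance, step, sigma_l=4.0):
    """One a-trous wavelet pass; weights fall off with luminance difference
    normalized by the local standard deviation (sqrt of variance)."""
    h, w = color.shape
    kernel = np.array([1/16, 1/4, 3/8, 1/4, 1/16])   # B3-spline taps
    out_c = np.zeros_like(color)
    out_v = np.zeros_like(variance)
    for y in range(h):
        for x in range(w):
            acc_c = acc_v = acc_w = 0.0
            sigma = np.sqrt(variance[y, x]) + 1e-6
            for dy in range(-2, 3):
                for dx in range(-2, 3):
                    yy = np.clip(y + dy * step, 0, h - 1)
                    xx = np.clip(x + dx * step, 0, w - 1)
                    k = kernel[dy + 2] * kernel[dx + 2]
                    # Edge-stopping weight: luminance difference vs. noise level.
                    wl = np.exp(-abs(color[y, x] - color[yy, xx]) / (sigma_l * sigma))
                    wgt = k * wl
                    acc_c += wgt * color[yy, xx]
                    acc_v += wgt * wgt * variance[yy, xx]
                    acc_w += wgt
            out_c[y, x] = acc_c / acc_w
            out_v[y, x] = acc_v / (acc_w * acc_w)
    return out_c, out_v

# Hierarchy: successive passes with growing footprints (step = 1, 2, 4, ...).
# color, variance = atrous_pass(color, variance, step=1)
# color, variance = atrous_pass(color, variance, step=2)
```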

Generating Stratified Random Lines in a Square

When generating a set of uniformly distributed lines through a square, some care is needed to avoid bias in line orientation and position. We present a compact algorithm to generate unbiased uniformly distributed lines from a uniform point set over the unit square.
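For intuition about where the bias comes from, the sketch below draws lines from the uniform (theta, p) line measure by rejection against the unit square; it is a simple unbiased baseline, not the paper's compact stratified mapping. Sampling theta uniformly and then p only within the square's projection, without rejection, would give every orientation equal weight, whereas the unbiased measure favors orientations with a longer projection (up to sqrt(2) for diagonals).

```python
# Baseline sketch (assumption: rejection sampling; this is NOT the paper's
# compact mapping, only an illustration of the unbiased (theta, p) measure).
import math, random

def sample_line_through_unit_square():
    """Return (theta, p): the line is {(x, y) : x*cos(theta) + y*sin(theta) = p}.
    The uniform line measure is d(theta) d(p); we rejection-sample p against
    the projection of the unit square onto the line's normal direction."""
    while True:
        theta = random.uniform(0.0, math.pi)
        c, s = math.cos(theta), math.sin(theta)
        # Signed projections of the square's corners onto the normal (c, s).
        proj = [0.0, c, s, c + s]
        p = random.uniform(-math.sqrt(2.0), math.sqrt(2.0))
        if min(proj) <= p <= max(proj):
            return theta, p   # line intersects the square: accept
```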

Exploring and Expanding the Continuum of OIT Algorithms

Order-independent transparency (OIT) proves challenging for modern rasterization-based renderers. Rendering without transparency can limit the quality of visual effects, so researchers have proposed various algorithms enabling or approximating OIT. Unfortunately, this work generally has restrictions that limit its applicability. To identify directions for improvement, we performed an in-depth categorization of existing transparency techniques and placed them on a multi-dimensional continuum.

Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors

Accelerators, such as GPUs, have proven to be highly successful in reducing execution time and power consumption of compute-intensive applications. Even though they are already used pervasively, they are typically supervised by general-purpose CPUs, which results in frequent control flow switches and data transfers because the CPUs handle all communication tasks. However, we observe that accelerators have recently been augmented with peer-to-peer communication capabilities that allow for autonomous traffic sourcing and sinking.

MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability

Historically, improvements in GPU-based high performance computing have been tightly coupled to transistor scaling. As Moore's law slows down, and the number of transistors per die no longer grows at historical rates, the performance curve of single monolithic GPUs will ultimately plateau. However, the need for higher performing GPUs continues to exist in many domains. To address this need, in this paper we demonstrate that package-level integration of multiple GPU modules to build larger logical GPUs can enable continuous performance scaling beyond Moore's law.

Temporal Ensembling for Semi-Supervised Learning

In this paper, we present a simple and efficient method for training deep neural networks in a semi-supervised setting where only a small portion of training data is labeled. We introduce self-ensembling, where we form a consensus prediction of the unknown labels using the outputs of the network-in-training on different epochs, and most importantly, under different regularization and input augmentation conditions.
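A minimal sketch of the self-ensembling update, assuming PyTorch: each sample's consensus target is an exponential moving average of the network's past predictions, and every sample (labeled or not) is pulled toward its target with an MSE consistency loss. The momentum value, loss weighting, and per-batch update shown here are simplifications; they are not the paper's exact per-epoch schedule.

```python
# Minimal sketch of temporal ensembling (assumption: PyTorch; hyperparameters
# and names here are illustrative, not the paper's exact values).
import torch
import torch.nn.functional as F

alpha = 0.6                       # EMA momentum for the ensembled predictions
num_samples, num_classes = 50000, 10
ensemble_preds = torch.zeros(num_samples, num_classes)   # running ensemble Z
target_preds = torch.zeros(num_samples, num_classes)     # bias-corrected targets

def training_step(model, x, y, idx, labeled_mask, epoch, w_consistency):
    """One step: supervised loss on labeled samples plus an MSE consistency
    loss pulling every prediction toward its temporal-ensemble target.
    Assumes the batch contains at least one labeled sample."""
    logits = model(x)                         # stochastic: dropout/augmentation
    probs = F.softmax(logits, dim=1)
    sup_loss = F.cross_entropy(logits[labeled_mask], y[labeled_mask])
    cons_loss = F.mse_loss(probs, target_preds[idx].to(probs.device))
    loss = sup_loss + w_consistency * cons_loss   # w_consistency ramped up over epochs

    with torch.no_grad():
        # Update the running ensemble for these samples (the paper does this
        # once per epoch; shown inline here for brevity), with startup-bias
        # correction as in Adam-style moving averages.
        ensemble_preds[idx] = alpha * ensemble_preds[idx] + (1 - alpha) * probs.cpu()
        target_preds[idx] = ensemble_preds[idx] / (1 - alpha ** (epoch + 1))
    return loss
```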

Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications

The available computing resources in modern GPUs are growing with each new generation. However, as many general-purpose applications with limited thread scalability are tuned to take advantage of GPUs, the available compute resources might not be optimally utilized. To address this, modern GPUs will need to execute multiple kernels simultaneously. As current generations of GPUs (e.g., NVIDIA Kepler, AMD Radeon) already enable concurrent execution of kernels from the same application, in this paper we address the next logical step: executing multiple concurrent applications on GPUs.

Toggle-aware Compression for GPUs

Memory bandwidth compression can be an effective way to achieve higher system performance and energy efficiency in modern data-intensive applications by exploiting redundancy in data. Prior works studied various data compression techniques to improve both capacity (e.g., of caches and main memory) and bandwidth utilization (e.g., of the on-chip and off-chip interconnects). These works addressed two common shortcomings of compression: (i) compression/decompression overhead in terms of latency, energy, and area, and (ii) hardware complexity to support variable data size.
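The "toggle" cost that motivates the work is the number of bit flips between consecutive flits on a link, which compression can inadvertently increase even while it reduces the number of flits. A small Python sketch of that metric, under assumed 64-bit flits, is shown below.

```python
# Sketch of the bit-toggle metric (assumption: flits modeled as 64-bit Python
# ints; real hardware counts per-cycle flips on the physical link wires).
def toggle_count(flits):
    """Number of 0<->1 transitions between consecutive flits on a link."""
    toggles = 0
    for prev, cur in zip(flits, flits[1:]):
        toggles += bin(prev ^ cur).count("1")   # differing bits flip the wire
    return toggles

# Regular raw data toggles few wires per flit; compressed data looks more
# random, so fewer flits can still mean more total toggles (and energy).
raw        = list(range(8))                                        # 8 flits
compressed = [(0x9E3779B97F4A7C15 * (i + 1)) & (2**64 - 1) for i in range(4)]
print("raw:", toggle_count(raw), "compressed:", toggle_count(compressed))
```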