Graduate Fellowships Awarded for 2017-2018

Eleven Graduate Fellowship winners were announced at GTC 2017 on May 11, 2017. Each receives a grant of up to $50,000 toward PhD research involving GPU computing.

Reconstructing Intensity Images from Binary Spatial Gradient Cameras

Binary gradient cameras extract edge and temporal information directly on the sensor, allowing for low-power, low-bandwidth, and high-dynamic-range capabilities, which are all critical factors for the deployment of embedded computer vision systems. However, these types of images require specialized computer vision algorithms and are not easy to interpret by a human observer. In this paper we propose to recover an intensity image from a single binary spatial gradient image with a deep autoencoder.

A Lightweight Approach for On-the-Fly Reflectance Estimation

Estimating surface reflectance (BRDF) is one key component of complete 3D scene capture, with wide applications in virtual reality, augmented reality, and human-computer interaction. Prior work is either limited to controlled environments (e.g., gonioreflectometers, light stages, or multi-camera domes), or requires the joint optimization of shape, illumination, and reflectance, which is often computationally too expensive (e.g., hours of running time) for real-time applications. Moreover, most prior work requires HDR images as input, which further complicates the capture process.

Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU

We introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We analyze its computational traits and concentrate on aspects critical to leveraging the GPU's computational power. We introduce a system of queues and a dynamic scheduling strategy, potentially helpful for other asynchronous algorithms as well.
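The core of the queueing idea can be sketched in a few lines: many agents push observations into a shared queue, and a single predictor drains them in batches so the (notionally GPU-side) forward pass runs on many states at once. This is a hypothetical, single-threaded sketch of that batching pattern, not the paper's actual scheduler; the function name `drain_batch` and the tuple format are illustrative assumptions.

```python
from queue import Queue, Empty

def drain_batch(q, batch_size):
    """Collect at most batch_size queued requests without blocking.

    In the hybrid CPU/GPU setting, the batch would then be evaluated
    in a single forward pass instead of one pass per agent.
    """
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(q.get_nowait())
        except Empty:
            break  # queue drained early: predict with a partial batch
    return batch

q = Queue()
for agent_id in range(10):          # ten agents each submit one observation
    q.put((agent_id, f"state-{agent_id}"))

batch = drain_batch(q, batch_size=4)
print([agent for agent, _ in batch])  # [0, 1, 2, 3]
```

The trade-off the dynamic scheduling strategy would navigate is between batch size (GPU utilization) and latency (agents waiting for predictions).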

Computational Zoom: A Framework for Post-Capture Image Composition

Capturing a picture that "tells a story" requires the ability to create the right composition. The two most important parameters controlling composition are the camera position and the focal length of the lens. The traditional paradigm is for a photographer to mentally visualize the desired picture, select the capture parameters to produce it, and finally take the photograph, thus committing to a particular composition. We propose to change this paradigm.

Loss Functions for Image Restoration with Neural Networks

Neural networks are becoming central in several areas of computer vision and image processing and different architectures have been proposed to solve specific problems. The impact of the loss layer of neural networks, however, has not received much attention in the context of image processing: the default and virtually only choice is L2. In this paper, we bring attention to alternative choices for image restoration. In particular, we show the importance of perceptually-motivated losses when the resulting image is to be evaluated by a human observer.
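The contrast between L2 and its alternatives is easy to see numerically: L2 squares the residual, so one large outlier dominates the loss, while L1 (one of the simpler alternative losses) treats errors linearly. A toy NumPy sketch, not taken from the paper:

```python
import numpy as np

def l2_loss(pred, target):
    """Mean squared error: the default choice for restoration networks."""
    return np.mean((pred - target) ** 2)

def l1_loss(pred, target):
    """Mean absolute error: a common alternative that penalizes outliers less severely."""
    return np.mean(np.abs(pred - target))

# A flat 100-pixel patch compared against two kinds of error:
target = np.zeros(100)
outlier = np.zeros(100)
outlier[0] = 10.0                 # a single 10-unit error
diffuse = np.full(100, 0.5)       # a 0.5-unit error at every pixel

# L2 rates the single outlier as far worse than the diffuse error;
# L1 ranks them the other way around.
print(l2_loss(outlier, target), l2_loss(diffuse, target))  # 1.0 0.25
print(l1_loss(outlier, target), l1_loss(diffuse, target))  # 0.1 0.5
```

Perceptually-motivated losses (e.g., SSIM-based) go further, scoring errors by their visibility to a human observer rather than by raw pixel differences.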

Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder

We describe a machine learning technique for reconstructing image sequences rendered using Monte Carlo methods. Our primary focus is on reconstruction of global illumination with extremely low sampling budgets at interactive rates. Motivated by recent advances in image restoration with deep convolutional networks, we propose a variant of these networks better suited to the class of noise present in Monte Carlo rendering. We allow for much larger pixel neighborhoods to be taken into account, while also improving execution speed by an order of magnitude.

Fusing State Spaces for Markov Chain Monte Carlo Rendering

Rendering algorithms using Markov chain Monte Carlo (MCMC) currently build upon two different state spaces. One of them is the path space, where the algorithms operate on the vertices of actual transport paths. The other state space is the primary sample space, where the algorithms operate on sequences of numbers used for generating transport paths. While the two state spaces are related by the sampling procedure of transport paths, all existing MCMC rendering algorithms are designed to work within only one of the state spaces.

Parallel Depth-First Search for Directed Acyclic Graphs

Depth-First Search (DFS) is a pervasive algorithm, often used as a building block for topological sort, connectivity, and planarity testing, among many other applications. We propose a novel work-efficient parallel algorithm for the DFS traversal of a directed acyclic graph (DAG). The algorithm traverses the entire DAG in a BFS-like fashion no more than three times. As a result, it finds the DFS pre-order (discovery) and post-order (finish time) as well as the parent relationship associated with every node in a DAG. We analyze the runtime and work complexity of this novel parallel algorithm.
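For reference, the three quantities the parallel algorithm computes (pre-order, post-order, and DFS-tree parent) are defined by the classic sequential traversal. A minimal sequential sketch on a small DAG (this is the baseline the parallel algorithm must match, not the paper's parallel method):

```python
def dfs_orders(adj, roots):
    """Sequential reference DFS over a DAG: returns pre-order (discovery)
    and post-order (finish) timestamps plus the DFS-tree parent of each node."""
    pre, post, parent = {}, {}, {}
    clock = 0

    def visit(u):
        nonlocal clock
        pre[u] = clock
        clock += 1
        for v in adj.get(u, []):
            if v not in pre:        # first discovery: v joins the DFS tree
                parent[v] = u
                visit(v)
        post[u] = clock             # all descendants finished
        clock += 1

    for r in roots:
        if r not in pre:
            parent[r] = None
            visit(r)
    return pre, post, parent

# A small DAG with a shared sink: 0 -> 1 -> 3 and 0 -> 2 -> 3
adj = {0: [1, 2], 1: [3], 2: [3]}
pre, post, parent = dfs_orders(adj, roots=[0])
print(pre)     # {0: 0, 1: 1, 3: 2, 2: 5}
print(parent)  # {0: None, 1: 0, 3: 1, 2: 0}
```

Note that node 3 is discovered via node 1 and only revisited (not re-expanded) via node 2; reproducing exactly this tie-breaking in parallel is what makes parallel DFS hard.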

Polarimetric Multi-view Stereo

Multi-view stereo relies on feature correspondences for 3D reconstruction, and thus is fundamentally flawed in dealing with featureless scenes. In this paper, we propose polarimetric multi-view stereo, which combines per-pixel photometric information from polarization with epipolar constraints from multiple views for 3D reconstruction. Polarization reveals surface normal information, and is thus helpful for propagating depth to featureless regions.