Efficient Incoherent Ray Traversal on GPUs Through Compressed Wide BVHs

We present a GPU-based ray traversal algorithm that operates on compressed wide BVHs and maintains the traversal stack in a compressed format. Our method significantly reduces memory traffic, which translates to a 1.9-2.1x improvement in incoherent ray traversal performance compared to the current state of the art. Furthermore, our hierarchy consumes only 35-60% of the memory of a typical uncompressed BVH.
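The abstract does not spell out the node encoding. One common way to compress a wide BVH node, assumed here purely for illustration, is to store each child's bounding box as 8-bit integers relative to the parent's box, using one power-of-two scale per axis. The sketch below (the names `quantize_child_box` and `dequantize_child_box` are hypothetical) rounds lower bounds down and upper bounds up, so the decoded box always encloses the original child, which is what keeps traversal correct.

```python
import math

def quantize_child_box(parent_lo, parent_hi, child_lo, child_hi, bits=8):
    """Encode a child AABB as small integers relative to its parent AABB.

    Hypothetical encoding for illustration: one power-of-two scale per axis,
    lower bounds rounded down and upper bounds rounded up (conservative).
    Floating-point rounding in the subtraction is ignored here; a production
    encoder would use directed rounding to guarantee enclosure.
    """
    levels = (1 << bits) - 1  # 255 for 8-bit codes
    q_lo, q_hi, scales = [], [], []
    for p_lo, p_hi, c_lo, c_hi in zip(parent_lo, parent_hi, child_lo, child_hi):
        extent = p_hi - p_lo
        # Smallest power of two such that levels * scale covers the parent extent.
        exponent = math.ceil(math.log2(extent / levels)) if extent > 0 else 0
        scale = 2.0 ** exponent
        # Round outward so the decoded box never shrinks.
        lo = math.floor((c_lo - p_lo) / scale)
        hi = math.ceil((c_hi - p_lo) / scale)
        q_lo.append(min(max(lo, 0), levels))
        q_hi.append(min(max(hi, 0), levels))
        scales.append(scale)
    return q_lo, q_hi, scales


def dequantize_child_box(parent_lo, q_lo, q_hi, scales):
    """Decode integer codes back into a (slightly enlarged) AABB."""
    lo = [p + q * s for p, q, s in zip(parent_lo, q_lo, scales)]
    hi = [p + q * s for p, q, s in zip(parent_lo, q_hi, scales)]
    return lo, hi


parent_lo, parent_hi = [0.0, 0.0, 0.0], [10.0, 10.0, 10.0]
child_lo, child_hi = [1.3, 2.0, 0.0], [4.7, 9.99, 10.0]
q_lo, q_hi, scales = quantize_child_box(parent_lo, parent_hi, child_lo, child_hi)
dec_lo, dec_hi = dequantize_child_box(parent_lo, q_lo, q_hi, scales)
print(q_lo, q_hi)  # -> [20, 32, 0] [76, 160, 160]
```

Each child box then costs 6 bytes instead of 24 bytes of 32-bit floats, at the price of boxes that are at most one quantization step looser per side.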

Membrane AR: Varifocal, Wide-Field-of-View Augmented Reality Display from Deformable Membranes

Among the fundamental limitations of existing near-eye displays (NEDs) for augmented reality (AR) are a limited field of view (FOV), low angular resolution, and a fixed focal state. Optimizing a design to overcome one of these limitations typically leads to a trade-off in the others. This project introduces an all-in-one solution: deformable beamsplitters, a new hybrid hardware design that combines hyperbolic half-silvered mirrors with deformable membrane mirrors (DMMs).

Wide Field Of View Varifocal Near-Eye Display Using See-Through Deformable Membrane Mirrors

Accommodative depth cues, a wide field of view, and ever-higher resolutions all present major hardware design challenges for near-eye displays. Optimizing a design to overcome one of these challenges typically leads to a trade-off in the others. We tackle this problem by introducing an all-in-one solution: a new wide field of view, gaze-tracked near-eye display for augmented reality applications. The key component of our solution is a single see-through, varifocal deformable membrane mirror for each eye, which reflects a display.
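Why a deformable mirror gives a varifocal display can be illustrated with the idealized paraxial mirror equation 1/f = 1/d_o + 1/d_i: changing the membrane's curvature changes its focal length f, which moves the virtual image of the display to a different depth. The sketch below assumes a thin, on-axis concave mirror and ignores the see-through and off-axis geometry of the real system; the function name `required_focal_length` and the 5 cm display distance are hypothetical.

```python
import math

def required_focal_length(display_distance_m, image_distance_m):
    """Focal length an ideal concave mirror needs to show a display as a
    virtual image at image_distance_m (both distances in metres).

    Uses the paraxial mirror equation 1/f = 1/d_o + 1/d_i, where a virtual
    image behind the mirror has a negative image distance d_i.
    """
    d_i = -image_distance_m  # virtual image: negative image distance
    return 1.0 / (1.0 / display_distance_m + 1.0 / d_i)

# Hypothetical geometry: display 5 cm from the membrane mirror.
for depth in (0.25, 1.0, math.inf):
    f = required_focal_length(0.05, depth)
    print(f"virtual image at {depth} m -> focal length {f * 1000:.2f} mm")
```

As the virtual image recedes toward infinity, the required focal length shrinks toward the display distance itself, so sweeping the focal depth needs only a small change in membrane curvature.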

Parallel Modularity Clustering

In this paper, we develop a parallel approach to modularity clustering, which is often used to identify and analyse communities in social networks. We show that modularity can be approximated using the largest eigenpairs of the weighted graph adjacency matrix perturbed by a rank-one update. We also generalize this formulation to identify multiple clusters at once. Finally, we develop a fast parallel implementation that takes advantage of the Lanczos eigenvalue solver and the k-means algorithm on the GPU.
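The rank-one-perturbed matrix in question is the standard modularity matrix B = A - k kᵀ / (2m), where k is the vector of weighted degrees and m the total edge weight. The serial sketch below shows only a two-way split by the sign of B's leading eigenvector; dense `numpy.linalg.eigh` stands in for the GPU Lanczos solver, and the function names are hypothetical. For more than two clusters, one would instead take several leading eigenvectors and run k-means on their rows.

```python
import numpy as np

def modularity_matrix(adj):
    """B = A - k k^T / (2m): the adjacency matrix perturbed by a rank-one update."""
    k = adj.sum(axis=1)  # weighted degrees
    two_m = k.sum()      # twice the total edge weight
    return adj - np.outer(k, k) / two_m, two_m

def spectral_bisection(adj):
    """Split nodes by the sign of the leading eigenvector of B."""
    b, two_m = modularity_matrix(adj)
    _, eigvecs = np.linalg.eigh(b)          # dense symmetric solver, ascending order
    leading = eigvecs[:, -1]                # eigenvector of the largest eigenvalue
    s = np.where(leading >= 0, 1.0, -1.0)   # +1 / -1 community labels
    q = s @ b @ s / (2.0 * two_m)           # Q = s^T B s / (4m)
    return s, q

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
labels, q = spectral_bisection(adj)
print(labels, round(q, 4))
```

The split recovers the two triangles, with modularity Q = 5/14; the overall sign of the labels is arbitrary because eigenvectors are defined only up to sign.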

Stretchable Transducers for Kinesthetic Interactions in Virtual Reality

The tools of soft robotics enable immersive kinesthetic experiences in virtual reality. Using fluidic elastomer actuators (FEAs), we demonstrate a soft skin that can provide force feedback and a soft controller to simulate different textures and materials. These novel input devices integrate with a VR Funhouse experience.

Hybrid Modulation for Near Zero Display Latency

Binary displays for virtual reality can achieve low latency by integrating view tracking with modulation. We present a novel modulation scheme that combines tracking, pulse density modulation, and pulse width modulation to minimize grayscale artifacts. We apply the hybrid modulator to an AMOLED display with an update rate of 1.7 kHz and observe nearly zero latency in the perceived image.
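The two building blocks can be contrasted on a single pixel that shows an integer gray level over a fixed number of binary subframes. Pulse width modulation lights one contiguous run, while pulse density modulation (here a first-order error accumulator, in the style of sigma-delta modulation) spreads the same number of "on" subframes evenly in time. This is a minimal sketch with hypothetical function names; the paper's hybrid combination with tracking is not reproduced.

```python
def pwm_frames(gray, n_subframes):
    """Pulse width modulation: one contiguous 'on' run of length gray."""
    return [1 if t < gray else 0 for t in range(n_subframes)]

def pdm_frames(gray, n_subframes):
    """Pulse density modulation via a first-order error accumulator:
    'on' subframes are spread as evenly as possible (0 <= gray <= n_subframes)."""
    frames, acc = [], 0
    for _ in range(n_subframes):
        acc += gray
        if acc >= n_subframes:  # accumulated error reached one full subframe
            frames.append(1)
            acc -= n_subframes
        else:
            frames.append(0)
    return frames

print(pwm_frames(5, 16))
print(pdm_frames(5, 16))
```

Both sequences contain exactly `gray` lit subframes and therefore the same average intensity; they differ only in how that light is distributed in time, which is what matters once the view changes between subframes.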

Architecting an Energy-Efficient DRAM System for GPUs

This paper proposes an energy-efficient, high-throughput DRAM architecture for GPUs and throughput processors. In these systems, requests from thousands of concurrent threads compete for a limited number of DRAM row buffers. As a result, only a fraction of the data fetched into a row buffer is used, leading to significant energy overheads. Our proposed DRAM architecture exploits the hierarchical organization of a DRAM bank to reduce the minimum row activation granularity.
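The benefit of a smaller activation granularity can be shown with a toy model. It assumes a single bank that keeps one activated row segment open, requests of a fixed size, and activation energy proportional to the number of bytes activated; the 2048-byte and 256-byte granularities and the function name `activation_utilization` are hypothetical, not the paper's parameters.

```python
def activation_utilization(requests, access_bytes, activation_bytes):
    """Fraction of activated bytes that requests actually use.

    requests: (row, byte_offset) pairs serviced in order by one bank.
    A new activation happens whenever a request falls outside the
    currently activated (row, segment) of size activation_bytes.
    """
    open_segment = None
    activated = 0
    for row, offset in requests:
        segment = (row, offset // activation_bytes)
        if segment != open_segment:
            activated += activation_bytes  # energy assumed proportional to this
            open_segment = segment
    used = access_bytes * len(requests)
    return used / activated

# Four 64-byte requests that each hit a different row (no locality).
scattered = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(activation_utilization(scattered, 64, 2048))  # full-row activation
print(activation_utilization(scattered, 64, 256))   # finer-grained activation
```

With little row locality, most of a full-row activation is wasted; shrinking the activated segment raises the used fraction proportionally, which is the overhead the proposed architecture targets.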

Pruning Convolutional Neural Networks for Resource Efficient Inference

We propose a new formulation for pruning convolutional kernels in neural networks to enable efficient inference. We interleave greedy criteria-based pruning with fine-tuning by backpropagation, a computationally efficient procedure that maintains good generalization in the pruned network. We propose a new criterion based on Taylor expansion that approximates the change in the cost function induced by pruning network parameters. We focus on transfer learning, where large pretrained networks are adapted to specialized tasks.
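A first-order Taylor criterion estimates how much the cost would change if a feature map were set to zero, using the product of each activation with the gradient of the cost with respect to it. The sketch below is one plausible formulation under that assumption (spatial mean of activation times gradient, absolute value, averaged over the minibatch); the exact normalization used in the paper may differ, and the function name is hypothetical.

```python
import numpy as np

def taylor_channel_scores(activations, gradients):
    """First-order Taylor estimate of the cost change from removing each channel.

    activations, gradients: arrays of shape (batch, channels, height, width),
    where gradients hold dC/d(activation) obtained by backpropagation.
    """
    # Per example and channel: mean over spatial positions of activation * gradient.
    per_example = (activations * gradients).mean(axis=(2, 3))
    # Absolute value, then average over the minibatch.
    return np.abs(per_example).mean(axis=0)

rng = np.random.default_rng(0)
acts = rng.standard_normal((4, 3, 5, 5))
grads = rng.standard_normal((4, 3, 5, 5))
acts[:, 1] = 0.0  # channel 1 never activates, so removing it changes nothing
scores = taylor_channel_scores(acts, grads)
prune = int(np.argmin(scores))  # channel to remove before the next fine-tuning round
print(scores, prune)
```

In the greedy loop described above, one would remove the lowest-scoring channel (or a few), fine-tune briefly, recompute the scores, and repeat.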

Network Endpoint Congestion Control for Fine-Grained Communication

Endpoint congestion in HPC networks creates tree saturation that is detrimental to performance. Endpoint congestion can be alleviated by reducing the injection rate of traffic sources, but doing so requires a fast reaction time to avoid congestion buildup. Congestion control becomes more challenging as application communication shifts from the traditional two-sided model to potentially fine-grained, one-sided communication embodied by various global address space programming models.
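The abstract does not describe its control mechanism, so the sketch below only illustrates the general idea of throttling a traffic source's injection rate with a generic additive-increase/multiplicative-decrease (AIMD) rule. It is not the paper's scheme; the function name and all parameter values are hypothetical, with rates expressed as a fraction of link bandwidth.

```python
def aimd_injection_rates(congestion_signals, start=1.0, max_rate=1.0,
                         increase=0.1, decrease=0.5, min_rate=0.01):
    """Return the injection rate after each congestion signal (True = congested)."""
    rate, history = start, []
    for congested in congestion_signals:
        if congested:
            rate = max(min_rate, rate * decrease)  # back off multiplicatively
        else:
            rate = min(max_rate, rate + increase)  # probe upward additively
        history.append(rate)
    return history

print(aimd_injection_rates([False, True, False, True, True]))
```

A rule like this reacts only once per congestion signal, so the delay between congestion forming at the endpoint and the signal reaching the source bounds how quickly buildup can be stopped, which is why fine-grained, one-sided traffic makes the problem harder.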