Designing Efficient Sorting Algorithms for Manycore GPUs

We describe the design of high-performance parallel radix sort and merge sort routines for manycore GPUs, taking advantage of the full programmability offered by CUDA. Our radix sort is the fastest GPU sort and our merge sort is the fastest comparison-based sort reported in the literature. Our radix sort is up to 4 times faster than the graphics-based GPUSort and greater than 2 times faster than other CUDA-based radix sorts. It is also 23% faster, on average, than even a very carefully optimized multicore CPU sorting routine.
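The abstract refers to a parallel radix sort; the core idea is a sequence of counting passes over fixed-width digit groups. The following is a minimal serial sketch of least-significant-digit radix sort for illustration only (the paper's GPU kernels would instead build per-block histograms and parallel prefix sums; the function name and digit width here are illustrative assumptions, not from the paper):

```python
def lsd_radix_sort(keys, bits=32, radix_bits=4):
    """Least-significant-digit radix sort of non-negative integer keys.

    Serial sketch of the histogram / prefix-sum / scatter structure that
    a GPU radix sort parallelizes; this loop itself is not the paper's code.
    """
    radix = 1 << radix_bits
    mask = radix - 1
    for shift in range(0, bits, radix_bits):
        # Counting pass: histogram the current digit of every key.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum gives each digit bucket's start offset.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # Scatter pass: stable reorder of keys by the current digit.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys
```

Because each pass is stable, sorting digit-by-digit from least to most significant yields a fully sorted sequence; on a GPU, the histogram and scatter steps are the parts distributed across thousands of threads.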

Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors

Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In contrast to the uniform regularity of dense linear algebra, sparse operations encounter a broad spectrum of matrices ranging from the regular to the highly irregular. Harnessing the tremendous potential of throughput-oriented processors for sparse operations requires that we expose substantial fine-grained parallelism and impose sufficient regularity on execution paths and memory access patterns.
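To make the SpMV operation concrete, here is a serial sketch of y = Ax for a matrix stored in compressed sparse row (CSR) format, one of the formats such work typically targets (the function name is illustrative; the paper's GPU implementation would assign rows, or warps per row, to threads rather than iterate serially):

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A @ x for a sparse matrix A in CSR format.

    row_ptr[i]..row_ptr[i+1] indexes the nonzeros of row i within
    vals (values) and col_idx (column indices). Serial sketch: on a
    throughput-oriented processor each row would run in parallel.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for j in range(row_ptr[row], row_ptr[row + 1]):
            acc += vals[j] * x[col_idx[j]]
        y[row] = acc
    return y
```

The irregularity the abstract mentions is visible here: rows can have very different nonzero counts, so a naive one-thread-per-row mapping produces imbalanced work and scattered reads of x, which is exactly why format choice and regularized access patterns matter on GPUs.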

Ming-Yu Liu

Ming-Yu Liu is a Vice President of Research at NVIDIA and a Fellow of the IEEE. He leads the Deep Imagination Research group at NVIDIA, which focuses on deep generative models and their applications in content creation. NVIDIA Cosmos, NVIDIA Canvas [GauGAN], and NVIDIA Maxine [LivePortrait] are three products enabled by his group's research.

NVIDIA Demo Wins the Laval Virtual Award at the SIGGRAPH 2016 Emerging Technologies Event:

Anjul Patney, Joohwan Kim, Marco Salvi, Anton Kaplanyan, Chris Wyman, Nir Benty, Aaron Lefohn, David Luebke, “Perceptually-Based Foveated Virtual Reality”, Emerging Technologies, SIGGRAPH, Anaheim, CA, July 24-28, 2016

NVIDIA Secures Runner-Up Best Paper Position at ASYNC 2016

NVIDIA Secures Runner-Up Best Paper Position at ASYNC 2016 along with the University of Virginia:

Divya Akella Kamakshi (U. Virginia), Matthew Fojtik (NVIDIA), Brucek Khailany (NVIDIA), Sudhir Kudva (NVIDIA), Yaping Zhou (U. Virginia), Benton H. Calhoun (U. Virginia), “Modeling and Analysis of Power Supply Noise Tolerance with Fine-grained GALS Adaptive Clocks”, ASYNC 2016

Saurav Muralidharan

Saurav Muralidharan is a Senior Research Scientist on the Deep Learning Efficiency Research (DLER) team. He specializes in LLM efficiency and performance optimization and has worked on model compression (pruning, distillation, low-rank factorization), small language models (SLMs), and elastic and mixture-of-experts (MoE) networks. Please visit sauravm.com for more details on his research.