Koki Nagano

Koki Nagano is a principal research scientist at NVIDIA Research. He works at the intersection of graphics and AI, with a focus on achieving realistic digital humans using data-driven techniques and deep learning. He has also worked on a 3D display that enables interactive conversations with holographic projections of Holocaust survivors, preserving visual archives of their testimonies for future classrooms.

Meshlet Priors for 3D Mesh Reconstruction

Estimating a mesh from an unordered set of sparse, noisy 3D points is a challenging problem that requires carefully chosen priors. Existing hand-crafted priors, such as smoothness regularizers, impose an undesirable trade-off between attenuating noise and preserving local detail. Recent deep-learning approaches produce impressive results by learning priors directly from the data. However, the priors are learned at the object level, which makes these algorithms class-specific and even sensitive to the pose of the object.
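
To make the trade-off concrete, here is a minimal sketch of the kind of hand-crafted smoothness prior the abstract contrasts with learned meshlet priors: a uniform Laplacian regularizer that penalizes each vertex's deviation from the centroid of its neighbors. The function and variable names are illustrative assumptions, not the paper's method.

```python
# Hypothetical hand-crafted smoothness prior (uniform Laplacian regularizer).
# Minimizing it attenuates noise but also flattens legitimate local detail,
# which is the trade-off the abstract describes.
import numpy as np

def laplacian_smoothness_energy(vertices, neighbors):
    """Sum of squared distances between each vertex and the centroid of its
    1-ring neighbors (neighbors: dict mapping vertex index -> neighbor indices)."""
    energy = 0.0
    for v_idx, nbr_idx in neighbors.items():
        centroid = vertices[nbr_idx].mean(axis=0)
        energy += np.sum((vertices[v_idx] - centroid) ** 2)
    return energy

# Toy example: one noisy vertex surrounded by four planar neighbors.
verts = np.array([[0.0, 0.0, 0.3],        # noisy center vertex
                  [1, 0, 0], [-1, 0, 0],
                  [0, 1, 0], [0, -1, 0]], dtype=float)
nbrs = {0: [1, 2, 3, 4]}
print(laplacian_smoothness_energy(verts, nbrs))  # penalizes the 0.3 offset
```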

Bi3D: Stereo Depth Estimation via Binary Classifications

Stereo-based depth estimation is a cornerstone of computer vision, with state-of-the-art methods delivering accurate results in real time. For several applications such as autonomous navigation, however, it may be useful to trade accuracy for lower latency. We present Bi3D, a method that estimates depth via a series of binary classifications. Rather than testing whether objects are at a particular depth D, as existing stereo methods do, it classifies them as being closer or farther than D. This property offers a powerful mechanism to balance accuracy and latency.
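
A minimal sketch of how a stack of per-plane binary decisions can be turned into a depth value, to illustrate the idea in the abstract. The uniform plane spacing, the soft confidence values, and the simple summation below are illustrative assumptions, not the actual Bi3D network or its output encoding.

```python
import numpy as np

def depth_from_binary_planes(closer_conf, plane_depths):
    """closer_conf[i] in [0, 1]: confidence that the pixel is closer than plane i.
    Summing the complementary 'farther' confidences counts how many planes the
    pixel lies beyond; soft confidences interpolate between adjacent planes."""
    plane_depths = np.asarray(plane_depths, dtype=float)
    spacing = plane_depths[1] - plane_depths[0]          # assume uniform planes
    farther = 1.0 - np.asarray(closer_conf, dtype=float)
    # A pixel beyond k planes lies between plane k and plane k+1; take the middle.
    return plane_depths[0] + spacing * (farther.sum() - 0.5)

planes = [1.0, 2.0, 3.0, 4.0, 5.0]              # candidate depths in meters
conf = [0.0, 0.05, 0.4, 0.95, 1.0]              # "closer than plane i" confidences
print(depth_from_binary_planes(conf, planes))   # ~3.1 m, just beyond the 3 m plane
```

Querying only a few planes bounds an object's depth coarsely at low cost, while adding planes refines the estimate, which is the accuracy-versus-latency lever the abstract describes.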

ꟻLIP: A Difference Evaluator for Alternating Images

Image quality measures are becoming increasingly important in the field of computer graphics. For example, there is currently a major focus on generating photorealistic images in real time by combining path tracing with denoising, for which such quality assessment is integral. We present ꟻLIP, which is a difference evaluator with a particular focus on the differences between rendered images and corresponding ground truths. Our algorithm produces a map that approximates the difference perceived by humans when alternating between two images.
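
To show the shape of the problem, here is a greatly simplified sketch of producing a per-pixel difference map between a rendered image and its ground truth. This is not the ꟻLIP algorithm itself, which models perceptual color differences and edge/point features at a given viewing distance; the per-channel error used below is an illustrative stand-in.

```python
# Simplified per-pixel difference map: larger values mark pixels a viewer
# flipping between the two images would be more likely to notice.
# NOT the FLIP metric; only illustrates the evaluator's input/output shape.
import numpy as np

def difference_map(rendered, reference):
    """Both inputs are HxWx3 arrays in [0, 1]; returns an HxW map in [0, 1]."""
    err = np.abs(rendered.astype(float) - reference.astype(float))
    return err.max(axis=-1)   # worst channel deviation per pixel

rng = np.random.default_rng(0)
ref = rng.random((4, 4, 3))
noisy = np.clip(ref + 0.1 * rng.standard_normal(ref.shape), 0, 1)
print(difference_map(noisy, ref).round(2))
```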

Gal Dalal

Gal Dalal is a Senior Research Scientist working on Reinforcement Learning (RL) theory and applications at NVIDIA Research. Previously, he co-founded Amooka-AI, which later became Ford Motor Company’s L3 driving policy team. He obtained his BSc in EE from the Technion, Israel, summa cum laude, and his PhD from the Technion as a recipient of the IBM fellowship. Gal interned at Google DeepMind and IBM Research, and received the 2019 AAAI Best (“outstanding”) Paper Award, ranked 1st among the 1,150 accepted papers.

Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs

GPUs accelerate high-throughput applications, which require orders-of-magnitude higher memory bandwidth than traditional CPU-only systems. However, the capacity of such high-bandwidth memory tends to be relatively small. Buddy Compression is an architecture that makes novel use of compression to utilize a larger buddy memory from the host or disaggregated memory, effectively increasing the memory capacity of the GPU.
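
A conceptual sketch of the placement decision such a design implies: each memory entry is compressed against a target ratio, and bytes that do not fit in the compressed device-memory allocation spill to a larger, slower buddy memory on the host or in disaggregated memory. The fixed target ratio, the sizes, and the function name are illustrative assumptions, not the paper's mechanism.

```python
def split_entry(uncompressed_bytes, compressed_bytes, target_ratio=2.0):
    """Return (bytes kept in GPU memory, bytes spilled to buddy memory)."""
    budget = int(uncompressed_bytes / target_ratio)   # compressed GPU allocation
    if compressed_bytes <= budget:
        return compressed_bytes, 0                    # fits entirely on-GPU
    return budget, compressed_bytes - budget          # overflow goes to buddy

# A 128-byte entry that compresses well vs. one that barely compresses:
print(split_entry(128, 48))    # (48, 0)  -> all accesses stay in GPU memory
print(split_entry(128, 100))   # (64, 36) -> 36 bytes served from buddy memory
```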

Near-Memory Data Transformation for Efficient Sparse Matrix Multi-Vector Multiplication

Efficient manipulation of sparse matrices is critical to a wide range of HPC applications. Increasingly, GPUs are used to accelerate these sparse matrix operations. We study one common operation, Sparse Matrix Multi-Vector Multiplication (SpMM), and evaluate the impact of sparsity, the distribution of non-zero elements, and tile-traversal strategies on GPU implementations.
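
For reference, this is the operation the abstract studies: a sparse matrix in CSR format multiplied by a dense block of vectors. The sketch below is a plain CPU reference for clarity, not one of the GPU tilings evaluated in the paper.

```python
import numpy as np

def spmm_csr(indptr, indices, data, dense):
    """Compute A @ dense, where A is stored in CSR form (indptr, indices, data)."""
    n_rows = len(indptr) - 1
    out = np.zeros((n_rows, dense.shape[1]), dtype=dense.dtype)
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            out[row] += data[k] * dense[indices[k]]   # scale one dense row
    return out

# 3x3 sparse matrix [[1,0,2],[0,3,0],[4,0,5]] times two dense vectors:
indptr  = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data    = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
dense   = np.arange(6, dtype=float).reshape(3, 2)    # columns are the vectors
print(spmm_csr(indptr, indices, data, dense))
```

Because each non-zero touches a full row of the dense block, the sparsity pattern and the order in which tiles are traversed determine how much of that dense data can be reused from cache, which is what the study evaluates on GPUs.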

DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis

Training convolutional neural networks (CNNs) requires intense compute throughput and high memory bandwidth. In particular, convolution layers account for the majority of CNN training time, and GPUs are commonly used to accelerate these workloads. Optimizing GPU designs for efficient CNN training requires accurately modeling how performance improves as compute and memory resources are increased.
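
As a back-of-the-envelope illustration of the two quantities such a performance model must relate, the sketch below estimates the compute and a lower bound on the off-chip memory traffic of a single convolution layer. The layer shape is arbitrary and the traffic estimate ignores cache reuse, which is precisely what DeLTA's in-depth memory-system analysis models in detail; this is only an illustration, not the DeLTA model.

```python
def conv_layer_estimates(n, c, h, w, k, r, s, bytes_per_elem=4):
    """Forward pass of an NCHW convolution with K filters of size RxS
    (stride 1, 'same' padding). Returns (FLOPs, lower-bound bytes moved)."""
    flops = 2 * n * k * c * h * w * r * s            # each MAC counts as 2 ops
    traffic = bytes_per_elem * (n * c * h * w        # read input activations
                                + k * c * r * s      # read weights
                                + n * k * h * w)     # write output activations
    return flops, traffic

flops, traffic = conv_layer_estimates(n=32, c=64, h=56, w=56, k=64, r=3, s=3)
print(f"{flops/1e9:.1f} GFLOPs, {traffic/1e6:.1f} MB minimum traffic,"
      f" arithmetic intensity {flops/traffic:.1f} FLOP/byte")
```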