Zekun Hao

Zekun Hao is a research scientist at NVIDIA. He received his Ph.D. in Computer Science from Cornell University, advised by Prof. Serge Belongie. His research interests include machine learning and its applications to computer vision and computer graphics, with a focus on 3D generative models. He is a recipient of the 2022-2023 NVIDIA Graduate Fellowship. He received his B.S.

IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research

Graph neural networks (GNNs) have shown high potential for a variety of challenging real-world applications, but one of the major obstacles in GNN research is the lack of large-scale, flexible datasets. Most existing public datasets for GNNs are relatively small, which limits the ability of GNNs to generalize to unseen data. The few existing large-scale graph datasets provide very limited labeled data. This makes it difficult to determine whether a GNN model's low accuracy on unseen data is inherently due to insufficient training data or whether the model simply failed to generalize.
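To make the labeling gap concrete, the sketch below trains a toy one-layer graph model in which the cross-entropy loss only sees a small labeled subset of nodes. The graph, sizes, and 5% label fraction are purely illustrative assumptions; none of this uses the IGB API.

import numpy as np

rng = np.random.default_rng(0)

# Toy graph: n nodes, d features, c classes; only a small fraction is labeled.
n, d, c = 1000, 16, 4
X = rng.normal(size=(n, d))                      # node features
A = (rng.random((n, n)) < 0.01).astype(float)    # random adjacency (illustrative)
A = np.maximum(A, A.T)                           # symmetrize
A_hat = A + np.eye(n)                            # add self-loops
P = A_hat / A_hat.sum(1, keepdims=True)          # row-normalized propagation matrix

y = rng.integers(0, c, size=n)                   # ground-truth labels
labeled = rng.random(n) < 0.05                   # only ~5% of nodes carry labels

W = rng.normal(scale=0.1, size=(d, c))           # single-layer "GNN" weights

def masked_loss(W):
    logits = P @ X @ W                           # one round of neighborhood averaging
    logits -= logits.max(1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
    # Cross-entropy is evaluated on labeled nodes only: with few labels,
    # most of the graph never contributes a supervision signal.
    return -np.log(probs[labeled, y[labeled]] + 1e-12).mean()

print(f"loss on {labeled.sum()} labeled nodes out of {n}: {masked_loss(W):.3f}")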

Monte Carlo Gradient Quantization

We propose Monte Carlo methods to leverage both sparsity and quantization to compress gradients of neural networks throughout training. On top of reducing the communication exchanged between multiple workers in a distributed setting, we also improve the computational efficiency of each worker. Our method, called Monte Carlo Gradient Quantization (MCGQ), shows faster convergence and higher performance than existing quantization methods on image classification and language modeling.
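The abstract does not spell out the estimator, so the following is only a schematic NumPy sketch of the general idea behind Monte Carlo gradient compression: gradient entries are importance-sampled in proportion to their magnitude, producing a sparse vector whose nonzero values are integer multiples of a single scale and whose expectation equals the dense gradient. It is not the authors' MCGQ algorithm, and all names are our own.

import numpy as np

def mc_compress_gradient(g, k, rng):
    """Sparse, quantized, unbiased Monte Carlo estimate of gradient g.

    Draws k index samples with probability proportional to |g_i| and
    returns a vector whose nonzero entries are integer multiples of a
    single scale, so only indices, signs, and counts need to be sent.
    """
    mass = np.abs(g).sum()
    if mass == 0.0:
        return np.zeros_like(g)
    p = np.abs(g) / mass
    idx = rng.choice(g.size, size=k, p=p)          # importance sampling
    counts = np.bincount(idx, minlength=g.size)    # integer "quantized" values
    scale = mass / k
    return np.sign(g) * counts * scale             # E[estimate] == g

rng = np.random.default_rng(1)
g = rng.normal(size=1000)
g_hat = mc_compress_gradient(g, k=100, rng=rng)
print("nonzeros:", np.count_nonzero(g_hat), "of", g.size)

# Averaging many independent estimates recovers the dense gradient (unbiasedness).
avg = np.mean([mc_compress_gradient(g, 100, rng) for _ in range(2000)], axis=0)
print("mean abs error of averaged estimates:", np.abs(avg - g).mean())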

Instant Quantization of Neural Networks using Monte Carlo Methods


Compressing 1D Time-Channel Separable Convolutions using Sparse Random Ternary Matrices

We demonstrate that 1x1-convolutions in 1D time-channel separable convolutions may be replaced by constant, sparse random ternary matrices with weights in {−1, 0, +1}. Such layers do not perform any multiplications and do not require training. Moreover, the matrices may be generated on the chip during computation and therefore do not require any memory access. With the same parameter budget, we can afford deeper and more expressive models, improving the Pareto frontiers of existing models on several tasks.
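A minimal NumPy sketch of the stated construction, under our own assumptions about sparsity and scaling: a ternary matrix with entries in {−1, 0, +1} is regenerated deterministically from a seed and applied as a pointwise (1x1) convolution over the channel dimension. For readability the sketch uses a dense matrix product, even though such a layer needs no multiplications in principle.

import numpy as np

def ternary_pointwise_matrix(c_out, c_in, sparsity, seed):
    """Constant ternary matrix with entries in {-1, 0, +1}.

    Regenerated deterministically from the seed, so it never has to be
    stored or trained; `sparsity` is the fraction of zero entries.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random((c_out, c_in)) >= sparsity          # keep ~(1 - sparsity)
    signs = rng.choice([-1.0, 1.0], size=(c_out, c_in))
    return signs * mask

def ternary_pointwise_conv(x, W, scale):
    """1x1 convolution over the channels of x with shape (c_in, T).

    Because W only contains -1, 0, +1, the product reduces to signed
    additions; a dense matmul is used here purely for clarity.
    """
    return scale * (W @ x)

c_in, c_out, T = 64, 128, 200
x = np.random.default_rng(0).normal(size=(c_in, T))
W = ternary_pointwise_matrix(c_out, c_in, sparsity=0.9, seed=42)
scale = 1.0 / np.sqrt((W != 0).sum(axis=1, keepdims=True).clip(min=1))
y = ternary_pointwise_conv(x, W, scale)
print(y.shape)   # (128, 200)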

GPU-Accelerated Partially Linear Multiuser Detection for 5G and Beyond URLLC Systems

We have implemented a recently proposed partially linear multiuser detection algorithm in reproducing kernel Hilbert spaces (RKHSs) on a GPU-accelerated platform. Our proof of concept combines the robustness of linear detection with the flexibility of non-linear detection for the non-orthogonal multiple access (NOMA)-based massive connectivity scenario. Mastering the computation of the vast number of inner products (which involve kernel evaluations) is a challenge in ultra-low-latency (ULL) applications due to the sub-millisecond latency requirement.
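As a rough illustration (not the paper's implementation), a partially linear detector can be written as a linear term plus a kernel expansion over a dictionary of samples; the batched kernel-matrix evaluation below is the inner-product workload a GPU would accelerate. The Gaussian kernel, dictionary size, and all variable names are our own assumptions.

import numpy as np

def gaussian_kernel(X, D, gamma):
    """Pairwise Gaussian kernel k(x, d) = exp(-gamma * ||x - d||^2).

    X: (n, f) received samples, D: (m, f) dictionary samples.
    Evaluating this (n, m) block in one batched call is the part that
    benefits from GPU acceleration in a low-latency receiver.
    """
    sq = (X**2).sum(1)[:, None] + (D**2).sum(1)[None, :] - 2.0 * X @ D.T
    return np.exp(-gamma * sq)

def partially_linear_detect(X, w, D, alpha, gamma):
    """Partially linear estimate f(x) = w^T x + sum_j alpha_j k(d_j, x)."""
    return X @ w + gaussian_kernel(X, D, gamma) @ alpha

rng = np.random.default_rng(0)
f, n, m = 8, 1000, 64          # feature dim, received samples, dictionary size
X = rng.normal(size=(n, f))    # received signal features
D = rng.normal(size=(m, f))    # dictionary of reference samples
w = rng.normal(size=f)         # linear component (robust part)
alpha = rng.normal(size=m)     # kernel expansion coefficients (non-linear part)

soft = partially_linear_detect(X, w, D, alpha, gamma=0.5)
bits = (soft > 0).astype(int)  # hard decisions for a binary-modulated user
print(bits[:10])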

Artificial Neural Networks generated by Low Discrepancy Sequences

Artificial neural networks can be represented by paths. Generating these paths as random walks on a dense network graph, we find that the resulting sparse networks allow for deterministic initialization and even weights with fixed sign. Such networks can be trained sparse from scratch, avoiding the expensive procedure of training a dense network and compressing it afterwards. Although the networks are sparse, their weights are accessed as contiguous blocks of memory.
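The path construction can be pictured with a small sketch: each path selects one unit per layer, here driven by a base-2 van der Corput (low-discrepancy) sequence so the connectivity is fully deterministic. The mapping from sequence values to units and the layer sizes are illustrative choices, not the paper's exact procedure.

import numpy as np

def van_der_corput(i, base=2):
    """Radical-inverse (van der Corput) low-discrepancy sequence in [0, 1)."""
    x, denom = 0.0, 1.0
    while i > 0:
        denom *= base
        i, digit = divmod(i, base)
        x += digit / denom
    return x

def paths_to_masks(layer_sizes, n_paths):
    """Deterministically generate sparse connectivity from paths.

    Each path selects one unit per layer via the low-discrepancy sequence;
    consecutive selections define which weights exist. All other weights
    are zero, so the network is sparse from scratch.
    """
    masks = [np.zeros((layer_sizes[l], layer_sizes[l + 1]), dtype=bool)
             for l in range(len(layer_sizes) - 1)]
    for p in range(n_paths):
        units = [int(van_der_corput(p * len(layer_sizes) + l + 1) * size)
                 for l, size in enumerate(layer_sizes)]
        for l in range(len(layer_sizes) - 1):
            masks[l][units[l], units[l + 1]] = True
    return masks

masks = paths_to_masks(layer_sizes=[784, 256, 256, 10], n_paths=2048)
for l, m in enumerate(masks):
    print(f"layer {l}: {m.sum()} of {m.size} weights kept "
          f"({100.0 * m.sum() / m.size:.2f}%)")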