I am a Senior Research Scientist at NVIDIA Research, working in the Deep Learning & Efficiency Research (DLER) team within LPR. My work focuses on improving the performance and scalability of deep neural networks (DNNs), especially large language models (LLMs), through model compression (pruning, quantization, and low-rank decomposition) and neural architecture search.
For more details about me and my work, please visit my website.