I am a principal research scientist and research lead at the Learning and Perception Research Group, NVIDIA Research. Before joining NVIDIA, I obtained a Ph.D. in ECE from Carnegie Mellon University in 2017 and an M.Phil. in ECE from The Hong Kong University of Science and Technology in 2012. I graduated with a bachelor's degree from the Union Class of Electrical Engineering (FENG Bingquan Pilot Class), South China University of Technology, in 2008.
I am interested in building general autonomy and intelligence across both virtual and physical domains. My recent focus lies in Vision Transformers, LLMs, multimodal LLMs, and vision-language-action (VLA) models, with applications spanning open-world understanding, reasoning, AV/robot perception-planning, and agentic systems. I have led or contributed to numerous flagship research efforts and products at NVIDIA, including SegFormer (Most Influential NeurIPS Papers, Demo), VoxFormer, FB-BEV/FB-OCC (CVPR23 3D Occ Pred Challenge winner, video), Hydra-MDP (CVPR24 E2E Driving Challenge winner, video), the Eagle VLM project, Nemotron, Llama-Nemotron-VL, and GR00T N1/GR00T N1.5 (NVIDIA’s foundation models for humanoid robots). I also participated in designing NVIDIA’s next-generation end-to-end autonomous driving system. My work is characterized by state-of-the-art performance, scalable architectures, and data-centric strategies for real-world generalization.
Honors and Awards
Winner, CVPR 2024 Challenge on End-to-End Driving at Scale
2nd Place, CVPR 2024 Challenge on Driving with Language
Winner, CVPR 2023 Challenge on 3D Occupancy Prediction
Winner, ECCV 2022 Robust Vision Challenge (RVC) on Semantic Segmentation
Winner, CVPR 2018 Autonomous Driving Challenge (WAD) on Domain Adaptation
2nd Place, ICMI 2015 EmotiW Challenge on Static Facial Expression Recognition
Best Paper Award, BMVC 2020
Best Paper Award, WACV 2015
Best Student Paper Award, ISCSLP 2014
For more information, please visit my Homepage.