  Min-Hung Chen  

 



  ![](/sites/default/files/person/headphoto_3_edited.png)

  

Min-Hung (Steve) Chen is a Staff Research Scientist at [NVIDIA Research Taiwan](https://research.nvidia.com/labs/twn/), working on ***Vision+X Multimodal AI***. He received his Ph.D. from Georgia Tech, where he was advised by [Prof. Ghassan AlRegib](https://ghassanalregib.info/) and collaborated with [Prof. Zsolt Kira](https://www.cc.gatech.edu/~zk15/). Before joining NVIDIA, Min-Hung worked on Biometric Research for [Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/) as a *Research Engineer II* at [Microsoft Azure AI](https://azure.microsoft.com/en-us/overview/ai-platform), and on [Edge-AI Research](https://www.mediatek.com/technology/artificial-intelligence) as a *Senior AI Engineer* at [MediaTek](https://www.mediatek.com/).

Min-Hung's research focuses mainly on ***Multimodal AI***, including Vision-Language, 4D (video+depth) Understanding, Efficient Deep Learning, Vision-Language-Action (VLA), and [Transformer](https://github.com/cmhungsteve/Awesome-Transformer-Attention). He is also interested in *Learning without Full Supervision*, including domain adaptation, transfer learning, and X-supervised learning. Please visit [his website](https://minhungchen.netlify.app/) for more information.

\[[Personal Website](https://minhungchen.netlify.app/)\] \[[LinkedIn](https://www.linkedin.com/in/chensteven/)\] \[[Twitter](https://twitter.com/CMHungSteven)\]



   Research Area(s)

[Applied Perception](/research-area/applied-perception)

[Artificial Intelligence and Machine Learning](/research-area/machine-learning-artificial-intelligence)

[Computer Vision](/research-area/computer-vision)

 

 

  

 Main Field of Interest

[Computer Vision](/research-area/computer-vision)

 

  

 Google Scholar

<https://scholar.google.com/citations?user=ovzuxi8AAAAJ>

 

  

 

 

 



 ### Publications

 

### 2025 

[ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning](/publication/2025-12_thinkact-vision-language-action-reasoning-reinforced-visual-latent-planning)

[Chi-Pin Huang](/person/chi-pin-huang), Yueh-Hua Wu, [Min-Hung Chen](/person/min-hung-chen), [Frank Wang](/person/frank-wang), [Fred Yang](/person/fred-yang)



[Neural Information Processing Systems (NeurIPS) 2025](https://arxiv.org/pdf/2507.16815)









[Hymba: A Hybrid-head Architecture for Small Language Models](/publication/2025-04_hymba-hybrid-head-architecture-small-language-models)

Xin Dong, [Yonggan Fu\*](/person/yonggan-fu), Shizhe Diao, [Wonmin Byeon](/person/wonmin-byeon), Zijia Chen, Ameya Sunil Mahabaleshwarkar, Shih-Yang Liu, [Matthijs Van keirsbilck](/person/matthijs-van-keirsbilck), [Min-Hung Chen](/person/min-hung-chen), [Yoshi Nishi](/person/yoshi-nishi), Yingyan Celine Lin, [Jan Kautz](/person/jan-kautz), [Pavlo Molchanov](/person/pavlo-molchanov)



[Hymba - ICLR 2025](https://jankautz.com/publications/Hymba_ICLR25.pdf)



ICLR spotlight paper





[Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation](/publication/2025-02_semantic-prompt-learning-weakly-supervised-semantic-segmentation)

Ci-Siang Lin, Chien-Yi Wang, [Frank Wang](/person/frank-wang), [Min-Hung Chen](/person/min-hung-chen)



[Winter Conference on Applications of Computer Vision (WACV) 2025](https://wacv2025.thecvf.com/)









[Spatio-Temporal Context Prompting for Zero-Shot Action Detection](/index.php/publication/2025-02_spatio-temporal-context-prompting-zero-shot-action-detection)

Wei-Jhe Huang, [Min-Hung Chen](/index.php/person/min-hung-chen), Shang-Hong Lai



[Winter Conference on Applications of Computer Vision (WACV) 2025](https://wacv2025.thecvf.com/)









[CorrFill: Enhancing Faithfulness in Reference-based Inpainting with Correspondence Guidance in Diffusion Models](/publication/2025-02_corrfill-enhancing-faithfulness-reference-based-inpainting-correspondence)

Kuan-Hung Liu, Cheng-Kun Yang, [Min-Hung Chen](/person/min-hung-chen), Yu-Lun Liu, Yen-Yu Lin



[Winter Conference on Applications of Computer Vision (WACV) 2025](https://wacv2025.thecvf.com/)









### 2024 

[Diffusion-Reward Adversarial Imitation Learning](/publication/2024-12_diffusion-reward-adversarial-imitation-learning)

Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, [Frank Wang](/person/frank-wang), [Min-Hung Chen](/person/min-hung-chen), Shao-Hua Sun



[Neural Information Processing Systems (NeurIPS) 2024](https://neurips.cc/Conferences/2024)









[DoRA: Weight-Decomposed Low-Rank Adaptation](/publication/2024-07_dora-weight-decomposed-low-rank-adaptation)

Shih-Yang Liu, Chien-Yi Wang, [Hongxu Danny Yin](/person/danny-yin), [Pavlo Molchanov](/person/pavlo-molchanov), [Frank Wang](/person/frank-wang), Kwang-Ting Cheng, [Min-Hung Chen](/person/min-hung-chen)



[International Conference on Machine Learning (ICML) 2024](https://icml.cc/Conferences/2024)









### 2023 

[2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision](/publication/2023-10_2d-3d-interlaced-transformer-point-cloud-segmentation-scene-level-supervision)

Cheng-Kun Yang, [Min-Hung Chen](/person/min-hung-chen), Yung-Yu Chuang, Yen-Yu Lin



[IEEE/CVF International Conference on Computer Vision (ICCV) 2023](https://iccv2023.thecvf.com/)