ICLR 2026

ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference

Post-training quantization (PTQ) compresses the weights and activations of large language models (LLMs) into low-precision …

Yesheng Liang, Haisheng Chen, Song Han, Zhijian Liu

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Vision-language models (VLMs) could power real-time assistants and autonomous agents, but they face a critical challenge: understanding …

Ruyi Xu, Guangxuan Xiao, Yukang Chen, Liuning He, Kelly Peng, Yao (Jason) Lu, Song Han

Fast-dLLM v2: Efficient Block-Diffusion LLM

Autoregressive (AR) large language models (LLMs) have achieved remarkable performance across a wide range of natural language tasks, …

Chengyue Wu, Hao Zhang, Shuchen Xue, Shizhe Diao, Yonggan Fu, Zhijian Liu, Pavlo Molchanov, Ping Luo, Song Han, Enze Xie

Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

We present Locality-aware Parallel Decoding (LPD) to accelerate autoregressive image generation. Traditional autoregressive image …

Zhuoyang Zhang, Luke J. Huang, Chengyue Wu, Shang Yang, Kelly Peng, Yao (Jason) Lu, Song Han

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

Diffusion-based large language models (Diffusion LLMs) have shown promise for non-autoregressive text generation. However, the …

Chengyue Wu, Hao Zhang, Shuchen Xue, Zhijian Liu, Shizhe Diao, Ligeng Zhu, Ping Luo, Song Han, Enze Xie