Efficient AI
Publications
ICLR 2025
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
VILA-U is a Unified foundation model that integrates Video, Image, Language understanding and generation. Traditional visual language …
Yecheng Wu, Zhuoyang Zhang, Junyu Chen, Haotian Tang, Dacheng Li, Yunhao Fang, Ligeng Zhu, Enze Xie, Hongxu Yin, Li Yi, Song Han, Yao (Jason) Lu
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Long-context capability is critical for multi-modal foundation models, especially for long video understanding. We introduce LongVILA, …
Yukang Chen, Fuzhao Xue, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao (Jason) Lu, Song Han
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Diffusion models have been proven highly effective at generating high-quality images. However, as these models grow larger, they …
Muyang Li, Yujun Lin, Zhekai Zhang, Tianle Cai, Xiuyu Li, Junxian Guo, Enze Xie, Chenlin Meng, Jun-Yan Zhu, Song Han
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
FP8 training has emerged as a promising method for improving training efficiency. Existing frameworks accelerate training by applying …
Haocheng Xi, Han Cai, Ligeng Zhu, Yao (Jason) Lu, Kurt Keutzer, Jianfei Chen, Song Han
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models. …
Junyu Chen, Han Cai, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao (Jason) Lu, Song Han
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Deploying long-context large language models (LLMs) is essential but poses significant computational and memory challenges. Caching all …
Guangxuan Xiao, Jiaming Tang, Jingwei Zuo, Junxian Guo, Shang Yang, Haotian Tang, Yao Fu, Song Han
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
We introduce Hybrid Autoregressive Transformer (HART), an autoregressive (AR) visual generation model capable of directly generating …
Haotian Tang, Yecheng Wu, Shang Yang, Enze Xie, Junsong Chen, Junyu Chen, Zhuoyang Zhang, Han Cai, Yao (Jason) Lu, Song Han
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096x4096 resolution. Sana can synthesize …
Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao (Jason) Lu, Song Han