Efficient AI
Junxian Guo
Latest
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
XAttention: Block Sparse Attention with Antidiagonal Scoring
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads