Shang Yang

Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
NVILA: Efficient Frontier Visual Language Models
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
