NVIDIA Research Taiwan
Cross-Modal Alignment
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
We introduce DeSTA2.5-Audio, a general-purpose Large Audio Language Model (LALM) designed for robust auditory perception and instruction-following. Recent LALMs augment Large Language Models (LLMs) with auditory capabilities by training on …