 # Fugatto 1 - Foundational Generative Audio Transformer Opus 1

  ![](/sites/default/files/styles/wide/public/publications/Screenshot%202024-12-03%20at%204.29.01%20PM.png?itok=eq4mXTmw)

 ***Fugatto*** is a versatile audio synthesis and transformation model capable of following
free-form text instructions with optional audio inputs. While large language
models (LLMs) trained with text on a simple next-token prediction objective can
learn to infer instructions directly from the data, models trained solely on audio
data lack this capacity. This is because audio data does not inherently contain the
instructions that were used to generate it. To overcome this challenge, we introduce
a specialized dataset generation approach optimized for producing a wide range of
audio generation and transformation tasks, ensuring the data reveals meaningful
relationships between audio and language. Another challenge lies in achieving
compositional abilities – such as combining, interpolating between, or negating
instructions – using data alone. To address this, we propose ***ComposableART***, an
inference-time technique that extends classifier-free guidance to compositional
guidance. It enables the seamless and flexible composition of instructions, leading
to highly customizable audio outputs outside the training distribution. Our evaluations
across a diverse set of tasks demonstrate that ***Fugatto*** performs competitively
with specialized models, while ***ComposableART*** enhances its sonic palette and
control over synthesis. Most notably, we highlight our framework’s ability to
synthesize emergent sounds – sonic phenomena that transcend conventional audio
generation – unlocking new creative possibilities. [Demo Website](https://fugatto.github.io/).
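Classifier-free guidance combines an unconditional model output with a conditional one at each sampling step. The sketch below shows one common way to extend that to several instructions at once, so that they can be combined, interpolated, or negated. It is a minimal illustration under stated assumptions, not the paper's exact formulation. It assumes a model that returns one output vector per instruction plus one unconditional output, and it sums weighted guidance directions in the style of earlier compositional-guidance work. The names `composed_guidance` and `interpolate_weights` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def composed_guidance(
    uncond: Sequence[float],
    conds: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Vector:
    """Combine per-instruction outputs with classifier-free-style guidance.

    Assumption (illustrative only): output = uncond + sum_i w_i * (cond_i - uncond).
    uncond:  model output with no instruction (unconditional branch).
    conds:   one model output per instruction, same length as `uncond`.
    weights: one guidance weight per instruction; a negative weight
             pushes the result away from that instruction (negation).
    """
    if len(conds) != len(weights):
        raise ValueError("need exactly one weight per instruction")
    out = list(uncond)
    for cond, w in zip(conds, weights):
        if len(cond) != len(uncond):
            raise ValueError("all outputs must have the same length")
        # Add this instruction's guidance direction, scaled by its weight.
        for i, (c, u) in enumerate(zip(cond, uncond)):
            out[i] += w * (c - u)
    return out


def interpolate_weights(alpha: float, scale: float) -> Vector:
    """Split a total guidance scale between two instructions.

    alpha = 0 uses only the first instruction, alpha = 1 only the second.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [scale * (1.0 - alpha), scale * alpha]


if __name__ == "__main__":
    uncond = [0.0, 0.0]
    cond_a, cond_b = [1.0, 0.0], [0.0, 1.0]
    # Interpolate: 75% toward instruction A, 25% toward instruction B.
    print(composed_guidance(uncond, [cond_a, cond_b], interpolate_weights(0.25, 4.0)))
    # Negate: follow A while steering away from B.
    print(composed_guidance(uncond, [cond_a, cond_b], [3.0, -1.0]))
```

With `alpha = 0.25` and `scale = 4.0`, the weights are `[3.0, 1.0]`, so the combined output leans toward the first instruction. A weight of `-1.0` on the second instruction steers the result away from it instead.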



 ## Authors



Rafael Valle (NVIDIA)

Rohan Badlani (NVIDIA)

Zhifeng Kong (NVIDIA)

Sang-gil Lee (NVIDIA)

Arushi Goel (NVIDIA)

Sungwon Kim (NVIDIA)

Joao Felipe Santos (NVIDIA)

Shuqi Dai (NVIDIA)

[Siddharth Gururani](/index.php/person/siddharth-gururani)

Aya AlJa'fari (NVIDIA)

Alex Liu (NVIDIA)

Kevin Shih (NVIDIA)

Wei Ping (NVIDIA)

[Huck Yang](/index.php/person/huck-yang)

Bryan Catanzaro (NVIDIA)

 

 

 ## Publication Date



Friday, April 25, 2025

 

 ## Published in



[ICLR 2025](https://openreview.net/forum?id=B2Fqu7Y2cd)

 

 ## Research Area



[Generative AI](/index.php/research-area/generative-ai)

[Natural Language Processing](/index.php/research-area/natural-language-processing)

 

 

 ## External Links



[Demo Website](https://fugatto.github.io/)

 

 

 ## Uploaded Files



[FUGATTO.pdf](https://d1qx31qr3h6wln.cloudfront.net/publications/FUGATTO.pdf) (1.75 MB)