Denoising diffusion models (DDMs) have emerged as a powerful class of generative models. A forward diffusion process slowly perturbs the data, while a deep model learns to gradually denoise. Synthesis amounts to solving a differential equation (DE) defined by the learned model; however, high-quality generation requires solving this DE with slow iterative solvers. In this work, we propose Higher-Order Denoising Diffusion Solvers (GENIE): based on truncated Taylor methods, we derive a novel higher-order solver that significantly accelerates synthesis. Our solver relies on higher-order gradients of the log-density of the perturbed data distribution, that is, higher-order score functions. In practice, only Jacobian-vector products (JVPs) are required, and we propose to extract them from the first-order score network via automatic differentiation. We then distill the JVPs into a separate neural network that allows us to efficiently compute the necessary higher-order terms for our novel sampler during synthesis. We only need to train a small additional head on top of the first-order score network. We validate GENIE on multiple image generation benchmarks and demonstrate that GENIE outperforms all previous solvers. Unlike recent methods that fundamentally alter the generation process in DDMs, GENIE solves the true generative DE and still enables applications such as encoding and guided sampling.
In DDMs, a diffusion process gradually perturbs the data towards random noise, while a deep neural network learns to denoise. Formally, the problem reduces to learning the score function, i.e., the gradient of the log-density of the perturbed data. The (approximate) inverse of the forward diffusion can be described by an ordinary or a stochastic differential equation (ODE or SDE, respectively), defined by the learned score function, and can therefore be used for generation when starting from random noise.
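For reference, the deterministic variant of this generative process is the so-called Probability Flow ODE, which in the standard score-SDE formulation (with drift $f$ and diffusion coefficient $g$ of the forward diffusion) reads:

$$\frac{\mathrm{d}\mathbf{x}_t}{\mathrm{d}t} = f(\mathbf{x}_t, t) - \frac{1}{2}\, g(t)^2\, \nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t).$$

Replacing the exact score $\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)$ with the learned score network and integrating this ODE from $t=T$ (random noise) to $t=0$ (data) produces samples.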
A crucial drawback of DDMs is that the generative ODE or SDE is typically difficult to solve, due to the complex score function. Therefore, efficient and tailored samplers are required for fast synthesis. In this work, building on the generative ODE, we rigorously derive a novel second-order ODE solver using truncated Taylor methods (TTMs). These higher-order methods require higher-order gradients of the ODE; in our case, this includes higher-order gradients of the log-density of the perturbed data, i.e., higher-order score functions. Because such higher-order scores are usually not available, existing works typically resort to simple first-order solvers or samplers with low accuracy, to higher-order methods that rely on suboptimal finite-difference or other approximations, or to alternative approaches for accelerated sampling. Here, we fundamentally avoid such approximations and directly model the higher-order gradient terms: Importantly, our novel Higher-Order Denoising Diffusion Solver (GENIE) relies on Jacobian-vector products (JVPs) involving second-order scores. We propose to calculate these JVPs by automatic differentiation of the regular learned first-order scores. For computational efficiency, we then distill the entire higher-order gradient of the ODE, including the JVPs, into a separate neural network. In practice, we only need to add a small head to the first-order score network to predict the components of the higher-order ODE gradient. By directly modeling the JVPs, we avoid explicitly forming high-dimensional higher-order scores. Intuitively, the higher-order terms in GENIE capture the local curvature of the ODE and enable larger steps when iteratively solving the generative ODE.
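To make the construction concrete: for a generic ODE $\mathrm{d}\mathbf{x}/\mathrm{d}t = \mathbf{f}(\mathbf{x}, t)$, the second TTM takes steps

$$\mathbf{x}_{t_{n+1}} = \mathbf{x}_{t_n} + h_n\, \mathbf{f}(\mathbf{x}_{t_n}, t_n) + \frac{h_n^2}{2}\, \frac{\mathrm{d}\mathbf{f}}{\mathrm{d}t}(\mathbf{x}_{t_n}, t_n), \qquad \frac{\mathrm{d}\mathbf{f}}{\mathrm{d}t} = \partial_t \mathbf{f} + \left(\nabla_{\mathbf{x}} \mathbf{f}\right) \mathbf{f},$$

with step size $h_n = t_{n+1} - t_n$. The term $(\nabla_{\mathbf{x}} \mathbf{f})\,\mathbf{f}$ is precisely the JVP mentioned above, and for the generative ODE the Jacobian $\nabla_{\mathbf{x}} \mathbf{f}$ involves the second-order score $\nabla_{\mathbf{x}}^2 \log p_t(\mathbf{x})$.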
The well-known DDIM solver is simply Euler's method applied to a reparameterization of the Probability Flow ODE. In this work, we apply the second TTM to this reparameterized ODE, which results in the GENIE scheme (simplified notation; see the paper for details):
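Concretely, with $\bar{\mathbf{x}}$ denoting the re-parameterized state, $\gamma_t$ the re-parameterized time variable, $h_n = \gamma_{t_{n+1}} - \gamma_{t_n}$ the step size, and $\bar{\boldsymbol{\epsilon}}_\theta$ the learned noise-prediction model, the two update rules read (roughly, in this simplified notation):

$$\text{DDIM:}\qquad \bar{\mathbf{x}}_{t_{n+1}} = \bar{\mathbf{x}}_{t_n} + h_n\, \bar{\boldsymbol{\epsilon}}_\theta(\bar{\mathbf{x}}_{t_n}, t_n),$$

$$\text{GENIE:}\qquad \bar{\mathbf{x}}_{t_{n+1}} = \bar{\mathbf{x}}_{t_n} + h_n\, \bar{\boldsymbol{\epsilon}}_\theta(\bar{\mathbf{x}}_{t_n}, t_n) + \frac{h_n^2}{2}\, \frac{\mathrm{d}\bar{\boldsymbol{\epsilon}}_\theta}{\mathrm{d}\gamma}(\bar{\mathbf{x}}_{t_n}, t_n).$$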
Intuitively, the higher-order gradient term captures the local curvature of the generative ODE and therefore allows GENIE to take larger steps without losing accuracy, as the 2D toy example below illustrates.
Modeling a complex 2D toy distribution: Samples in (b) and (c) are generated via DDIM and GENIE, respectively, with 25 solver steps using the analytical score function of the ground truth distribution.
Learning Higher-Order Derivatives. Regular DDMs learn a model $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$ that approximates the (first-order) score of the perturbed data distribution. GENIE additionally requires the derivative of this model along the solver trajectory; crucially, this derivative only enters through JVPs, which can be computed via automatic differentiation of the learned first-order score network.
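As a rough illustration of this JVP extraction (a minimal PyTorch sketch; `score_model`, its call signature, and the tangent direction are simplified stand-ins for the setup used in the paper):

```python
import torch

def eps_and_jvp(score_model, x, t, v_x, v_t):
    """Evaluate eps_theta(x, t) together with its Jacobian-vector product
    along the tangent (v_x, v_t), i.e. d/ds eps_theta(x + s*v_x, t + s*v_t)
    at s = 0, without ever materializing the full Jacobian.
    """
    eps, jvp = torch.autograd.functional.jvp(
        lambda x_in, t_in: score_model(x_in, t_in),
        (x, t),
        (v_x, v_t),
    )
    return eps, jvp
```

PyTorch computes this forward-mode JVP via double backpropagation, which adds significant overhead per solver step; this overhead is exactly what the distillation described next removes at sampling time.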
Our distilled model predicts the entire higher-order gradient of the ODE, including the JVP terms, in a single forward pass: a small prediction head on top of the first-order score network is trained to regress the targets computed via automatic differentiation, making the higher-order terms cheap to evaluate during synthesis.
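A minimal sketch of what such a distillation head and its training objective could look like (the architecture, the feature input, and names like `GradientHead` are illustrative assumptions, not the paper's exact design):

```python
import torch
import torch.nn as nn

class GradientHead(nn.Module):
    """Small head mapping intermediate score-network features to a
    prediction of the higher-order ODE gradient (illustrative only)."""

    def __init__(self, feat_channels: int, out_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_channels, feat_channels, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(feat_channels, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)

def distillation_loss(head: GradientHead, feats: torch.Tensor,
                      jvp_target: torch.Tensor) -> torch.Tensor:
    # Regress the autodiff-computed JVP target; the first-order score
    # network stays frozen, so only the small head is trained.
    return ((head(feats) - jvp_target) ** 2).mean()
```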
We extensively validate GENIE on several popular benchmark datasets for image synthesis, including CIFAR-10 (resolution 32×32), and demonstrate that GENIE outperforms all previous solvers of the generative ODE.
In contrast to recent methods for accelerated sampling of DDMs that abandon the ODE/SDE framework, GENIE can readily be combined with techniques such as classifier(-free) guidance (see the examples below) and image encoding. These techniques can play an important role in synthesizing photorealistic images with DDMs, as well as in image editing tasks.
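For reference, classifier-free guidance in its standard form mixes conditional and unconditional model outputs with a guidance weight $w$; GENIE simply solves the generative ODE defined by this guided model:

$$\hat{\boldsymbol{\epsilon}}_\theta(\mathbf{x}_t, t, c) = (1 + w)\, \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t, c) - w\, \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t).$$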
Image synthesis with classifier-free guidance for the ImageNet classes Pembroke Welsh Corgi and Streetcar using different numbers of denoising steps during generation.
We also train a high-resolution model on AFHQv2 (subset of cats only), consisting of a base DDM and a diffusion upsampler that produces the final high-resolution outputs; GENIE is applied to both models.
High-resolution images generated with the AFHQv2 base model and upsampler using GENIE.
The sequence above is generated by randomly traversing the latent space of our GENIE model (using 25 base model and five upsampler evaluations). We are interpolating in the latent space of the base model, and we keep the noise in the upsampler (both latent space and augmentation perturbations) fixed in all frames.
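For readers who want to reproduce such traversals: a common choice for interpolating between Gaussian latents is spherical interpolation, since linear interpolation leaves the typical set of the Gaussian prior. A minimal sketch (the paper's exact interpolation scheme may differ; this is an assumption):

```python
import torch

def slerp(z0: torch.Tensor, z1: torch.Tensor, alpha: float) -> torch.Tensor:
    """Spherical interpolation between two latent tensors, alpha in [0, 1]."""
    omega = torch.acos(
        torch.clamp((z0 * z1).sum() / (z0.norm() * z1.norm()), -1.0, 1.0)
    )
    return (
        torch.sin((1.0 - alpha) * omega) * z0
        + torch.sin(alpha * omega) * z1
    ) / torch.sin(omega)
```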
GENIE: Higher-Order Denoising Diffusion Solvers
Tim Dockhorn, Arash Vahdat, Karsten Kreis
Advances in Neural Information Processing Systems, 2022
@inproceedings{dockhorn2022genie,
title={{{GENIE: Higher-Order Denoising Diffusion Solvers}}},
author={Dockhorn, Tim and Vahdat, Arash and Kreis, Karsten},
booktitle={Advances in Neural Information Processing Systems},
year={2022}
}