Diffusion models (DMs) have taken a pivotal role in generative modeling, driving recent progress in high-quality image and video synthesis. Scalability and an iterative formulation are two of their main advantages; together they let DMs handle intricate tasks such as generating images from free-form text prompts. Unfortunately, the many sampling steps required by iterative inference currently hinder the real-time use of DMs. Generative Adversarial Networks (GANs), by contrast, are distinguished by their single-step formulation and intrinsic speed. In sample quality, however, GANs frequently fall short of DMs despite efforts to scale them to massive datasets.
In this study, researchers from Stability AI aim to combine the innate speed of GANs with the higher sample quality of DMs. Their strategy is conceptually straightforward: they propose Adversarial Diffusion Distillation (ADD), a general technique that cuts the number of inference steps of a pretrained diffusion model to 1-4 sampling steps while keeping high sampling fidelity and potentially improving the model’s overall performance. The research team combines two training objectives: (i) an adversarial loss and (ii) a distillation loss that corresponds to score distillation sampling (SDS).
At each forward pass, the adversarial loss pushes the model to produce samples that lie directly on the manifold of real images, eliminating artifacts such as blurriness that are commonly seen in other distillation techniques. To retain the high compositionality seen in large DMs and make efficient use of the substantial knowledge of the pretrained DM, the distillation loss employs another pretrained (and fixed) DM as a teacher. The method further reduces memory requirements by not using classifier-free guidance during inference. Its advantage over earlier one-step GAN-based methods is that results can still be improved through iterative refinement with additional sampling steps.
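As a rough illustration of how the two training objectives described above might be combined for the student model, here is a minimal sketch in plain Python. It is not the authors' implementation: the hinge-style generator term, the squared-error form of the distillation term, the weight `lam`, and all function names are assumptions made for illustration, and the discriminator scores and teacher estimate are passed in as ready-made lists rather than computed by real networks.

```python
from statistics import fmean


def generator_adversarial_loss(fake_scores):
    """Hinge-style generator term (assumed form): lower when the
    discriminator assigns higher scores to the student's samples."""
    return -fmean(list(fake_scores))


def distillation_loss(student_sample, teacher_estimate):
    """Mean squared error between the student's sample and the frozen
    teacher's denoised estimate of it (assumed form of the SDS-like term)."""
    return fmean([(s - t) ** 2 for s, t in zip(student_sample, teacher_estimate)])


def add_student_loss(fake_scores, student_sample, teacher_estimate, lam=1.0):
    """Total student objective: adversarial term plus lam times the
    distillation term. `lam` is a hypothetical weighting hyperparameter."""
    adv = generator_adversarial_loss(fake_scores)
    distill = distillation_loss(student_sample, teacher_estimate)
    return adv + lam * distill


if __name__ == "__main__":
    # Toy numbers standing in for discriminator outputs and image tensors.
    scores = [0.2, -0.1, 0.4]
    student = [0.9, 0.1, 0.5, 0.3]
    teacher = [1.0, 0.0, 0.5, 0.2]
    print(add_student_loss(scores, student, teacher, lam=2.0))
```

In a real training loop the teacher would stay frozen, so only the student (and, separately, the discriminator) would receive gradient updates.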
The following is a summary of their contributions:
• The research team presents ADD, a technique that converts pretrained diffusion models into high-fidelity, real-time image generators that need just 1-4 sampling steps. The approach combines adversarial training with score distillation, and the team carefully examined several of its design choices.
• ADD-XL outperforms its teacher model SDXL-Base at a resolution of 512² px using four sampling steps.
• ADD can handle complex image compositions while maintaining high realism at only one inference step.
• ADD significantly outperforms strong baselines like LCM, LCM-XL, and single-step GANs.
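The 1-4 step sampling mentioned in the contributions above can be pictured as a short loop: the student predicts a clean sample from a noisy input, the prediction is re-noised to a lower noise level, and the student is applied again. The sketch below is a simplified illustration under stated assumptions, not the paper's sampler: the `student(x, sigma)` signature, the additive re-noising rule, and the noise levels are all hypothetical.

```python
import random


def few_step_sample(student, noise_levels, dim, seed=0):
    """Run the student once per entry in `noise_levels` (highest first).

    `student(x, sigma)` is a hypothetical callable that maps a noisy
    input and its noise level to a predicted clean sample.
    """
    rng = random.Random(seed)
    # Start from Gaussian noise scaled to the first (largest) noise level.
    x = [noise_levels[0] * rng.gauss(0.0, 1.0) for _ in range(dim)]
    sample = x
    for i, sigma in enumerate(noise_levels):
        # One forward pass: the student predicts a clean sample directly.
        sample = student(x, sigma)
        if i + 1 < len(noise_levels):
            # Assumed re-noising rule: add fresh noise at the next,
            # smaller level before refining again.
            next_sigma = noise_levels[i + 1]
            x = [s + next_sigma * rng.gauss(0.0, 1.0) for s in sample]
    return sample


if __name__ == "__main__":
    # Toy student that shrinks its input toward zero; a real student
    # would be the distilled diffusion network.
    def toy_student(x, sigma):
        return [v / (1.0 + sigma) for v in x]

    print(few_step_sample(toy_student, [1.0], dim=4))             # single step
    print(few_step_sample(toy_student, [1.0, 0.5, 0.25, 0.1], 4))  # four steps
```

Passing a single noise level gives the one-step regime; a longer list trades extra forward passes for additional refinement.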
In conclusion, this study introduces Adversarial Diffusion Distillation, a general technique for distilling a pretrained diffusion model into a fast, few-step image-generation model. The research team combines an adversarial objective with a score distillation objective to distill the public Stable Diffusion and SDXL models, drawing on real data through the discriminator and structural knowledge through the diffusion teacher. Their analysis shows that the technique beats all concurrent approaches and works especially well in the ultra-fast sampling regime of one or two steps. The model also retains the ability to refine samples by using multiple steps. With four sampling steps, it performs better than popular multi-step generators such as IF, SDXL, and OpenMUSE. By enabling the generation of high-quality images in a single step, the method opens up new possibilities for real-time generation with foundation models.
Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.