Diffusion models are a state-of-the-art approach to image generation that synthesize images by progressively denoising data over a sequence of time-steps. The UNet encoder inside these models has recently come under close scrutiny, revealing how its features change during inference. Building on these observations, a new encoder propagation scheme accelerates diffusion sampling by reusing encoder features from earlier time-steps, which also opens the door to parallel processing of parts of the sampling loop.
Researchers from Nankai University, Mohamed bin Zayed University of Artificial Intelligence, Linköping University, Harbin Engineering University, and Universitat Autònoma de Barcelona examined the UNet encoder in diffusion models. They introduced an encoder propagation scheme and a prior noise injection method to improve image quality. The proposed method preserves structural information effectively, whereas simply dropping the encoder or the decoder fails to achieve complete denoising.
UNet was originally designed for medical image segmentation and has since evolved, notably into 3D medical image segmentation. In text-to-image diffusion models such as Stable Diffusion (SD) and DeepFloyd-IF, UNet is pivotal to tasks like image editing, super-resolution, segmentation, and object detection. This work proposes an approach to accelerate diffusion models, employing encoder propagation and encoder dropping for efficient sampling. When combined with ControlNet, the proposed method applies encoder propagation to both encoders concurrently, reducing generation time and computational load while preserving content in text-guided image generation.
Diffusion models, integral to text-to-video and reference-guided image generation, leverage the UNet architecture, which comprises an encoder, a bottleneck, and a decoder. While past research focused on the UNet decoder, this work pioneers an in-depth examination of the UNet encoder in diffusion models. It analyzes how encoder and decoder features change during inference and introduces an encoder propagation scheme for accelerated diffusion sampling.
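As a point of reference for the architecture discussed here, below is a minimal PyTorch sketch of a UNet-style network with an encoder, a bottleneck, and a decoder wired together through skip connections. It is purely illustrative and far smaller than the Stable Diffusion UNet; the layer sizes are arbitrary choices made for this example.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """A minimal UNet-style block: encoder -> bottleneck -> decoder with skip connections."""
    def __init__(self, ch: int = 32):
        super().__init__()
        # Encoder: two stages, the second one downsamples
        self.enc1 = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.SiLU())
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.SiLU())
        # Bottleneck
        self.mid = nn.Sequential(nn.Conv2d(ch * 2, ch * 2, 3, padding=1), nn.SiLU())
        # Decoder: upsamples and consumes the encoder's skip features
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(ch * 4, ch, 4, stride=2, padding=1), nn.SiLU())
        self.dec1 = nn.Conv2d(ch * 2, 3, 3, padding=1)

    def forward(self, x):
        s1 = self.enc1(x)                        # encoder feature, full resolution
        s2 = self.enc2(s1)                       # encoder feature, half resolution
        h = self.mid(s2)                         # bottleneck
        h = self.dec2(torch.cat([h, s2], 1))     # decoder uses skip from enc2
        return self.dec1(torch.cat([h, s1], 1))  # decoder uses skip from enc1

x = torch.randn(1, 3, 64, 64)
print(TinyUNet()(x).shape)  # torch.Size([1, 3, 64, 64])
```

The encoder's skip features are exactly what the paper observes to change only mildly across time-steps, which is what makes reusing them attractive.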
The research thoroughly investigates the UNet encoder in diffusion models, revealing that encoder features change only gently during inference while decoder features vary substantially. It introduces an encoder propagation scheme in which encoder features from earlier time-steps are cyclically reused by the decoder, accelerating diffusion sampling and enabling parallel processing, together with a prior noise injection method that enhances texture details in generated images. The approach is validated across various tasks, achieving a notable 41% and 24% acceleration in sampling for the SD and DeepFloyd-IF models, respectively, while maintaining high-quality generation. A user study with 18 participants, based on pairwise comparisons, confirms that the proposed method performs comparably to baseline methods.
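To make the idea concrete, here is a minimal, self-contained sketch of what an encoder-propagation sampling loop can look like. The run_encoder and run_decoder functions, the choice of key steps, and the update rule are toy placeholders standing in for the actual Stable Diffusion components; this is not the authors' implementation, which is available in the FasterDiffusion release.

```python
import torch

# Toy stand-ins for the two halves of the UNet. In a real pipeline these
# would be the encoder and decoder passes of the diffusion UNet.
def run_encoder(x_t, t):
    """Pretend encoder: returns 'skip features' derived from x_t and t."""
    return [x_t * 0.5 + 0.01 * t, x_t * 0.25]

def run_decoder(x_t, t, enc_feats):
    """Pretend decoder: predicts noise from x_t plus the encoder skips."""
    return 0.1 * x_t + sum(f.mean() for f in enc_feats)

timesteps = list(range(50, 0, -1))   # 50 denoising steps, from t=50 down to t=1
key_steps = set(timesteps[::5])      # run the encoder only at every 5th step
x = torch.randn(1, 4, 64, 64)        # latent being denoised
cached_feats = None

for t in timesteps:
    if t in key_steps or cached_feats is None:
        # Key step: run the encoder and cache its features.
        cached_feats = run_encoder(x, t)
    # Non-key steps reuse the cached encoder features, so only the decoder
    # runs; since those calls no longer wait on a fresh encoder pass, they
    # can in principle be batched or parallelised.
    eps = run_decoder(x, t, cached_feats)
    x = x - 0.02 * eps               # toy update standing in for the scheduler step

print(x.shape)                       # torch.Size([1, 4, 64, 64])
```

The ratio of key to non-key steps is the knob that trades compute savings against fidelity; the values above are illustrative only.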
In conclusion, the study can be summarized in the following points:
The research presents the first comprehensive study of the UNet encoder in diffusion models.
The study examines changes in encoder features during inference.
An innovative encoder propagation scheme accelerates diffusion sampling by cyclically reusing encoder features, allowing for parallel processing.
A prior noise injection method enhances texture details in generated images (see the sketch after this list).
The approach has been validated across diverse tasks and exhibits significant sampling acceleration for SD and DeepFloyd-IF models without knowledge distillation while maintaining high-quality generation.
The FasterDiffusion code release enhances reproducibility and encourages further research in the field.
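To give a rough feel for the idea behind prior noise injection, here is a minimal, hypothetical sketch: a small fraction of the initial Gaussian noise is blended back into the latent during later denoising steps to restore high-frequency texture. The blending weight, the step threshold, and the placeholder prediction and update rules are assumptions made for illustration, not values from the paper.

```python
import torch

torch.manual_seed(0)
initial_noise = torch.randn(1, 4, 64, 64)   # the noise the sampler starts from
x = initial_noise.clone()
alpha = 0.003                               # small blending weight (assumed value)

for t in range(50, 0, -1):
    eps = 0.1 * x                           # placeholder for the UNet's noise prediction
    x = x - 0.02 * eps                      # placeholder scheduler update
    if t < 25:                              # later (low-noise) steps only, assumed cutoff
        x = x + alpha * initial_noise       # re-inject a bit of the prior noise for texture

print(x.std())
```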
Check out the Paper. All credit for this research goes to the researchers of this project.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.