Journal of Zhejiang University (Science Edition)  2023, Vol. 50 Issue (6): 651-667    DOI: 10.3785/j.issn.1008-9497.2023.06.001
CCF CAD/CG 2023     
A review of conditional image generation based on diffusion models
Zerun LIU1, Yufei YIN1,2, Wenhao XUE1,3, Rui GUO1, Lechao CHENG1
1. Zhejiang Lab, Hangzhou 311121, China
2. CAS Key Laboratory of GIPAS, University of Science and Technology of China, Hefei 230026, China
3. School of Automation, Northwestern Polytechnical University, Xi'an 710072, China

Abstract  

Artificial intelligence generated content (AIGC) has recently received significant attention. Among the numerous generative models proposed, the emerging diffusion model has attracted extensive attention due to its highly interpretable mathematical properties and its ability to generate high-quality and diverse results. Diffusion models have achieved remarkable results in the field of condition-guided image generation. This achievement promotes the development of diffusion models in other conditional tasks and has led to various applications in areas such as movies, games, painting, and virtual reality. For instance, in text-guided image generation tasks, diffusion models can generate high-resolution images while ensuring the quality of the generated images. In this paper, we first introduce the definition and background of diffusion models. Then, we review the development history and latest progress of conditional image generation based on diffusion models. Finally, we conclude this survey with a discussion of challenges and future research directions for diffusion models.



Key words: diffusion model; conditional image generation; application
Received: 10 May 2023      Published: 30 November 2023
CLC:  TP 391.41  
Corresponding Authors: Lechao CHENG     E-mail: chenglc@zhejianglab.com
Cite this article:

Zerun LIU, Yufei YIN, Wenhao XUE, Rui GUO, Lechao CHENG. A review of conditional image generation based on diffusion models. Journal of Zhejiang University (Science Edition), 2023, 50(6): 651-667.

URL:

https://www.zjujournals.com/sci/EN/Y2023/V50/I6/651


A review of condition-guided image generation based on diffusion models

Content generated with artificial intelligence technology (artificial intelligence generated content, AIGC) has become a hot topic. Among the many generative models, diffusion models have attracted wide attention for their highly interpretable mathematical properties and their high-quality, diverse results; they have achieved remarkable success in condition-guided image generation and are widely applied in fields such as movies, games, painting, and virtual reality. In text-guided image generation tasks, diffusion models can not only generate high-resolution images but also guarantee the quality of the generated images. This paper first introduces the definition and background of diffusion models, then focuses on the development history and latest progress of diffusion models in condition-guided image generation, and finally discusses the challenges facing diffusion models and their potential research directions, aiming to provide researchers with an overview of the field and its frontier developments.


Keywords: diffusion model; condition-guided image generation; application
Fig.1 Forward and reverse diffusion processes
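Fig. 1 depicts the two processes that define a diffusion model: a forward process that gradually corrupts an image with Gaussian noise, and a learned reverse process that removes the noise step by step. As a minimal illustration only (not the implementation of any specific method surveyed here), the sketch below follows the DDPM formulation of Ho et al. [14]; the noise-prediction network eps_model (e.g., a U-Net) and the linear beta schedule are assumed, illustrative choices.

```python
import torch

# Illustrative linear beta schedule as in DDPM [14]; T and the endpoints are common choices.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, noise):
    """Forward process: sample x_t ~ q(x_t | x_0) in closed form for a batch of timesteps t."""
    a = alpha_bar[t].sqrt().view(-1, 1, 1, 1)
    s = (1.0 - alpha_bar[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + s * noise

@torch.no_grad()
def p_sample(eps_model, shape):
    """Reverse process: start from pure noise and denoise for T steps."""
    x = torch.randn(shape)
    for t in reversed(range(T)):
        eps = eps_model(x, torch.full((shape[0],), t))      # predicted noise at step t
        coef = betas[t] / (1.0 - alpha_bar[t]).sqrt()
        x = (x - coef * eps) / alphas[t].sqrt()              # posterior mean
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)    # add sigma_t * z
    return x
```

The conditional variants covered in this survey keep this same structure and additionally feed the condition (a text embedding, sketch, layout, etc.) into the noise-prediction network or into a guidance term during sampling.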
Fig. 2 Condition-based classification of conditional image generation methods
Dataset | Images | Texts | URL
Flickr30K [43] | 32 K | 158 K | http://shannon.cs.illinois.edu/DenotationGraph/
MS-COCO [44] | 330 K | 1.5 M | https://cocodataset.org/
CC [45] | 12 M | 12 M | https://github.com/google-research-datasets/conceptual-12m
WIT [46] | 11.5 M | 37.6 M | https://github.com/google-research-datasets/wit
WuKong [47] | 100 M | 100 M | https://wukong-dataset.github.io/wukong-dataset/
LAION-400M [48] | 400 M | 400 M | https://laion.ai/blog/laion-400-open-dataset/
COYO [49] | 700 M | 700 M | https://github.com/kakaobrain/coyo-dataset
LAION-5B [50] | 5 B | 5 B | https://laion.ai/projects/
Table 1 Large-scale image-text datasets
Model | Parameters/B | FID-30K (↓) | Zero-shot FID (↓)
DM-GAN [57] | - | 20.79 | -
XMC-GAN [58] | - | 9.33 | -
LAFITE [59] | 0.2 | 8.12 | -
CogView2 [60] | 6.0 | 17.70 | 24.00
Make-A-Scene [61] | 4.0 | 7.55 | 11.84
Parti [62] | 20.0 | 3.22 | 7.23
GLIDE [25] | 5.0 | - | 12.24
DALL-E 2 [33] | 6.5 | - | 10.39
Stable Diffusion [32] | 1.4 | - | 8.59
Simple Diffusion [53] | - | - | 8.30
Imagen [34] | 7.9 | - | 7.27
eDiff-I [52] | 9.1 | - | 6.95
ERNIE-ViLG 2.0 [51] | 24.0 | - | 6.72
Table 2 Comparison of FID on MS-COCO dataset
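The FID values in Table 2 measure the Fréchet distance between Gaussians fitted to Inception-v3 features of real and generated images (lower is better); FID-30K is computed with 30 000 samples on MS-COCO, while zero-shot FID refers to models evaluated on MS-COCO without being trained on it. A minimal sketch of the distance itself, assuming the feature means and covariances have already been extracted:

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(mu_r, cov_r, mu_g, cov_g):
    """FID between two Gaussians (mean, covariance) fitted to Inception features
    of real (r) and generated (g) images."""
    diff = mu_r - mu_g
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)   # matrix square root
    if np.iscomplexobj(covmean):                           # drop tiny imaginary parts
        covmean = covmean.real
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```

In practice the feature statistics are extracted with a pretrained Inception-v3 network, e.g., via the pytorch-fid or torchmetrics packages.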
Fig.3 Framework of text-guided image generation methods based on retrieval enhancement
Model | Text consistency | Image consistency
Textual Inversion [77] | 0.183 | 0.689
DreamBooth [78] | 0.249 | 0.827
DreamArtist [79] | 0.286 | 0.739
Custom Diffusion [80] | 0.231 | 0.868
ELITE [81] | 0.266 | 0.804
Cones [82] | 0.237 | 0.853
SVDiff [83] | 0.323 | 0.716
Table 3 Comparison of models for subject-driven generation
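The text- and image-consistency scores in Table 3 are commonly reported as CLIP similarities: between each generated image and its prompt, and between the generated image and reference images of the personalized subject; the exact evaluation protocol varies across the cited papers. A minimal sketch under that assumption, using the public openai/clip-vit-base-patch32 checkpoint:

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def text_consistency(generated: Image.Image, prompt: str) -> float:
    """Cosine similarity between the generated image and its text prompt (CLIP-T style)."""
    inputs = processor(text=[prompt], images=generated, return_tensors="pt", padding=True)
    img = F.normalize(model.get_image_features(pixel_values=inputs["pixel_values"]), dim=-1)
    txt = F.normalize(model.get_text_features(input_ids=inputs["input_ids"],
                                              attention_mask=inputs["attention_mask"]), dim=-1)
    return float((img * txt).sum())

@torch.no_grad()
def image_consistency(generated: Image.Image, reference: Image.Image) -> float:
    """Cosine similarity between the generated image and a reference subject image (CLIP-I style)."""
    inputs = processor(images=[generated, reference], return_tensors="pt")
    feats = F.normalize(model.get_image_features(pixel_values=inputs["pixel_values"]), dim=-1)
    return float(feats[0] @ feats[1])
```

Higher values indicate better alignment; the absolute scale depends on the CLIP backbone used, so scores are only comparable under a fixed evaluation setup.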
Model | Condition form | Datasets
PITI [86] | Sketch, layout map | ADE20K, DIODE, COCO-Stuff
Sketch-Guided [87] | Sketch and text | Sketchy, Edge2shoes
DiSS [89] | Sketch, color map | COCO-Stuff, Visual
Sketch2Photo [88] | Sketch, text | LAION, GeoPose3K, LSUN Church
DiffFaceSketch [90] | Sketch | CelebA-HQ
Table 4 Comparison of layout-based image generation methods
Fig. 4 Common layout forms and the corresponding generated results
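Methods such as those in Table 4 condition the denoising network on a spatial input (sketch, edge map, segmentation, or layout) in addition to text. As a practical illustration only, and not one of the methods listed in the table, the sketch below uses the open-source diffusers implementation of ControlNet [98], where the edge map fixes the spatial layout and the prompt controls appearance; the image path is a placeholder.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Publicly released ControlNet trained on Canny edge maps, attached to Stable Diffusion v1.5.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

edge_map = load_image("edges.png")  # placeholder: a pre-computed Canny edge image
result = pipe(
    "a stone cottage in a snowy forest",   # text controls appearance
    image=edge_map,                        # edge map controls spatial layout
    num_inference_steps=30,
).images[0]
result.save("controlnet_result.png")
```

The same pattern applies to other spatial conditions (segmentation maps, depth, human pose) by swapping in the corresponding ControlNet checkpoint.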
[114]   HAO Y R, CHI Z W, DONG L, et al. Optimizing Prompts For Text-To-Image Generation[Z]. (2022-12-19). https://doi.org/10.48550/arXiv.2212.09611.
[115]   WITTEVEEN S, ANDREWS M. Investigating Prompt Engineering in Diffusion Models[Z]. (2022-11-21). https://doi.org/10.48550/arXiv.2211.15462.
[116]   WANG Z J, MONTOYA E, MUNECHIKA D, et al. DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models[Z]. (2022-10-26). https://doi.org/10.48550/arXiv.2210.14896.
[117]   SONG J M, MENG C L, ERMON S. Denoising Diffusion Implicit Models[Z]. (2020-10-06). https://doi.org/10.48550/arXiv.2010.02502.
[118]   LU C, ZHOU Y H, BAO F, et al. DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps[Z]. (2022-06-02). https://doi.org/10.48550/arXiv.2206.00927.
[119]   LU C, ZHOU Y H, BAO F, et al. DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models[Z]. (2022-11-02). https://doi.org/10.48550/arXiv.2211.01095.
[120]   ZHANG Q S, TAO M L, CHEN Y X. gDDIM: Generalized Denoising Diffusion Implicit Models[Z]. (2022-06-11). https://doi.org/10.48550/arXiv.2206.05564.
[121]   BAO F, LI C X, ZHU J, et al. Analytic-DPM: An Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models[Z]. (2022-01-17). https://doi.org/10.48550/arXiv.2201.06503.
[122]   LUHMAN E, LUHMAN T. Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed[Z]. (2021-01-07). https://doi.org/10.48550/arXiv.2101.02388.
[123]   SALIMANS T, HO J. Progressive Distillation for Fast Sampling of Diffusion Models[Z]. (2022-02-01). https://doi.org/10.48550/arXiv.2202.00512.
[124]   MENG C L, ROMBACH R, GAO R Q, et al. On distillation of guided diffusion models[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 14297-14306. DOI:10.1109/cvpr52729.2023.01374
[125]   BAO F, NIE S, XUE K W, et al. All are worth words: A ViT backbone for diffusion models[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 22669-22679. DOI:10.1109/cvpr52729.2023.02171
[126]   PEEBLES W, XIE S. Scalable Diffusion Models with Transformers[Z]. (2022-12-19). https://doi.org/10.48550/arXiv.2212.09748.
[1]   WEI L Y, LEFEBVRE S, KWATRA V, et al. State of the Art in Example-Based Texture Synthesis[R]. Eindhoven: Eurographics Association, 2009: 93-117.
[2]   HAN C, RISSER E, RAMAMOORTHI R, et al. Multiscale texture synthesis[J]. ACM Transactions on Graphics, 2008, 27(3): 1-8. DOI:10.1145/1360612.1360650
[3]   MAKTHAL S, ROSS A. Synthesis of iris images using Markov random fields[C]// 2005 13th European Signal Processing Conference. Antalya: IEEE, 2005: 1-4.
[4]   OSINDERO S, HINTON G E. Modeling image patches with a directed hierarchy of Markov random fields[J]. Advances in Neural Information Processing Systems, 2008, 20: 1121-1128.
[5]   GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144. DOI:10.1007/978-3-030-50017-7_10
[6]   MIRZA M, OSINDERO S. Conditional Generative Adversarial Nets[Z]. (2014-11-06). https://arXiv.org/abs/1411.1784.
[7]   OORD A V D, KALCHBRENNER N, VINYALS O, et al. Conditional Image Generation with PixelCNN Decoders[Z]. (2016-06-16). https://doi.org/10.48550/arXiv.1606.05328.
[8]   SALIMANS T, KARPATHY A, CHEN X, et al. PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications[Z]. (2017-01-19). https://doi.org/10.48550/arXiv.1701.05517.
[9]   KINGMA D P, WELLING M. Auto-Encoding Variational Bayes[Z]. (2013-12-20). https://doi.org/10.48550/arXiv.1312.6114.
[10]   DINH L, KRUEGER D, BENGIO Y. NICE: Non-Linear Independent Components Estimation[Z]. (2014-10-30). https://doi.org/10.48550/arXiv.1410.8516.
[11]   DINH L, SOHL-DICKSTEIN J, BENGIO S. Density Estimation Using Real NVP[Z]. (2016-05-27). https://doi.org/10.48550/arXiv.1605.08803.
[12]   LECUN Y, CHOPRA S, HADSELL R, et al. A tutorial on energy-based learning[C]//BAKIR G, HOFMAN T, SCHÖLKOPF B. Predicting Structured Data. Cambridge: MIT Press, 2006. doi:10.7551/mitpress/7443.003.0014
[13]   NGIAM J, CHEN Z, KOH P W, et al. Learning deep energy models[C]// 28th International Conference on Machine Learning. Bellevue: Omnipress, 2011: 1105-1112.
[14]   HO J, JAIN A, ABBEEL P. Denoising Diffusion Probabilistic Models[Z]. (2020-06-19). https://doi.org/10.48550/arXiv.2006.11239.
[15]   SONG Y, ERMON S. Generative modeling by estimating gradients of the data distribution[C]// Thirty-third Conference on Neural Information Processing Systems (NeurIPS). Vancouver: NeurIPS, 2019.
[16]   ZHANG H, XU T, LI H S, et al. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks[C]// 2017 IEEE International Conference on Computer Vision (ICCV). Venice: IEEE, 2017: 5908-5916. DOI:10.1109/iccv.2017.629
[17]   XU T, ZHANG P C, HUANG Q Y, et al. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks[C]// 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 1316-1324. DOI:10.1109/cvpr.2018.00143
[18]   DHARIWAL P, NICHOL A. Diffusion models beat GANs on image synthesis[J]. Advances in Neural Information Processing Systems, 2021, 34: 8780-8794.
[19]   RAMESH A, PAVLOV M, GOH G, et al. Zero-shot text-to-image generation[C]// International Conference on Machine Learning. Online: PMLR, 2021: 8821-8831.
[20]   KARRAS T, LAINE S, AILA T. A style-based generator architecture for generative adversarial networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4401-4410. DOI:10.1109/cvpr.2019.00453
[21]   WU H H, SEETHARAMAN P, KUMAR K, et al. Wav2clip: Learning robust audio representations from clip[C]// 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Online: IEEE, 2022: 4563-4567. DOI:10.1109/icassp43922.2022.9747669
[22]   SONG Y, SOHL-DICKSTEIN J, KINGMA D P, et al. Score-Based Generative Modeling Through Stochastic Differential Equations[Z]. (2020-11-26). https://doi.org/10.48550/arXiv.2011.13456.
[23]   DHARIWAL P, NICHOL A. Diffusion models beat gans on image synthesis[J]. Advances in Neural Information Processing Systems, 2021, 34: 8780-8794.
[24]   NICHOL A, DHARIWAL P. Improved Denoising Diffusion Probabilistic Models[Z]. (2021-02-18). https://arxiv.org/abs/2102.09672.
[25]   NICHOL A, DHARIWAL P, RAMESH A, et al. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models[Z]. (2021-12-20). https://arxiv.org/abs/2112.10741.
[26]   RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional networks for biomedical image segmentation[C]// 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich: MICCAI, 2015: 234-241. doi:10.1007/978-3-319-24574-4_28
[27]   SOHL-DICKSTEIN J, WEISS E, MAHESWARANATHAN N, et al. Deep unsupervised learning using nonequilibrium thermodynamics[C]// 32nd International Conference on Machine Learning. Lille: PMLR, 2015: 2256-2265.
[28]   SONG Y, ERMON S. Improved techniques for training score-based generative models[J]. Advances in Neural Information Processing Systems, 2020, 33: 12438-12448. doi:10.48550/arXiv.2006.09011
[29]   SONG Y, DURKAN C, MURRAY I, et al. Maximum likelihood training of score-based diffusion models[J]. Advances in Neural Information Processing Systems, 2021, 34: 1415-1428.
[30]   BROCK A, DONAHUE J, SIMONYAN K. Large Scale GAN Training for High Fidelity Natural Image Synthesis[Z]. (2018-09-28). https://doi.org/10.48550/arXiv.1809.11096.
[31]   HO J, SALIMANS T. Classifier-Free Diffusion Guidance[Z]. (2022-07-26). https://doi.org/10.48550/arXiv.2207.12598.
[32]   ROMBACH R, BLATTMANN A, LORENZ D, et al. High-Resolution Image Synthesis with Latent Diffusion Models[Z]. (2021-12-20). https://doi.org/10.48550/arXiv.2112.10752.
[33]   RAMESH A, DHARIWAL P, NICHOL A, et al. Hierarchical Text-Conditional Image Generation with CLIP Latents[Z]. (2022-04-13). https://doi.org/10.48550/arXiv.2204.06125.
[34]   SAHARIA C, CHAN W, SAXENA S, et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding[Z]. (2022-03-23). https://doi.org/10.48550/arXiv.2205.11487.
[35]   RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// International Conference on Machine Learning. Online: PMLR, 2021: 8748-8763.
[36]   LIU X, PARK D H, AZADI S, et al. More control for free image synthesis with semantic diffusion guidance[C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Vancouver: IEEE, 2023: 289-299. doi:10.1109/wacv56688.2023.00037
[37]   AVRAHAMI O, LISCHINSKI D, FRIED O. Blended diffusion for text-driven editing of natural images[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans: IEEE, 2022: 18187-18197. DOI:10.1109/CVPR52688.2022.01767
[38]   KWON M, JEONG J, UH Y. Diffusion Models Already Have a Semantic Latent Space[Z]. (2022-10-20). https://doi.org/10.48550/arXiv.2210.10960.
[39]   KIM G, KWON T, YE J C. Diffusionclip: Text-guided diffusion models for robust image manipulation[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 2426-2435. DOI:10.1109/cvpr52688.2022.00246
[40]   GU S Y, CHEN D, BAO J M, et al. Vector quantized diffusion model for text-to-image synthesis[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 10696-10706. DOI:10.1109/cvpr52688.2022.01043
[41]   HO J, SAHARIA C, CHAN W, et al. Cascaded diffusion models for high fidelity image generation[J]. The Journal of Machine Learning Research, 2022, 23(47): 1-33.
[42]   SAHARIA C, HO J, CHAN W, et al. Image super-resolution via iterative refinement[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(4): 4713-4726. DOI:10.1109/tpami.2022.3204461
[43]   YOUNG P, LAI A, HODOSH M, et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions[J]. Transactions of the Association for Computational Linguistics, 2014, 2: 67-78. DOI:10.1162/tacl_a_00166
[44]   LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft coco: Common objects in context[C]// 13th European Conference on Computer Vision (ECCV). Zurich: Springer, 2014: 740-755. doi:10.1007/978-3-319-10602-1_48
[45]   CHANGPINYO S, SHARMA P, DING N, et al. Conceptual 12M: Pushing web-scale image-text pre-training to recognize long-tail visual concepts[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 3558-3568. DOI:10.1109/cvpr46437.2021.00356
[46]   SRINIVASAN K, RAMAN K, CHEN J, et al. WIT: Wikipedia-based image text dataset for multimodal multilingual machine learning[C]// 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. Online: ACM, 2021: 2443-2449. DOI:10.1145/3404835.3463257
[47]   GU J X, MENG X J, LU G S, et al. Wukong: 100 Million Large-Scale Chinese Cross-Modal Pre-Training Dataset and a Foundation Framework[Z]. (2022-02-14). https://doi.org/10.48550/arXiv.2202.06767.
[48]   SCHUHMANN C, VENCU R, BEAUMONT R, et al. Laion-400M: Open Dataset of Clip-Filtered 400 Million Image-Text Pairs[Z]. (2021-11-03). https://doi.org/10.48550/arXiv.2111.02114.
[49]   MINWOO B, BEOMHEE P, HAECHEON K, et al. COYO-700M: Image-Text Pair Dataset[Z]. https://github.com/kakaobrain/coyo-dataset.
[50]   SCHUHMANN C, BEAUMONT R, VENCU R, et al. Laion-5B: An Open Large-Scale Dataset for Training Next Generation Image-Text Models[Z]. (2022-10-16). https://doi.org/10.48550/arXiv.2210.08402.
[51]   FENG Z, ZHANG Z, YU X, et al. ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts[Z]. (2022-10-27). https://doi.org/10.48550/arXiv.2210.15257.
[52]   BALAJI Y, NAH S, HUANG X, et al. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers[Z]. (2022-11-02). https://doi.org/10.48550/arXiv.2211.01324.
[53]   HOOGEBOOM E, HEEK J, SALIMANS T. Simple Diffusion: End-to-End Diffusion for High Resolution Images[Z]. (2023-01-26). https://doi.org/10.48550/arXiv.2301.11093.
[54]   ESSER P, ROMBACH R, OMMER B. Taming transformers for high-resolution image synthesis[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 12868-12878. DOI:10.1109/cvpr46437.2021.01268
[55]   RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[J]. The Journal of Machine Learning Research, 2020, 21(1): 5485-5551.
[56]   BORJI A. Generated Faces in the Wild: Quantitative Comparison of Stable Diffusion, Midjourney and DALL-E 2[Z]. (2022-10-02). https://doi.org/10.48550/arXiv.2210.00586.
[57]   YE H, YANG X, TAKAC M, et al. Improving Text-to-Image Synthesis Using Contrastive Learning[Z]. (2021-07-06). https://doi.org/10.48550/arXiv.2107.02423.
[58]   ZHANG H, KOH J Y, BALDRIDGE J, et al. Cross-modal contrastive learning for text-to-image generation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 833-842. doi:10.1109/cvpr46437.2021.00089
[59]   ZHOU Y F, ZHANG R Y, CHEN C Y, et al. LAFITE: Towards Language-Free Training for Text-to-Image Generation[Z]. (2021-11-27). https://doi.org/10.48550/arXiv.2111.13792.
[60]   DING M, ZHENG W D, HONG W Y, et al. CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers[Z]. (2022-04-28). https://doi.org/10.48550/arXiv.2204.14217.
[61]   GAFNI O, POLYAK A, ASHUAL O, et al. Make-a-scene: Scene-based text-to-image generation with human priors[C]// 17th European Conference on Computer Vision. Israel: Springer, 2022: 89-106. doi:10.1007/978-3-031-19784-0_6
[62]   YU J H, XU Y Z, KOH J Y, et al. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation[Z]. (2022-06-22). https://doi.org/10.48550/arXiv.2206.10789.
[63]   LEE K M, LIU H, RYU M, et al. Aligning Text-To-Image Models Using Human Feedback[Z]. (2023-02-23). https://doi.org/10.48550/arXiv.2302.12192.
[64]   ZHANG Q S, SONG J M, HUANG X, et al. DiffCollage: Parallel Generation of Large Content with Diffusion Models[Z]. (2023-03-30). https://doi.org/10.48550/arXiv.2303.17076.
[65]   SCHRAMOWSKI P, BRACK M, DEISEROTH B, et al. Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models[Z]. (2022-11-09). https://doi.org/10.48550/arXiv.2211.05105.
[66]   FRIEDRICH F, SCHRAMOWSKI P, BRACK M, et al. Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness[Z]. (2023-02-07). https://doi.org/10.48550/arXiv.2302.10893.
[67]   ZHU Y, WU Y, OLSZEWSKI K, et al. Discrete contrastive diffusion for cross-modal music and image generation[C]// The Eleventh International Conference on Learning Representations. Kigali: ICLR, 2023.
[68]   LIU N, LI S, DU Y L, et al. Compositional visual generation with composable diffusion models[C]// 17th European Conference on Computer Vision. Israel: Springer, 2022: 423-439. doi:10.1007/978-3-031-19790-1_26
[69]   LIEW J H, YAN H, ZHOU D, et al. MagicMix: Semantic Mixing with Diffusion Models[Z]. (2022-10-28). https://doi.org/10.48550/arXiv.2210.16056.
[70]   MA W D K, LEWIS J P, KLEIJN W B, et al. Directed Diffusion: Direct Control of Object Placement through Attention Guidance[Z]. (2023-02-25). https://doi.org/10.48550/arXiv.2302.13153.
[71]   CHEFER H, ALALUF Y, VINKER Y, et al. Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models[Z]. (2023-01-31). https://doi.org/10.48550/arXiv.2301.13826.
[72]   GRAVE E, JOULIN A, USUNIER N. Improving Neural Language Models with a Continuous Cache[Z]. (2016-12-13). https://doi.org/10.48550/arXiv.1612.04426.
[73]   ROMBACH R, BLATTMANN A, OMMER B. Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models[Z]. (2022-07-26). https://doi.org/10.48550/arXiv.2207.13038.
[74]   BLATTMANN A, ROMBACH R, OKTAY K, et al. Retrieval-augmented diffusion models[J]. Advances in Neural Information Processing Systems, 2022, 35: 15309-15324.
[75]   CHEN W H, HU H X, SAHARIA C, et al. Re-Imagen: Retrieval-Augmented Text-to-Image Generator[Z]. (2022-09-29). https://doi.org/10.48550/arXiv.2209.14491.
[76]   SHEYNIN S, ASHUAL O, POLYAK A, et al. KNN-Diffusion: Image Generation via Large-Scale Retrieval[Z]. (2022-04-06). https://doi.org/10.48550/arXiv.2204.02849.
[77]   GAL R, ALALUF Y, ATZMON Y, et al. An Image is Worth One Word: Personalizing Text-to-Image Generation Using Textual Inversion[Z]. (2022-08-02). https://doi.org/10.48550/arXiv.2208.01618.
[78]   RUIZ N, LI Y, JAMPANI V, et al. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation[Z]. (2022-08-25). https://doi.org/10.48550/arXiv.2208.12242.
[79]   DONG Z Y, WEI P X, LIN L. DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Contrastive Prompt-Tuning[Z]. (2022-11-21). https://doi.org/10.48550/arXiv.2211.11337.
[80]   KUMARI N, ZHANG B, ZHANG R, et al. Multi-Concept Customization of Text-to-Image Diffusion[Z]. (2022-12-08). https://doi.org/10.48550/arXiv.2212.04488.
[81]   WEI Y, ZHANG Y, JI Z, et al. ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation[Z]. (2023-02-27). https://doi.org/10.48550/arXiv.2302.13848.
[82]   LIU Z H, FENG R L, ZHU K, et al. Cones: Concept Neurons in Diffusion Models for Customized Generation[Z]. (2023-03-09). https://doi.org/10.48550/arXiv.2303.05125.
[83]   HAN L G, LI Y X, ZHANG H, et al. SVDiff: Compact Parameter Space for Diffusion Fine-Tuning[Z]. (2023-03-20). https://doi.org/10.48550/arXiv.2303.11305.
[84]   PATASHNIK O, GARIBI D, AZURI I, et al. Localizing Object-Level Shape Variations with Text-to-Image Diffusion Models[Z]. (2023-03-20). https://doi.org/10.48550/arXiv.2303.11306.
[85]   HUANG Z Q, WU T X, JIANG Y M, et al. ReVersion: Diffusion-Based Relation Inversion from Images[Z]. (2023-03-23). https://doi.org/10.48550/arXiv.2303.13495.
[86]   WANG T F, ZHANG T, ZHANG B, et al. Pretraining is All You Need for Image-to-Image Translation[Z]. (2022-05-25). https://doi.org/10.48550/arXiv.2205.12952.
[87]   VOYNOV A, ABERMAN K, COHEN-OR D. Sketch-Guided Text-to-Image Diffusion Models[Z]. (2022-11-24). https://doi.org/10.48550/arXiv.2211.13752.
[88]   MAUNGMAUNG A, SHING M, MITSUI K, et al. Text-Guided Scene Sketch-to-Photo Synthesis[Z]. (2023-02-14). https://doi.org/10.48550/arXiv.2302.06883.
[89]   CHENG S I, CHEN Y J, CHIU W C, et al. Adaptively-realistic image generation from stroke and sketch with diffusion model[C]// 2023 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2023: 4043-4051. DOI:10.1109/wacv56688.2023.00404
[90]   PENG Y C, ZHAO C Q, XIE H R, et al. DiffFaceSketch: High-Fidelity Face Image Synthesis with Sketch-Guided Latent Diffusion Model[Z]. (2023-02-14). https://doi.org/10.48550/arXiv.2302.06908.
[91]   CHENG J X, LIANG X, SHI X J, et al. LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation[Z]. (2023-02-16). https://doi.org/10.48550/arXiv.2302.08908.
[92]   BAR-TAL O, YARIV L, LIPMAN Y, et al. MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation[Z]. (2023-02-16). https://doi.org/10.48550/arXiv.2302.08113.
[93]   AVRAHAMI O, HAYES T, GAFNI O, et al. SpaText: Spatio-textual representation for controllable image generation[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver: IEEE, 2023: 18370-18380. DOI:10.1109/CVPR52729.2023.01762
[94]   HAM C, HAYS J, LU J, et al. Modulating Pretrained Diffusion Models for Multimodal Image Synthesis[Z]. (2023-02-24). https://doi.org/10.48550/arXiv.2302.12764.
[95]   YANG L, HUANG Z L, SONG Y, et al. Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training[Z]. (2022-11-21). https://doi.org/10.48550/arXiv.2211.11138.
[96]   LI Y H, LIU H T, WU Q Y, et al. GLIGEN: Open-Set Grounded Text-to-Image Generation[Z]. (2023-01-17). https://doi.org/10.48550/arXiv.2301.07093.
[97]   SARUKKAI V, LI L, MA A, et al. Collage Diffusion[Z]. (2023-03-01). https://doi.org/10.48550/arXiv.2303.00262.
[98]   ZHANG L, AGRAWALA M. Adding Conditional Control to Text-to-Image Diffusion Models[Z]. (2023-02-10). https://doi.org/10.48550/arXiv.2302.05543.
[99]   HUANG L H, CHEN D, LIU Y, et al. Composer: Creative and Controllable Image Synthesis with Composable Conditions[Z]. (2023-02-20). https://doi.org/10.48550/arXiv.2302.09778.
[100]   YU J W, WANG Y H, ZHAO C, et al. FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model[Z]. (2023-03-17). https://doi.org/10.48550/arXiv.2303.09833.
[101]   LUGMAYR A, DANELLJAN M, ROMERO A, et al. RePaint: Inpainting using Denoising Diffusion Probabilistic Models[Z]. (2022-01-24). https://doi.org/10.48550/arXiv.2201.09865.
[102]   LI W B, YU X, ZHOU K, et al. SDM: Spatial Diffusion Model for Large Hole Image Inpainting[Z]. (2022-12-06). https://doi.org/10.48550/arXiv.2212.02963.
[103]   LI R, TAN R T, CHEONG L F. All in one bad weather removal using architectural search[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 3172-3182. DOI:10.1109/cvpr42600.2020.00324
[104]   CHEN H T, WANG Y H, GUO T Y, et al. Pre-trained image processing transformer[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 12294-12305. DOI:10.1109/cvpr46437.2021.01212
[105]   ZHU Y R, WANG T Y, FU X Y, et al. Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 21747-21758. DOI:10.1109/cvpr52729.2023.02083
[106]   KAWAR B, ELAD M, ERMON S, et al. Denoising Diffusion Restoration Models[Z]. (2022-01-27). https://doi.org/10.48550/arXiv.2201.11793.
[107]   WANG Y H, YU J W, ZHANG J. Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model[Z]. (2022-12-01). https://doi.org/10.48550/arXiv.2212.00490.
[108]   SAHARIA C, CHAN W, CHANG H, et al. Palette: Image-to-image diffusion models[C]// ACM SIGGRAPH 2022. Vancouver: ACM, 2022: 1-10. DOI:10.1145/3528233.3530757
[109]   PAN X C, QIN P D, LI Y H, et al. Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models[Z]. (2022-11-20). https://doi.org/10.48550/arXiv.2211.10950.
[110]   JEONG H, KWON G, YE J C. Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models[Z]. (2023-02-08). https://doi.org/10.48550/arXiv.2302.03900.
[111]   NIKANKIN Y, HAIM N, IRANI M. SinFusion: Training Diffusion Models on a Single Image or Video[Z]. (2022-11-21). https://doi.org/10.48550/arXiv.2211.11743.
[112]   ZHAO Y Q, PANG T Y, DU C, et al. A Recipe for Watermarking Diffusion Models[Z]. (2023-03-17). https://doi.org/10.48550/arXiv.2303.10137.