Journal of Zhejiang University (Science Edition), 2023, Vol. 50, Issue 6: 651-667    DOI: 10.3785/j.issn.1008-9497.2023.06.001
Special Topic of the 26th National Conference on Computer-Aided Design and Computer Graphics
A review of conditional image generation based on diffusion models
Zerun LIU1, Yufei YIN1,2, Wenhao XUE1,3, Rui GUO1, Lechao CHENG1
1. Zhejiang Lab, Hangzhou 311121, China
2. MOE-Microsoft Key Laboratory of Multimedia Computing and Communication, University of Science and Technology of China, Hefei 230026, China
3. School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
Abstract:

Artificial intelligence generated content (AIGC) has recently attracted significant attention. Among the numerous generative models proposed, the emerging diffusion model stands out for its highly interpretable mathematical properties and its ability to generate high-quality, diverse results. Diffusion models have achieved remarkable success in condition-guided image generation; this success has in turn promoted their development in other conditional tasks and enabled applications in areas such as film, games, painting, and virtual reality. In text-guided image generation, for example, diffusion models can generate high-resolution images while maintaining high image quality. In this paper, we first introduce the definition and background of diffusion models. We then review the development history and latest progress of conditional image generation based on diffusion models. Finally, we conclude the survey with a discussion of the challenges and potential future directions of diffusion models.

Key words: diffusion model; conditional image generation; application
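
For readers new to the area, the following minimal sketch illustrates the text-guided generation task described above, using the latent diffusion model of ref. [32] as served by the Hugging Face diffusers library; the library, checkpoint name, and prompt are illustrative assumptions, not part of this survey.

```python
# A minimal text-to-image inference sketch (assumes the diffusers and torch
# packages, the runwayml/stable-diffusion-v1-5 checkpoint, and a CUDA GPU).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# guidance_scale controls classifier-free guidance [31]: larger values trade
# diversity for closer adherence to the text prompt.
image = pipe(
    "an oil painting of a lighthouse at dawn",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("lighthouse.png")
```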
Received: 2023-05-10    Published: 2023-11-30
CLC: TP 391.41
Corresponding author: Lechao CHENG, E-mail: chenglc@zhejianglab.com
About the first author: Zerun LIU (1998—), ORCID: https://orcid.org/0009-0001-4493-6025, male, master's student, research interests: image processing.

Cite this article:

Zerun LIU, Yufei YIN, Wenhao XUE, Rui GUO, Lechao CHENG. A review of conditional image generation based on diffusion models. Journal of Zhejiang University (Science Edition), 2023, 50(6): 651-667.

Link to this article:

https://www.zjujournals.com/sci/CN/10.3785/j.issn.1008-9497.2023.06.001        https://www.zjujournals.com/sci/CN/Y2023/V50/I6/651

Fig. 1  Forward and reverse diffusion processes of the diffusion model
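
As a companion to Fig. 1, the forward and reverse processes in the standard DDPM formulation [14] can be written as follows; the notation is taken from ref. [14], not from the figure itself.

```latex
% Forward (noising) process: a fixed Markov chain with variance schedule \beta_t.
q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\left(\mathbf{x}_t;\, \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\, \beta_t\mathbf{I}\right)
% With \alpha_t = 1-\beta_t and \bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s, any step admits the closed form:
q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\left(\mathbf{x}_t;\, \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0,\, (1-\bar{\alpha}_t)\mathbf{I}\right)
% Reverse (denoising) process: learned Gaussian transitions with parameters \theta.
p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}\left(\mathbf{x}_{t-1};\, \mu_\theta(\mathbf{x}_t, t),\, \Sigma_\theta(\mathbf{x}_t, t)\right)
```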
Fig. 2  Conditions and corresponding methods for condition-guided image generation
Dataset | Images | Texts | URL
Flickr30K[43] | 32 K | 158 K | http://shannon.cs.illinois.edu/DenotationGraph/
MS-COCO[44] | 330 K | 1.5 M | https://cocodataset.org/
CC[45] | 12 M | 12 M | https://github.com/google-research-datasets/conceptual-12m
WIT[46] | 11.5 M | 37.6 M | https://github.com/google-research-datasets/wit
WuKong[47] | 100 M | 100 M | https://wukong-dataset.github.io/wukong-dataset/
LAION-400M[48] | 400 M | 400 M | https://laion.ai/blog/laion-400-open-dataset/
COYO[49] | 700 M | 700 M | https://github.com/kakaobrain/coyo-dataset
LAION-5B[50] | 5 B | 5 B | https://laion.ai/projects/
Table 1  Large-scale image-text datasets
Model | Params/B | FID-30K (↓) | Zero-shot FID (↓)
DM-GAN[57] | - | 20.79 | -
XMC-GAN[58] | - | 9.33 | -
LAFITE[59] | 0.2 | 8.12 | -
CogView2[60] | 6.0 | 17.70 | 24.00
Make-A-Scene[61] | 4.0 | 7.55 | 11.84
Parti[62] | 20.0 | 3.22 | 7.23
GLIDE[25] | 5.0 | - | 12.24
DALL·E 2[33] | 6.5 | - | 10.39
Stable Diffusion[32] | 1.4 | - | 8.59
Simple Diffusion[53] | - | - | 8.30
Imagen[34] | 7.9 | - | 7.27
eDiff-I[52] | 9.1 | - | 6.95
ERNIE-ViLG 2.0[51] | 24.0 | - | 6.72
Table 2  FID of different models on the MS-COCO 256×256 dataset
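
The FID values in Table 2 are those reported by the original papers, each under its own evaluation protocol. As a hedged illustration of how FID is computed in practice, the sketch below uses the torchmetrics implementation; the tensors are random placeholders standing in for real and generated image batches.

```python
# Sketch of an FID computation (assumes the torchmetrics package).
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)  # pooled InceptionV3 features

# FID expects uint8 image batches of shape (N, 3, H, W); real evaluations use
# tens of thousands of images, not the toy batches shown here.
real_images = torch.randint(0, 256, (64, 3, 256, 256), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (64, 3, 256, 256), dtype=torch.uint8)

fid.update(real_images, real=True)   # accumulate reference statistics
fid.update(fake_images, real=False)  # accumulate generated statistics
print(float(fid.compute()))          # lower is better
```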
Fig. 3  Framework of retrieval-augmented text-to-image generation models
Model | Text alignment | Image alignment
Textual Inversion[77] | 0.183 | 0.689
DreamBooth[78] | 0.249 | 0.827
DreamArtist[79] | 0.286 | 0.739
Custom Diffusion[80] | 0.231 | 0.868
ELITE[81] | 0.266 | 0.804
Cones[82] | 0.237 | 0.853
SVDiff[83] | 0.323 | 0.716
Table 3  Comparison of image generation models for subject-driven generation
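
The text- and image-alignment scores in Table 3 are typically CLIP-based [35]: cosine similarity between the prompt embedding and the generated-image embedding, and between the generated image and reference images of the subject. A minimal sketch, assuming the transformers library and hypothetical file names:

```python
# Sketch of CLIP-based alignment scores (assumes transformers, torch, pillow).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

generated = Image.open("generated.png")   # hypothetical generated sample
reference = Image.open("reference.png")   # hypothetical subject photo
prompt = "a photo of my dog on the beach"

inputs = processor(text=[prompt], images=[generated, reference],
                   return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Normalize the projected embeddings before taking cosine similarities.
img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)

text_alignment = float(txt[0] @ img[0])   # prompt vs. generated image
image_alignment = float(img[0] @ img[1])  # generated vs. reference subject
```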
Model | Condition type | Datasets
PITI[86] | sketch, layout | ADE20K, DIODE, COCO-Stuff
Sketch-Guided[87] | sketch and text | Sketchy, Edge2shoes
DiSS[89] | sketch, color map | COCO-Stuff, Visual
Sketch2Photo[88] | sketch, text | LAION, GeoPose3K, LSUN-Church
DiffFaceSketch[90] | sketch | CelebA-HQ
Table 4  Sketch-conditioned image generation models
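
As a concrete example of the sketch-conditioned generation surveyed in Table 4, the snippet below drives a text-to-image model with a scribble map via ControlNet [98] in the diffusers library; the two checkpoint names and the input file are illustrative assumptions rather than models from the table.

```python
# Sketch-conditioned generation with ControlNet (assumes diffusers and torch).
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

sketch = load_image("sketch.png")  # hypothetical input scribble/sketch map
image = pipe(
    "a cozy wooden cabin in a snowy forest",
    image=sketch,                  # the spatial condition
    num_inference_steps=30,
).images[0]
image.save("cabin.png")
```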
Fig. 4  Common layout forms and the corresponding generation results
1 WEI L Y, LEFEBVRE S, KWATRA V, et al. State of the Art in Example-Based Texture Synthesis[R]. Eindhoven: Eurographics Association, 2009: 93-117.
2 HAN C, RISSER E, RAMAMOORTHI R, et al. Multiscale texture synthesis[J]. ACM Transactions on Graphics, 2008, 27(3): 1-8. DOI:10.1145/1360612.1360650
3 MAKTHAL S, ROSS A. Synthesis of iris images using Markov random fields[C]// 2005 13th European Signal Processing Conference. Antalya: IEEE, 2005: 1-4.
4 OSINDERO S, HINTON G E. Modeling image patches with a directed hierarchy of Markov random fields[J]. Advances in Neural Information Processing Systems, 2008, 20: 1121-1128.
5 GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144. DOI:10.1145/3422622
6 MIRZA M, OSINDERO S. Conditional Generative Adversarial Nets[Z]. (2014-11-06). https://doi.org/10.48550/arXiv.1411.1784.
7 OORD A V D, KALCHBRENNER N, VINYALS O, et al. Conditional Image Generation with PixelCNN Decoders[Z]. (2016-06-16). https://doi.org/10.48550/arXiv.1606.05328.
8 SALIMANS T, KARPATHY A, CHEN X, et al. PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications[Z]. (2017-01-19). https://doi.org/10.48550/arXiv.1701.05517.
9 KINGMA D P, WELLING M. Auto-Encoding Variational Bayes[Z]. (2013-12-20). https://doi.org/10.48550/arXiv.1312.6114.
10 DINH L, KRUEGER D, BENGIO Y. NICE: Non-Linear Independent Components Estimation[Z]. (2014-10-30). https://doi.org/10.48550/arXiv.1410.8516.
11 DINH L, SOHL-DICKSTEIN J, BENGIO S. Density Estimation Using Real NVP[Z]. (2016-05-27). https://doi.org/10.48550/arXiv.1605.08803.
12 LECUN Y, CHOPRA S, HADSELL R, et al. A tutorial on energy-based learning[C]// BAKIR G, HOFMANN T, SCHÖLKOPF B. Predicting Structured Data. Cambridge: MIT Press, 2006. DOI:10.7551/mitpress/7443.003.0014
13 NGIAM J, CHEN Z, KOH P W, et al. Learning deep energy models[C]// 28th International Conference on International Conference on Machine Learning. Bellevue: Omnipress, 2011: 1105-1112.
14 HO J, JAIN A, ABBEEL P. Denoising Diffusion Probabilistic Models[Z]. (2020-06-19). https://doi.org/10.48550/arXiv.2006.11239.
15 SONG Y, ERMON S. Generative modeling by estimating gradients of the data distribution[C]//Thirty-third Conference on Neural Information Processing Systems(NeurIPS). Vancouver: NeurIPS, 2019.
16 ZHANG H, XU T, LI H S, et al. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks[C]// 2017 IEEE International Conference on Computer Vision (ICCV). Venice: IEEE, 2017: 5908-5916. DOI:10.1109/iccv.2017.629
17 XU T, ZHANG P C, HUANG Q Y, et al. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks[C]// 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 1316-1324. DOI:10.1109/cvpr.2018.00143
18 DHARIWAL P, NICHOL A. Diffusion models beat GANs on image synthesis[J]. Advances in Neural Information Processing Systems, 2021, 34: 8780-8794.
19 RAMESH A, PAVLOV M, GOH G, et al. Zero-shot text-to-image generation[C]// International Conference on Machine Learning. Online: PMLR, 2021: 8821-8831.
20 KARRAS T, LAINE S, AILA T. A style-based generator architecture for generative adversarial networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4401-4410. DOI:10.1109/cvpr.2019.00453
21 WU H H, SEETHARAMAN P, KUMAR K, et al. Wav2CLIP: Learning robust audio representations from CLIP[C]// 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Online: IEEE, 2022: 4563-4567. DOI:10.1109/icassp43922.2022.9747669
22 SONG Y, SOHL-DICKSTEIN J, KINGMA D P, et al. Score-Based Generative Modeling Through Stochastic Differential Equations[Z]. (2020-11-26). https://doi.org/10.48550/arXiv.2011.13456.
23 DHARIWAL P, NICHOL A. Diffusion models beat GANs on image synthesis[J]. Advances in Neural Information Processing Systems, 2021, 34: 8780-8794.
24 NICHOL A, DHARIWAL P. Improved Denoising Diffusion Probabilistic Models[Z]. (2021-02-18). https://doi.org/10.48550/arXiv.2102.09672.
25 NICHOL A, DHARIWAL P, RAMESH A, et al. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models[Z]. (2021-12-20). https://doi.org/10.48550/arXiv.2112.10741.
26 RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional networks for biomedical image segmentation[C]// 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich: MICCAI, 2015: 234-241. DOI:10.1007/978-3-319-24574-4_28
27 SOHL-DICKSTEIN J, WEISS E, MAHESWARANATHAN N, et al. Deep unsupervised learning using nonequilibrium thermodynamics[C]// 32nd International Conference on Machine Learning. Lille: PMLR, 2015: 2256-2265.
28 SONG Y, ERMON S. Improved techniques for training score-based generative models[J]. Advances in Neural Information Processing Systems, 2020, 33: 12438-12448. DOI:10.48550/arXiv.2006.09011
29 SONG Y, DURKAN C, MURRAY I, et al. Maximum likelihood training of score-based diffusion models[J]. Advances in Neural Information Processing Systems, 2021, 34: 1415-1428.
30 BROCK A, DONAHUE J, SIMONYAN K. Large Scale GAN Training for High Fidelity Natural Image Synthesis[Z]. (2018-09-28). https://doi.org/10.48550/arXiv.1809.11096.
31 HO J, SALIMANS T. Classifier-Free Diffusion Guidance[Z]. (2022-07-26). https://doi.org/10.48550/arXiv.2207.12598.
32 ROMBACH R, BLATTMANN A, LORENZ D, et al. High-Resolution Image Synthesis with Latent Diffusion Models[Z]. (2021-12-20). https://doi.org/10.48550/arXiv.2112.10752.
33 RAMESH A, DHARIWAL P, NICHOL A, et al. Hierarchical Text-Conditional Image Generation with CLIP Latents[Z]. (2022-04-13). https://doi.org/10.48550/arXiv.2204.06125.
34 SAHARIA C, CHAN W, SAXENA S, et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding[Z]. (2022-05-23). https://doi.org/10.48550/arXiv.2205.11487.
35 RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// International Conference on Machine Learning. Online: PMLR, 2021: 8748-8763.
36 LIU X, PARK D H, AZADI S, et al. More control for free! Image synthesis with semantic diffusion guidance[C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Vancouver: IEEE, 2023: 289-299. DOI:10.1109/wacv56688.2023.00037
37 AVRAHAMI O, LISCHINSKI D, FRIED O. Blended diffusion for text-driven editing of natural images[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans: IEEE, 2022: 18187-18197. DOI:10.1109/CVPR52688.2022.01767
38 KWON M, JEONG J, UH Y. Diffusion Models Already Have a Semantic Latent Space[Z]. (2022-10-20). https://doi.org/10.48550/arXiv.2210.10960.
39 KIM G, KWON T, YE J C. DiffusionCLIP: Text-guided diffusion models for robust image manipulation[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 2426-2435. DOI:10.1109/cvpr52688.2022.00246
40 GU S Y, CHEN D, BAO J M, et al. Vector quantized diffusion model for text-to-image synthesis[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 10696-10706. DOI:10.1109/cvpr52688.2022.01043
41 HO J, SAHARIA C, CHAN W, et al. Cascaded diffusion models for high fidelity image generation[J]. The Journal of Machine Learning Research, 2022, 23(47): 1-33.
42 SAHARIA C, HO J, CHAN W, et al. Image super-resolution via iterative refinement[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(4): 4713-4726. DOI:10.1109/tpami.2022.3204461
43 YOUNG P, LAI A, HODOSH M, et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions[J]. Transactions of the Association for Computational Linguistics, 2014, 2: 67-78. DOI:10.1162/tacl_a_00166
44 LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]// 13th European Conference on Computer Vision (ECCV). Zurich: Springer, 2014: 740-755. DOI:10.1007/978-3-319-10602-1_48
45 CHANGPINYO S, SHARMA P, DING N, et al. Conceptual 12M: Pushing web-scale image-text pre-training to recognize long-tail visual concepts[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 3558-3568. DOI:10.1109/cvpr46437.2021.00356
46 SRINIVASAN K, RAMAN K, CHEN J, et al. WIT: Wikipedia-based image text dataset for multimodal multilingual machine learning[C]// 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. Online: ACM, 2021: 2443-2449. DOI:10.1145/3404835.3463257
47 GU J X, MENG X J, LU G S, et al. Wukong: 100 Million Large-Scale Chinese Cross-Modal Pre-Training Dataset and a Foundation Framework[Z]. (2022-02-14). https://doi.org/10.48550/arXiv.2202.06767.
48 SCHUHMANN C, VENCU R, BEAUMONT R, et al. LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs[Z]. (2021-11-03). https://doi.org/10.48550/arXiv.2111.02114.
49 BYEON M, PARK B, KIM H, et al. COYO-700M: Image-Text Pair Dataset[Z]. https://github.com/kakaobrain/coyo-dataset.
50 SCHUHMANN C, BEAUMONT R, VENCU R, et al. LAION-5B: An Open Large-Scale Dataset for Training Next Generation Image-Text Models[Z]. (2022-10-16). https://doi.org/10.48550/arXiv.2210.08402.
51 FENG Z, ZHANG Z, YU X, et al. ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts[Z]. (2022-10-27). https://doi.org/10.48550/arXiv.2210.15257.
52 BALAJI Y, NAH S, HUANG X, et al. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers[Z]. (2022-11-02). https://doi.org/10.48550/arXiv.2211.01324.
53 HOOGEBOOM E, HEEK J, SALIMANS T. Simple Diffusion: End-to-End Diffusion for High Resolution Images[Z]. (2023-01-26). https://doi.org/10.48550/arXiv.2301.11093.
54 ESSER P, ROMBACH R, OMMER B. Taming transformers for high-resolution image synthesis[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 12868-12878. DOI:10.1109/cvpr46437.2021.01268
55 RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[J]. The Journal of Machine Learning Research, 2020, 21(1): 5485-5551.
56 BORJI A. Generated Faces in the Wild: Quantitative Comparison of Stable Diffusion, Midjourney and DALL·E 2[Z]. (2022-10-02). https://doi.org/10.48550/arXiv.2210.00586.
57 YE H, YANG X, TAKAC M, et al. Improving Text-to-Image Synthesis Using Contrastive Learning[Z]. (2021-07-06). https://doi.org/10.48550/arXiv.2107.02423.
58 ZHANG H, KOH J Y, BALDRIDGE J, et al. Cross-modal contrastive learning for text-to-image generation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 833-842. doi:10.1109/cvpr46437.2021.00089
59 ZHOU Y F, ZHANG R Y, CHEN C Y, et al. LAFITE: Towards Language-Free Training for Text-to-Image Generation[Z]. (2021-11-27). https://doi.org/10.48550/arXiv.2111.13792.
60 DING M, ZHENG W D, HONG W Y, et al. CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers[Z]. (2022-04-28). https://doi.org/10.48550/arXiv.2204.14217.
61 GAFNI O, POLYAK A, ASHUAL O, et al. Make-a-scene: Scene-based text-to-image generation with human priors[C]// 17th European Conference on Computer Vision. Tel Aviv: Springer, 2022: 89-106. DOI:10.1007/978-3-031-19784-0_6
62 YU J H, XU Y Z, KOH J Y, et al. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation[Z]. (2022-06-22). https://doi.org/10.48550/arXiv.2206.10789.
63 LEE K M, LIU H, RYU M, et al. Aligning Text-To-Image Models Using Human Feedback[Z]. (2023-02-23). https://doi.org/10.48550/arXiv.2302.12192.
64 ZHANG Q S, SONG J M, HUANG X, et al. DiffCollage: Parallel Generation of Large Content with Diffusion Models[Z]. (2023-03-30). https://doi.org/10.48550/arXiv.2303.17076.
65 SCHRAMOWSKI P, BRACK M, DEISEROTH B, et al. Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models[Z]. (2022-11-09). https://doi.org/10.48550/arXiv.2211.05105.
66 FRIEDRICH F, SCHRAMOWSKI P, BRACK M, et al. Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness[Z]. (2023-02-07). https://doi.org/10.48550/arXiv.2302.10893.
67 ZHU Y, WU Y, OLSZEWSKI K, et al. Discrete contrastive diffusion for cross-modal music and image generation[C]// The Eleventh International Conference on Learning Representations. Kigali: ICLR, 2023.
68 LIU N, LI S, DU Y L, et al. Compositional visual generation with composable diffusion models[C]// 17th European Conference on Computer Vision. Tel Aviv: Springer, 2022: 423-439. DOI:10.1007/978-3-031-19790-1_26
69 LIEW J H, YAN H, ZHOU D, et al. MagicMix: Semantic Mixing with Diffusion Models[Z]. (2022-10-28). https://doi.org/10.48550/arXiv.2210.16056.
70 MA W D K, LEWIS J P, KLEIJN W B, et al. Directed Diffusion: Direct Control of Object Placement through Attention Guidance[Z]. (2023-02-25). https://doi.org/10.48550/arXiv.2302.13153.
71 CHEFER H, ALALUF Y, VINKER Y, et al. Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models[Z]. (2023-01-31). https://doi.org/10.48550/arXiv.2301.13826.
72 GRAVE E, JOULIN A, USUNIER N. Improving Neural Language Models with a Continuous Cache[Z]. (2016-12-13). https://doi.org/10.48550/arXiv.1612.04426.
73 ROMBACH R, BLATTMANN A, OMMER B. Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models[Z]. (2022-07-26). https://doi.org/10.48550/arXiv.2207.13038.
74 BLATTMANN A, ROMBACH R, OKTAY K, et al. Retrieval-augmented diffusion models[J]. Advances in Neural Information Processing Systems, 2022, 35: 15309-15324.
75 CHEN W H, HU H X, SAHARIA C, et al. Re-Imagen: Retrieval-Augmented Text-to-Image Generator[Z]. (2022-09-29). https://doi.org/10.48550/arXiv.2209.14491.
76 SHEYNIN S, ASHUAL O, POLYAK A, et al. KNN-Diffusion: Image Generation via Large-Scale Retrieval[Z]. (2022-04-06). https://doi.org/10.48550/arXiv.2204.02849.
77 GAL R, ALALUF Y, ATZMON Y, et al. An Image is Worth One Word: Personalizing Text-to-Image Generation Using Textual Inversion[Z]. (2022-08-02). https://doi.org/10.48550/arXiv.2208.01618.
78 RUIZ N, LI Y, JAMPANI V, et al. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation[Z]. (2022-08-25). https://doi.org/10.48550/arXiv.2208.12242.
79 DONG Z Y, WEI P X, LIN L. DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Contrastive Prompt-Tuning[Z]. (2022-11-21). https://doi.org/10.48550/arXiv.2211.11337.
80 KUMARI N, ZHANG B, ZHANG R, et al. Multi-Concept Customization of Text-to-Image Diffusion[Z]. (2022-12-08). https://doi.org/10.48550/arXiv.2212.04488.
81 WEI Y, ZHANG Y, JI Z, et al. ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation[Z]. (2023-02-27). https://doi.org/10.48550/arXiv.2302.13848.
82 LIU Z H, FENG R L, ZHU K, et al. Cones: Concept Neurons in Diffusion Models for Customized Generation[Z]. (2023-03-09). https://doi.org/10.48550/arXiv.2303.05125.
83 HAN L G, LI Y X, ZHANG H, et al. SVDiff: Compact Parameter Space for Diffusion Fine-Tuning[Z]. (2023-03-20). https://doi.org/10.48550/arXiv.2303.11305.
84 PATASHNIK O, GARIBI D, AZURI I, et al. Localizing Object-Level Shape Variations with Text-to-Image Diffusion Models[Z]. (2023-03-20). https://doi.org/10.48550/arXiv.2303.11306.
85 HUANG Z Q, WU T X, JIANG Y M, et al. ReVersion: Diffusion-Based Relation Inversion from Images[Z]. (2023-03-23). https://doi.org/10.48550/arXiv.2303.13495.
86 WANG T F, ZHANG T, ZHANG B, et al. Pretraining is All You Need for Image-to-Image Translation[Z]. (2022-05-25). https://doi.org/10.48550/arXiv.2205.12952.
87 VOYNOV A, ABERMAN K, COHEN-OR D. Sketch-Guided Text-to-Image Diffusion Models[Z]. (2022-11-24). https://doi.org/10.48550/arXiv.2211.13752.
88 MAUNGMAUNG A, SHING M, MITSUI K, et al. Text-Guided Scene Sketch-to-Photo Synthesis[Z]. (2023-02-14). https://doi.org/10.48550/arXiv.2302.06883.
89 CHENG S I, CHEN Y J, CHIU W C, et al. Adaptively-realistic image generation from stroke and sketch with diffusion model[C]// 2023 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2023: 4043-4051. DOI:10.1109/wacv56688.2023.00404
90 PENG Y C, ZHAO C Q, XIE H R, et al. DiffFaceSketch: High-Fidelity Face Image Synthesis with Sketch-Guided Latent Diffusion Model[Z]. (2023-02-14). https://doi.org/10.48550/arXiv.2302.06908.
91 CHENG J X, LIANG X, SHI X J, et al. LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation[Z]. (2023-02-16). https://doi.org/10.48550/arXiv.2302.08908.
92 BAR-TAL O, YARIV L, LIPMAN Y, et al. MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation[Z]. (2023-02-16). https://doi.org/10.48550/arXiv.2302.08113.
93 AVRAHAMI O, HAYES T, GAFNI O, et al. SpaText: Spatio-textual representation for controllable image generation[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver: IEEE, 2023: 18370-18380. DOI:10.1109/CVPR52729.2023.01762
94 HAM C, HAYS J, LU J, et al. Modulating Pretrained Diffusion Models for Multimodal Image Synthesis[Z]. (2023-02-24). https://doi.org/10.48550/arXiv.2302.12764.
95 YANG L, HUANG Z L, SONG Y, et al. Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training[Z]. (2022-11-21). https://doi.org/10.48550/arXiv.2211.11138.
96 LI Y H, LIU H T, WU Q Y, et al. GLIGEN: Open-Set Grounded Text-to-Image Generation[Z]. (2023-01-17). https://doi.org/10.48550/arXiv.2301.07093.
97 SARUKKAI V, LI L, MA A, et al. Collage Diffusion[Z]. (2023-03-01). https://doi.org/10.48550/arXiv.2303.00262.
98 ZHANG L, AGRAWALA M. Adding Conditional Control to Text-to-Image Diffusion Models[Z]. (2023-02-10). https://doi.org/10.48550/arXiv.2302.05543.
99 HUANG L H, CHEN D, LIU Y, et al. Composer: Creative and Controllable Image Synthesis with Composable Conditions[Z]. (2023-02-20). https://doi.org/10.48550/arXiv.2302.09778.
100 YU J W, WANG Y H, ZHAO C, et al. FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model[Z]. (2023-03-17). https://doi.org/10.48550/arXiv.2303.09833.
101 LUGMAYR A, DANELLJAN M, ROMERO A, et al. RePaint: Inpainting using Denoising Diffusion Probabilistic Models[Z]. (2022-01-24). https://doi.org/10.48550/arXiv.2201.09865.
102 LI W B, YU X, ZHOU K, et al. SDM: Spatial Diffusion Model for Large Hole Image Inpainting[Z]. (2022-12-06). https://doi.org/10.48550/arXiv.2212.02963.
103 LI R, TAN R T, CHEONG L F. All in one bad weather removal using architectural search[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 3172-3182. DOI:10.1109/cvpr42600.2020.00324
104 CHEN H T, WANG Y H, GUO T Y, et al. Pre-trained image processing transformer[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 12294-12305. DOI:10.1109/cvpr46437.2021.01212
105 ZHU Y R, WANG T Y, FU X Y, et al. Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 21747-21758. DOI:10.1109/cvpr52729.2023.02083
106 KAWAR B, ELAD M, ERMON S, et al. Denoising Diffusion Restoration Models[Z]. (2022-01-27). https://doi.org/10.48550/arXiv.2201.11793.
107 WANG Y H, YU J W, ZHANG J. Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model[Z]. (2022-12-01). https://doi.org/10.48550/arXiv.2212.00490.
108 SAHARIA C, CHAN W, CHANG H, et al. Palette: Image-to-image diffusion models[C]// ACM SIGGRAPH 2022. Vancouver: ACM, 2022: 1-10. DOI:10.1145/3528233.3530757
109 PAN X C, QIN P D, LI Y H, et al. Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models[Z]. (2022-11-20). https://doi.org/10.48550/arXiv.2211.10950.
110 JEONG H, KWON G, YE J C. Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models[Z]. (2023-02-08). https://doi.org/10.48550/arXiv.2302.03900.
111 NIKANKIN Y, HAIM N, IRANI M. SinFusion: Training Diffusion Models on a Single Image or Video[Z]. (2022-11-21). https://doi.org/10.48550/arXiv.2211.11743.
112 ZHAO Y Q, PANG T Y, DU C, et al. A Recipe for Watermarking Diffusion Models[Z]. (2023-03-17). https://doi.org/10.48550/arXiv.2303.10137.
114 HAO Y R, CHI Z W, DONG L, et al. Optimizing Prompts for Text-to-Image Generation[Z]. (2022-12-19). https://doi.org/10.48550/arXiv.2212.09611.
115 WITTEVEEN S, ANDREWS M. Investigating Prompt Engineering in Diffusion Models[Z]. (2022-11-21). https://doi.org/10.48550/arXiv.2211.15462.
116 WANG Z J, MONTOYA E, MUNECHIKA D, et al. DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models[Z]. (2022-10-26). https://doi.org/10.48550/arXiv.2210.14896.
117 SONG J M, MENG C L, ERMON S. Denoising Diffusion Implicit Models[Z]. (2020-10-06). https://doi.org/10.48550/arXiv.2010.02502.
118 LU C, ZHOU Y H, BAO F, et al. DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps[Z]. (2022-06-02). https://doi.org/10.48550/arXiv.2206.00927.
119 LU C, ZHOU Y H, BAO F, et al. DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models[Z]. (2022-11-02). https://doi.org/10.48550/arXiv.2211.01095.
120 ZHANG Q S, TAO M L, CHEN Y X. gDDIM: Generalized Denoising Diffusion Implicit Models[Z]. (2022-06-11). https://doi.org/10.48550/arXiv.2206.05564.
121 BAO F, LI C X, ZHU J, et al. Analytic-DPM: An Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models[Z]. (2022-01-17). https://doi.org/10.48550/arXiv.2201.06503.
122 LUHMAN E, LUHMAN T. Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed[Z]. (2021-01-07). https://doi.org/10.48550/arXiv.2101.02388.
123 SALIMANS T, HO J. Progressive Distillation for Fast Sampling of Diffusion Models[Z]. (2022-02-01). https://doi.org/10.48550/arXiv.2202.00512.
124 MENG C L, ROMBACH R, GAO R Q, et al. On distillation of guided diffusion models[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 14297-14306. DOI:10.1109/cvpr52729.2023.01374
125 BAO F, NIE S, XUE K W, et al. All are worth words: A ViT backbone for diffusion models[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 22669-22679. DOI:10.1109/cvpr52729.2023.02171
126 PEEBLES W, XIE S. Scalable Diffusion Models with Transformers[Z]. (2022-12-19). https://doi.org/10.48550/arXiv.2212.09748.