Computer and Control Engineering
Generative adversarial network based two-stage generation of high-quality images from text |
Yin CAO1,2, Junping QIN1,2,*, Tong GAO2,3, Qianli MA1,2, Jiaqi REN1,2
1. College of Data Science and Applications, Inner Mongolia University of Technology, Hohhot 010051, China
2. Inner Mongolia Autonomous Region Engineering Technology Research Center of Big Data Based Software Service, Hohhot 010000, China
3. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Cite this article:
Yin CAO, Junping QIN, Tong GAO, Qianli MA, Jiaqi REN. Generative adversarial network based two-stage generation of high-quality images from text. Journal of Zhejiang University (Engineering Science), 2024, 58(4): 674-683.
Article link:
https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2024.04.003
or
https://www.zjujournals.com/eng/CN/Y2024/V58/I4/674
1. GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets [C]// Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2672–2680.
2. XU T, ZHANG P, HUANG Q, et al. AttnGAN: fine-grained text to image generation with attentional generative adversarial networks [C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 1316–1324.
3. HAN Shuang. Research on text-to-image generation techniques based on generative adversarial networks [D]. Daqing: Northeast Petroleum University, 2022.
4. QIAO T, ZHANG J, XU D, et al. Learn, imagine and create: text-to-image generation from prior knowledge [C]// Proceedings of the 33rd Conference on Neural Information Processing Systems. Vancouver: [s.n.], 2019: 887–897.
5. LIANG J, PEI W, LU F. CPGAN: content-parsing generative adversarial networks for text-to-image synthesis [C]// Proceedings of the 16th European Conference on Computer Vision. [S.l.]: Springer, 2020: 491–508.
6. ZHANG H, XU T, LI H, et al. StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks [C]// 2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 5908–5916.
7. ZHANG H, XU T, LI H, et al. StackGAN++: realistic image synthesis with stacked generative adversarial networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(8): 1947–1962.
8. QIAO T, ZHANG J, XU D, et al. MirrorGAN: learning text-to-image generation by redescription [C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 1505–1514.
9. TAO M, TANG H, WU F, et al. DF-GAN: a simple and effective baseline for text-to-image synthesis [C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 16515–16525.
10. DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale [EB/OL]. (2021-06-03)[2023-09-17]. https://arxiv.org/pdf/2010.11929.pdf.
11. REED S, AKATA Z, YAN X, et al. Generative adversarial text to image synthesis [C]// Proceedings of the 33rd International Conference on Machine Learning. New York: ACM, 2016: 1060–1069.
12. ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks [C]// 2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2223–2232.
13. HE Xiaofeng, MAO Lin, YANG Dawei. Semantic-spatial feature enhancement algorithm for text-to-image generation [J]. Journal of Dalian Minzu University, 2022, 24(5): 401–406.
14. XUE Zhihang, XU Zheming, LANG Congyan, et al. Text-to-image generation method based on image-text semantic consistency [J]. Journal of Computer Research and Development, 2023, 60(9): 2180–2190.
15. LYU Wenhan, CHE Jin, ZHAO Zewei, et al. Image generation method based on dynamic convolution and text data augmentation [EB/OL]. (2023-04-28)[2023-09-17]. https://doi.org/10.19678/j.issn.1000-3428.0066470.
16. SHEYNIN S, ASHUAL O, POLYAK A, et al. KNN-diffusion: image generation via large-scale retrieval [EB/OL]. (2022-10-02)[2023-09-17]. https://arxiv.org/pdf/2204.02849.pdf.
17. NICHOL A Q, DHARIWAL P, RAMESH A, et al. GLIDE: towards photorealistic image generation and editing with text-guided diffusion models [C]// Proceedings of the 39th International Conference on Machine Learning. Baltimore: PMLR, 2022: 16784–16804.
18. TIAN Feng, SUN Xiaoqiang, LIU Fang, et al. Chinese image caption with dual attention and multi-label image [J]. Computer Systems and Applications, 2021, 30(7): 32–40.
19. HUANG Z, XU W, YU K. Bidirectional LSTM-CRF models for sequence tagging [EB/OL]. (2015-08-09)[2023-09-17]. https://arxiv.org/pdf/1508.01991.pdf.
20. VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: [s.n.], 2017: 6000–6010.
21. MIRZA M, OSINDERO S. Conditional generative adversarial nets [EB/OL]. (2014-11-06)[2023-09-17]. https://arxiv.org/pdf/1411.1784.pdf.
22. WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD Birds-200-2011 dataset [EB/OL]. (2022-08-12)[2023-09-17]. https://authors.library.caltech.edu/27452/1/CUB_200_2011.pdf.
23. LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context [C]// European Conference on Computer Vision. [S.l.]: Springer, 2014: 740–755.
24. SALIMANS T, GOODFELLOW I, ZAREMBA W, et al. Improved techniques for training GANs [C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona: [s.n.], 2016: 2234–2242.
25. HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: [s.n.], 2017: 6629–6640.
26. WANG Jiayu. Image generation based on generative adversarial networks [D]. Hefei: University of Science and Technology of China, 2021.
27. WANG Lei. Text-to-image synthesis based on semantic correlation mining [D]. Xi’an: Xidian University, 2020.
28. STAP D, BLEEKER M, IBRAHIMI S, et al. Conditional image generation and manipulation for user-specified content [EB/OL]. (2020-05-11)[2023-09-17]. https://arxiv.org/pdf/2005.04909.pdf.
29. ZHU M, PAN P, CHEN W, et al. DM-GAN: dynamic memory generative adversarial networks for text-to-image synthesis [C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5802–5810.
30. YIN G, LIU B, SHENG L, et al. Semantics disentangling for text-to-image generation [C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 2327–2336.
31. LIAO W, HU K, YANG M Y, et al. Text to image generation with semantic-spatial aware GAN [C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 18187–18196.
32. YE S, WANG H, TAN M, et al. Recurrent affine transformation for text-to-image synthesis [J]. IEEE Transactions on Multimedia, 2023, 26: 462–473.