Text-to-image generation method based on multimodal semantic information

Bing YANG (杨冰), Jiahui ZHOU (周家辉), Jinliang YAO (姚金良), Xueqin XIANG (向学勤)

Tab.1 Comparison of evaluation metrics for different models on two datasets

Model	CUB FID↓	CUB IS↑	CUB S_CLIP↑	COCO FID↓	COCO IS↑	COCO S_CLIP↑
VQ-Diffusion [23]	10.32	-	0.3224	13.86	-	0.3382
DFGAN	14.81	5.10	0.2920	19.32	35.16	0.2972
RATGAN	13.91	5.36	-	14.60	36.42	-
DMF-GAN [24]	13.21	5.42	-	15.83	36.72	-
SAW-GAN [25]	10.45	4.63	-	11.17	35.17	-
GALIP	10.08	5.92	0.3164	5.85	37.11	0.3338
This study	9.56	6.04	0.3252	5.62	37.36	0.3405