Text-to-image generation method based on multimodal semantic information

Bing YANG, Jiahui ZHOU, Jinliang YAO, Xueqin XIANG

Tab. 1 Comparison of evaluation metrics for different models on two datasets

| Model | CUB FID↓ | CUB IS↑ | CUB SCLIP↑ | COCO FID↓ | COCO IS↑ | COCO SCLIP↑ |
| --- | --- | --- | --- | --- | --- | --- |
| VQ-Diffusion[23] | 10.32 | - | 0.3224 | 13.86 | - | 0.3382 |
| DFGAN | 14.81 | 5.10 | 0.2920 | 19.32 | 35.16 | 0.2972 |
| RATGAN | 13.91 | 5.36 | - | 14.60 | 36.42 | - |
| DMF-GAN[24] | 13.21 | 5.42 | - | 15.83 | 36.72 | - |
| SAW-GAN[25] | 10.45 | 4.63 | - | 11.17 | 35.17 | - |
| GALIP | 10.08 | 5.92 | 0.3164 | 5.85 | 37.11 | 0.3338 |
| This study | 9.56 | 6.04 | 0.3252 | 5.62 | 37.36 | 0.3405 |