Image captioning based on cross-modal cascaded diffusion model
Qiaohong CHEN, Menghao GUO, Xian FANG, Qi SUN
Tab. 1  Module ablation experiments of the model on two datasets
SAM  CDM  |  Microsoft COCO                    |  Flickr30k
          |  B@1    B@4    M      R      C      |  B@1    B@4    M      R      C
×    ×    |  77.6   34.5   27.5   56.5   115.2  |  68.5   27.5   22.2   50.1   59.1
×    √    |  78.9   34.8   27.5   56.8   116.1  |  69.7   28.3   22.5   50.9   62.0
√    ×    |  80.0   38.2   28.3   57.8   128.7  |  72.2   29.8   23.4   51.8   63.5
√    √    |  81.2   39.9   29.0   58.9   133.8  |  74.5   31.2   23.9   53.2   65.4