Image captioning based on cross-modal cascaded diffusion model
Qiaohong CHEN, Menghao GUO, Xian FANG, Qi SUN
Table 6 Performance comparison of different image captioning models on the Flickr30k dataset
Model                  B@1    B@4    M      C
Deep VS[26]            57.3   15.7   15.3   24.7
Soft-Attention[2]      66.7   19.1   18.5   -
Hard-Attention[2]      66.9   19.9   18.5   -
Adaptive[42]           67.7   25.1   20.4   53.1
NBT[34]                69.0   27.1   21.7   57.5
Relation-Context[3]    73.6   30.1   23.8   60.2
LSTNet[43]             67.1   23.3   20.4   64.5
This work              74.5   31.2   23.9   65.4
B@1/B@4: BLEU-1/BLEU-4; M: METEOR; C: CIDEr; "-": not reported.
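The following Python sketch makes the size of the gains in Table 6 concrete. It transcribes the table and, for each metric, finds the strongest prior model and its score gap to this work. It is an illustration only. The names `BASELINES`, `OURS` and `margins_over_best_baseline` are hypothetical. Missing CIDEr entries are skipped rather than treated as zero, and the differences are raw score gaps, not significance tests.

```python
# Table 6 scores on Flickr30k, transcribed from the table above.
# None marks a metric the table does not report for that model.
METRICS = ("B@1", "B@4", "M", "C")

BASELINES = {
    "Deep VS":          (57.3, 15.7, 15.3, 24.7),
    "Soft-Attention":   (66.7, 19.1, 18.5, None),
    "Hard-Attention":   (66.9, 19.9, 18.5, None),
    "Adaptive":         (67.7, 25.1, 20.4, 53.1),
    "NBT":              (69.0, 27.1, 21.7, 57.5),
    "Relation-Context": (73.6, 30.1, 23.8, 60.2),
    "LSTNet":           (67.1, 23.3, 20.4, 64.5),
}
OURS = (74.5, 31.2, 23.9, 65.4)


def margins_over_best_baseline(baselines, ours):
    """For each metric, return (best baseline name, its score, ours - best)."""
    result = {}
    for i, metric in enumerate(METRICS):
        # Ignore models that have no reported score for this metric.
        scored = [(name, s[i]) for name, s in baselines.items() if s[i] is not None]
        best_name, best_score = max(scored, key=lambda pair: pair[1])
        # Round to one decimal to match the table's precision and hide float noise.
        result[metric] = (best_name, best_score, round(ours[i] - best_score, 1))
    return result


if __name__ == "__main__":
    for metric, (name, score, gain) in margins_over_best_baseline(BASELINES, OURS).items():
        print(f"{metric}: +{gain} over {name} ({score})")
```

Running the script prints `B@1: +0.9 over Relation-Context (73.6)`, `B@4: +1.1 over Relation-Context (30.1)`, `M: +0.1 over Relation-Context (23.8)` and `C: +0.9 over LSTNet (64.5)`. The strongest prior model therefore differs by metric: Relation-Context leads on BLEU and METEOR, and LSTNet leads on CIDEr.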