Image captioning based on cross-modal cascaded diffusion model
Qiaohong CHEN, Menghao GUO, Xian FANG, Qi SUN
Tab.2 Ablation experiment of cross-modal semantic alignment module on two datasets
ME/Linear   Microsoft COCO                     Flickr30k
            B@1    B@4    M      R      C      B@1    B@4    M      R      C
××          78.9   34.8   27.5   56.8   116.1  70.1   28.5   22.8   51.2   62.3
×           80.5   37.7   28.2   57.3   128.5  72.6   30.1   23.4   51.9   63.8
            81.2   39.9   29.0   58.9   133.8  74.5   31.2   23.9   53.2   65.4