Image captioning generation based on multiple-view cross-modal feature fusion
Naizhou ZHANG, Yunchao ZHAO, Wei CAO, Xiaojian ZHANG

Tab.3 Comparison with other state-of-the-art models on the Flickr30k dataset (unit: %)
Model                  BLEU-1  BLEU-4  METEOR  ROUGE-L  CIDEr
Soft-Attention[4]      66.7    19.1    18.5    -        -
Hard-Attention[4]      66.9    19.9    18.5    -        -
Adaptive-Attention[5]  67.7    25.1    20.4    -        53.1
A_R_L[35]              69.8    27.7    21.5    48.5     57.4
IVAIC[36]              70.8    30.6    22.5    49.8     63.0
VRCDA[33]              73.2    30.6    22.7    50.6     66.0
MVCMFAF (ours)         75.2    33.7    34.2    52.1     75.6