Image caption generation based on multi-view cross-modal feature fusion
Naizhou ZHANG, Yunchao ZHAO, Wei CAO, Xiaojian ZHANG
Table 2 Performance comparison with other state-of-the-art models on the MSCOCO test dataset in the ensemble-model setting (%)
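| Model | BLEU-1 | BLEU-4 | METEOR | ROUGE-L | CIDEr | SPICE |
|---|---|---|---|---|---|---|
| SCST[6] | - | 35.4 | 27.1 | 56.6 | 117.5 | - |
| AoANet[9] | 81.6 | 40.2 | 29.3 | 59.4 | 132.0 | 22.8 |
| X-Transformer[10] | 81.7 | 40.7 | 29.9 | 59.7 | 135.3 | 23.8 |
| M2Transformer[11] | 82.0 | 40.5 | 29.7 | 59.5 | 134.5 | 23.5 |
| GET[12] | 82.1 | 40.6 | 29.8 | 59.6 | 135.1 | 23.8 |
| DLCT[14] | 82.2 | 40.8 | 29.9 | 59.8 | 137.5 | 23.3 |
| PureT[21] | 83.4 | 42.1 | 30.4 | 60.8 | 141.0 | 24.3 |
| MVCMFAF (ours) | 83.5 | 42.7 | 30.6 | 61.1 | 142.3 | 24.5 |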