Image captioning based on global-local feature and adaptive-attention
Xiao-hu ZHAO, Liang-fei YIN, Cheng-long ZHAO
Tab. 2 Comparison of experimental results on the Microsoft COCO caption dataset (%)

Method | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | ROUGE-L | CIDEr |
NIC | 66.6 | 46.1 | 32.9 | 24.6 | − | − | − |
MS Captivator | 71.5 | 54.3 | 40.7 | 30.8 | 24.8 | 52.6 | 93.1 |
m-RNN | 67 | 45 | 35 | 25 | − | − | − |
LRCN | 62.79 | 44.19 | 30.41 | 21 | − | − | − |
MSR | 73.0 | 56.5 | 42.9 | 32.5 | 25.1 | − | 98.6 |
ATT-EK | 74.0 | 56.0 | 42.0 | 31.0 | 26.0 | − | − |
Soft-attention | 70.7 | 49.2 | 34.4 | 24.3 | 23.9 | − | − |
Hard-attention | 71.8 | 50.4 | 35.7 | 25.0 | 23.0 | − | − |
ATT-FCN | 70.9 | 53.7 | 40.2 | 30.4 | 24.3 | − | − |
Aligning-ATT | 69.7 | 51.9 | 38.1 | 28.20 | 23.5 | 50.9 | 83.8 |
ERD | − | − | − | 29.0 | 23.7 | − | 88.6 |
Areas-ATT | − | − | − | 30.7 | 24.5 | − | 93.8 |
Proposed method | 74.0 | 60.1 | 43.9 | 35.2 | 27.5 | 52.4 | 98.7 |