Contrastive learning-based sound source localization-guided audio-visual segmentation model

Wenhu HUANG, Xing ZHAO, Liang XIE, Haoran LIANG, Ronghua LIANG

Tab.2 Performance comparison of SSL2AVS and existing AVS methods

| Method | Image encoder | S4 $ {M_{\text{F}}} $/% | S4 $ {M_{\text{J}}} $/% | MS3 $ {M_{\text{F}}} $/% | MS3 $ {M_{\text{J}}} $/% |
|---|---|---|---|---|---|
| AVSBench[9] | ResNet-50 | 84.8 | 72.80 | 57.8 | 47.90 |
| AVSBench[9] | PVT v2 | 87.9 | 78.70 | 64.5 | 54.00 |
| ECMVAE[23] | ResNet-50 | 86.5 | 76.33 | 60.7 | 48.69 |
| ECMVAE[23] | PVT v2 | 90.1 | 81.74 | 70.8 | 57.84 |
| CATR[19] | ResNet-50 | 86.6 | 74.80 | 65.3 | 52.80 |
| CATR[19] | PVT v2 | 89.6 | 81.40 | 70.0 | 59.00 |
| AVSC[36] | ResNet-50 | 85.2 | 77.02 | 61.5 | 49.58 |
| AVSC[36] | PVT v2 | 88.2 | 80.57 | 65.1 | 58.22 |
| AVS-UFE[32] | ResNet-50 | 87.5 | 78.96 | 64.5 | 55.88 |
| AVS-UFE[32] | PVT v2 | 90.4 | 83.15 | 70.9 | 61.95 |
| COMBO[10] | ResNet-50 | 90.1 | 81.70 | 66.6 | 54.50 |
| COMBO[10] | PVT v2 | 91.9 | 84.70 | 71.2 | 59.20 |
| AVSegFormer[20] | ResNet-50 | 85.9 | 76.45 | 62.8 | 49.53 |
| AVSegFormer[20] | PVT v2 | 89.9 | 82.06 | 69.3 | 58.36 |
| SSL2AVS | ResNet-50 | 86.8 | 77.16 | 66.9 | 56.18 |
| SSL2AVS | PVT v2 | 90.3 | 82.42 | 72.3 | 62.15 |