Contrastive learning-based sound source localization-guided audio-visual segmentation model
Wenhu HUANG, Xing ZHAO, Liang XIE, Haoran LIANG, Ronghua LIANG
Tab. 1 Performance comparison between SSL2AVS and the baseline model
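| Method | Image encoder | FPS/(frames·s$^{-1}$) | S4 $M_{\text{F}}$/% | S4 $M_{\text{J}}$/% | MS3 $M_{\text{F}}$/% | MS3 $M_{\text{J}}$/% |
|---|---|---|---|---|---|---|
| AVSegFormer-R50 | ResNet-50 | 114.97 | 85.9 | 76.45 | 62.8 | 49.53 |
| SSL2AVS-R50 | ResNet-50 | 96.39 | 86.5 | 76.87 | 63.3 | 52.49 |
| AVSegFormer-R50+ | ResNet-50 | 42.53 | 86.4 | 76.11 | 58.0 | 43.41 |
| SSL2AVS-R50+ | ResNet-50 | 36.33 | 86.8 | 77.16 | 66.9 | 56.18 |
| AVSegFormer-R50* | ResNet-50 | 30.96 | 86.7 | 76.38 | 65.6 | 53.81 |
| SSL2AVS-R50* | ResNet-50 | 26.11 | 88.0 | 78.68 | 69.8 | 59.50 |
| AVSegFormer-PVT | PVT v2 | 81.83 | 89.9 | 82.06 | 69.3 | 58.36 |
| SSL2AVS-PVT | PVT v2 | 80.06 | 90.3 | 82.42 | 72.3 | 62.15 |
| AVSegFormer-PVT* | PVT v2 | 22.79 | 90.5 | 83.06 | 73.0 | 61.33 |
| SSL2AVS-PVT* | PVT v2 | 21.20 | 91.6 | 84.43 | 75.6 | 65.16 |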