Contrastive learning-based sound source localization-guided audio-visual segmentation model
Wenhu HUANG, Xing ZHAO, Liang XIE, Haoran LIANG, Ronghua LIANG

Table 2 Performance comparison of SSL2AVS and existing audio-visual segmentation (AVS) methods
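
| Method | Image encoder | S4 $M_{\text{F}}$/% | S4 $M_{\text{J}}$/% | MS3 $M_{\text{F}}$/% | MS3 $M_{\text{J}}$/% |
|---|---|---|---|---|---|
| AVSBench [9] | ResNet-50 | 84.8 | 72.80 | 57.8 | 47.90 |
| | PVT v2 | 87.9 | 78.70 | 64.5 | 54.00 |
| ECMVAE [23] | ResNet-50 | 86.5 | 76.33 | 60.7 | 48.69 |
| | PVT v2 | 90.1 | 81.74 | 70.8 | 57.84 |
| CATR [19] | ResNet-50 | 86.6 | 74.80 | 65.3 | 52.80 |
| | PVT v2 | 89.6 | 81.40 | 70.0 | 59.00 |
| AVSC [36] | ResNet-50 | 85.2 | 77.02 | 61.5 | 49.58 |
| | PVT v2 | 88.2 | 80.57 | 65.1 | 58.22 |
| AVS-UFE [32] | ResNet-50 | 87.5 | 78.96 | 64.5 | 55.88 |
| | PVT v2 | 90.4 | 83.15 | 70.9 | 61.95 |
| COMBO [10] | ResNet-50 | 90.1 | 81.70 | 66.6 | 54.50 |
| | PVT v2 | 91.9 | 84.70 | 71.2 | 59.20 |
| AVSegFormer [20] | ResNet-50 | 85.9 | 76.45 | 62.8 | 49.53 |
| | PVT v2 | 89.9 | 82.06 | 69.3 | 58.36 |
| SSL2AVS | ResNet-50 | 86.8 | 77.16 | 66.9 | 56.18 |
| | PVT v2 | 90.3 | 82.42 | 72.3 | 62.15 |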