Contrastive learning-based sound source localization-guided audio-visual segmentation model
Wenhu HUANG, Xing ZHAO, Liang XIE, Haoran LIANG, Ronghua LIANG

Tab. 4 Performance comparison of different initialization strategies on the MS3 subset
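
| Frame size | Method | Trained from scratch, ResNet-50, $M_{\text{J}}$/% | Trained from scratch, ResNet-50, $M_{\text{F}}$/% | Trained from scratch, PVT v2, $M_{\text{J}}$/% | Trained from scratch, PVT v2, $M_{\text{F}}$/% | Pre-trained on S4, ResNet-50, $M_{\text{J}}$/% | Pre-trained on S4, ResNet-50, $M_{\text{F}}$/% | Pre-trained on S4, PVT v2, $M_{\text{J}}$/% | Pre-trained on S4, PVT v2, $M_{\text{F}}$/% |
|---|---|---|---|---|---|---|---|---|---|
| 224×224 | AVSBench | 47.88 | 57.8 | 54.00 | 64.5 | 54.33 | – | 57.34 | – |
| 224×224 | ECMVAE | 48.69 | 60.7 | 57.84 | 70.8 | 57.56 | 67.4 | 60.81 | 72.9 |
| 224×224 | AuTR | 49.41 | 61.2 | 56.21 | 67.2 | 56.00 | 66.0 | 60.95 | 72.5 |
| 224×224 | AVS-UFE | 55.88 | 64.5 | 61.95 | 70.9 | 59.32 | – | 64.47 | – |
| 224×224 | AVSegFormer | 49.53 | 62.8 | 58.36 | 69.3 | 53.73 | 64.3 | 60.92 | 72.0 |
| 224×224 | SSL2AVS | 52.49 | 63.3 | 62.15 | 72.3 | 59.84 | 69.4 | 64.16 | 74.7 |
| 224×224 | SSL2AVS+ | 56.18 | 66.9 | 62.49 | 72.5 | – | – | – | – |
| 512×512 | AVSegFormer | 53.81 | 65.6 | 61.33 | 73.0 | 54.49 | 63.7 | 61.37 | 73.1 |
| 512×512 | SSL2AVS | 59.50 | 69.8 | 65.16 | 75.6 | 64.27 | 74.3 | 68.44 | 77.8 |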