Contrastive learning-based sound source localization-guided audio-visual segmentation model
Wenhu HUANG, Xing ZHAO, Liang XIE, Haoran LIANG, Ronghua LIANG

Table 6 Performance comparison of SSL2AVS and the baseline model using a small amount of training data
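| Method | Image encoder | $N_{\mathrm{p}}/10^6$ | MS3 10% $M_{\text{J}}$/% | MS3 10% $M_{\text{F}}$/% | MS3 30% $M_{\text{J}}$/% | MS3 30% $M_{\text{F}}$/% | S4 10% $M_{\text{J}}$/% | S4 10% $M_{\text{F}}$/% | S4 30% $M_{\text{J}}$/% | S4 30% $M_{\text{F}}$/% |
|---|---|---|---|---|---|---|---|---|---|---|
| AVSegFormer | ResNet-50 | 126.74 | 44.58 | 56.9 | 47.40 | 60.7 | 70.13 | 84.3 | 74.81 | 85.7 |
| AVSegFormer | PVT v2 | 183.95 | 52.75 | 64.8 | 55.16 | 67.9 | 79.68 | 88.8 | 81.55 | 89.7 |
| SSL2AVS | ResNet-50 | 136.45 | 53.90 | 66.0 | 57.76 | 69.2 | 71.80 | 84.3 | 76.67 | 87.1 |
| SSL2AVS | PVT v2 | 179.72 | 59.57 | 69.2 | 62.58 | 72.6 | 80.94 | 89.8 | 82.75 | 90.8 |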
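To make the comparison in Table 6 easier to read, the following Python sketch hard-codes the reported values and subtracts the AVSegFormer row from the SSL2AVS row for each image encoder. It is not part of the authors' code. The dictionary layout, the names `TABLE6`, `SETTINGS` and `gains`, and the rounding to the table's reported precision (two decimals for $M_{\text{J}}$, one for $M_{\text{F}}$) are illustrative choices.

```python
"""Tabulate SSL2AVS gains over AVSegFormer from Table 6 (values copied from the table)."""

# Per setting: (M_J in %, M_F in %); "N_p" is the parameter count in millions.
# Layout and names are hypothetical, chosen only for this illustration.
TABLE6 = {
    ("AVSegFormer", "ResNet-50"): {
        "N_p": 126.74,
        "MS3 10%": (44.58, 56.9), "MS3 30%": (47.40, 60.7),
        "S4 10%": (70.13, 84.3), "S4 30%": (74.81, 85.7),
    },
    ("AVSegFormer", "PVT v2"): {
        "N_p": 183.95,
        "MS3 10%": (52.75, 64.8), "MS3 30%": (55.16, 67.9),
        "S4 10%": (79.68, 88.8), "S4 30%": (81.55, 89.7),
    },
    ("SSL2AVS", "ResNet-50"): {
        "N_p": 136.45,
        "MS3 10%": (53.90, 66.0), "MS3 30%": (57.76, 69.2),
        "S4 10%": (71.80, 84.3), "S4 30%": (76.67, 87.1),
    },
    ("SSL2AVS", "PVT v2"): {
        "N_p": 179.72,
        "MS3 10%": (59.57, 69.2), "MS3 30%": (62.58, 72.6),
        "S4 10%": (80.94, 89.8), "S4 30%": (82.75, 90.8),
    },
}

# Dataset and training-data fraction, in the column order of Table 6.
SETTINGS = ("MS3 10%", "MS3 30%", "S4 10%", "S4 30%")


def gains(encoder: str) -> dict:
    """Return SSL2AVS minus AVSegFormer for one image encoder.

    The parameter delta is in millions; metric deltas are in percentage points,
    rounded to the precision reported in the table (2 decimals for M_J, 1 for M_F).
    """
    base = TABLE6[("AVSegFormer", encoder)]
    ours = TABLE6[("SSL2AVS", encoder)]
    out = {"N_p": round(ours["N_p"] - base["N_p"], 2)}
    for s in SETTINGS:
        d_j = round(ours[s][0] - base[s][0], 2)
        d_f = round(ours[s][1] - base[s][1], 1)
        out[s] = (d_j, d_f)
    return out


if __name__ == "__main__":
    for enc in ("ResNet-50", "PVT v2"):
        g = gains(enc)
        print(f"{enc}: dN_p = {g['N_p']:+.2f}M")
        for s in SETTINGS:
            d_j, d_f = g[s]
            print(f"  {s}: dM_J = {d_j:+.2f} pp, dM_F = {d_f:+.1f} pp")
```

By this subtraction, the largest margins appear on MS3. With ResNet-50, $M_{\text{J}}$ improves by 9.32 and 10.36 points at 10% and 30% of the training data. On S4, every $M_{\text{J}}$ gain stays below 2 points. With PVT v2, SSL2AVS also has about 4.23M fewer parameters than AVSegFormer. With ResNet-50, it has about 9.71M more.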