Contrastive learning-based sound source localization-guided audio-visual segmentation model

Wenhu HUANG, Xing ZHAO, Liang XIE, Haoran LIANG, Ronghua LIANG

Tab.6 Performance comparison of SSL2AVS and the baseline model using a small amount of training data

Method	Image encoder	$N_{\mathrm{p}}/10^6$	MS3 (10%) $M_{\mathrm{J}}$/%	MS3 (10%) $M_{\mathrm{F}}$/%	MS3 (30%) $M_{\mathrm{J}}$/%	MS3 (30%) $M_{\mathrm{F}}$/%	S4 (10%) $M_{\mathrm{J}}$/%	S4 (10%) $M_{\mathrm{F}}$/%	S4 (30%) $M_{\mathrm{J}}$/%	S4 (30%) $M_{\mathrm{F}}$/%
AVSegFormer	ResNet-50	126.74	44.58	56.9	47.40	60.7	70.13	84.3	74.81	85.7
AVSegFormer	PVT v2	183.95	52.75	64.8	55.16	67.9	79.68	88.8	81.55	89.7
SSL2AVS	ResNet-50	136.45	53.90	66.0	57.76	69.2	71.80	84.3	76.67	87.1
SSL2AVS	PVT v2	179.72	59.57	69.2	62.58	72.6	80.94	89.8	82.75	90.8