Contrastive learning-based sound source localization-guided audio-visual segmentation model

Wenhu HUANG, Xing ZHAO, Liang XIE, Haoran LIANG, Ronghua LIANG

Tab.7 Impact of the feature enhancement module on model performance

Feature enhancement	Image encoder	$N_{\mathrm{p}}/10^6$	S4 $M_{\mathrm{J}}$/%	S4 $M_{\mathrm{F}}$/%	MS3 $M_{\mathrm{J}}$/%	MS3 $M_{\mathrm{F}}$/%
Without	ResNet-50	120.75	77.59	87.5	53.37	64.0
Without	PVT v2	177.98	84.24	91.4	59.96	71.6
With	ResNet-50	136.45	78.68	88.0	59.50	69.8
With	PVT v2	179.72	84.43	91.6	65.16	75.6
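
The effect of the module is easier to read as a per-encoder difference between the rows with and without feature enhancement. The following is a minimal sketch of that arithmetic using only the numbers in Tab.7. The names `TAB7`, `COLUMNS`, and `enhancement_gain` are illustrative and do not come from the paper.

```python
# Values copied from Tab.7, keyed by (image encoder, feature enhancement enabled).
# Tuple order: parameters in 10^6, S4 M_J, S4 M_F, MS3 M_J, MS3 M_F (metrics in %).
TAB7 = {
    ("ResNet-50", False): (120.75, 77.59, 87.5, 53.37, 64.0),
    ("PVT v2", False): (177.98, 84.24, 91.4, 59.96, 71.6),
    ("ResNet-50", True): (136.45, 78.68, 88.0, 59.50, 69.8),
    ("PVT v2", True): (179.72, 84.43, 91.6, 65.16, 75.6),
}

# Hypothetical column labels, introduced here only to name the tuple fields.
COLUMNS = ("params_M", "S4_MJ", "S4_MF", "MS3_MJ", "MS3_MF")


def enhancement_gain(encoder):
    """Return the with-minus-without difference for every column of Tab.7."""
    without = TAB7[(encoder, False)]
    with_fe = TAB7[(encoder, True)]
    return {
        name: round(w - wo, 2)
        for name, wo, w in zip(COLUMNS, without, with_fe)
    }


for encoder in ("ResNet-50", "PVT v2"):
    print(encoder, enhancement_gain(encoder))
```

Under this reading of the table, the module adds about 15.70 × 10^6 parameters with ResNet-50 and 1.74 × 10^6 with PVT v2. The $M_{\mathrm{J}}$ gain is small on S4 (1.09 and 0.19 percentage points) and considerably larger on MS3 (6.13 and 5.20 percentage points).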