Contrastive learning-based sound source localization-guided audio-visual segmentation model
Wenhu HUANG (黄文湖), Xing ZHAO (赵邢), Liang XIE (谢亮), Haoran LIANG (梁浩然), Ronghua LIANG (梁荣华)
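Tab.9 Impact of bidirectional attention fusion module on model performance

| Fusion method | S4 $ {M_{\text{J}}} $/% | S4 $ {M_{\text{F}}} $/% | MS3 $ {M_{\text{J}}} $/% | MS3 $ {M_{\text{F}}} $/% |
| --- | --- | --- | --- | --- |
| No fusion | 77.29 | 86.9 | 55.98 | 66.9 |
| Visual fusion only | 77.66 | 87.3 | 58.84 | 69.0 |
| Audio fusion only | 77.84 | 87.2 | 57.57 | 68.6 |
| Bidirectional fusion | 78.68 | 88.0 | 59.50 | 69.8 |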