Contrastive learning-based sound source localization-guided audio-visual segmentation model

Wenhu HUANG, Xing ZHAO, Liang XIE, Haoran LIANG, Ronghua LIANG

Tab.6 Performance comparison of SSL2AVS and baseline model using a small amount of training data

| Method | Image encoder | $N_{\mathrm{p}}/10^6$ | MS3, 10%: $M_{\text{J}}$/% | MS3, 10%: $M_{\text{F}}$/% | MS3, 30%: $M_{\text{J}}$/% | MS3, 30%: $M_{\text{F}}$/% | S4, 10%: $M_{\text{J}}$/% | S4, 10%: $M_{\text{F}}$/% | S4, 30%: $M_{\text{J}}$/% | S4, 30%: $M_{\text{F}}$/% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AVSegFormer | ResNet-50 | 126.74 | 44.58 | 56.9 | 47.40 | 60.7 | 70.13 | 84.3 | 74.81 | 85.7 |
| AVSegFormer | PVT v2 | 183.95 | 52.75 | 64.8 | 55.16 | 67.9 | 79.68 | 88.8 | 81.55 | 89.7 |
| SSL2AVS | ResNet-50 | 136.45 | 53.90 | 66.0 | 57.76 | 69.2 | 71.80 | 84.3 | 76.67 | 87.1 |
| SSL2AVS | PVT v2 | 179.72 | 59.57 | 69.2 | 62.58 | 72.6 | 80.94 | 89.8 | 82.75 | 90.8 |