Multimodal sentiment analysis based on multi-head self-attention mechanism and MLP-Interactor
Yishan LIN 1, Jing ZUO 1, Shuhua LU 1,2,*
1. College of Information and Cyber Security, People’s Public Security University of China, Beijing 102600, China
2. Key Laboratory of Security Technology and Risk Assessment, Ministry of Public Security, Beijing 102600, China
References
[1] ZHU L, ZHU Z, ZHANG C, et al. Multimodal sentiment analysis based on fusion methods: a survey [J]. Information Fusion, 2023, 95: 306-325. doi: 10.1016/j.inffus.2023.02.028
[2] GANDHI A, ADHVARYU K, PORIA S, et al. Multimodal sentiment analysis: a systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions [J]. Information Fusion, 2023, 91: 424-444. doi: 10.1016/j.inffus.2022.09.025
[3] CAO R, YE C, ZHOU H. Multimodal sentiment analysis with self-attention [C]// Proceedings of the Future Technologies Conference. [S.l.]: Springer, 2021: 16-26.
[4] BALTRUSAITIS T, AHUJA C, MORENCY L P. Multimodal machine learning: a survey and taxonomy [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(2): 423-443.
[5] GUO W, WANG J, WANG S. Deep multimodal representation learning: a survey [J]. IEEE Access, 2019, 7: 63373-63394. doi: 10.1109/ACCESS.2019.2916887
[6] HAN W, CHEN H, PORIA S. Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis [EB/OL]. (2021-09-16)[2025-05-28]. https://arxiv.org/pdf/2109.00412.
[7] HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: modality-invariant and specific representations for multimodal sentiment analysis [C]// Proceedings of the 28th ACM International Conference on Multimedia. Seattle: ACM, 2020: 1122-1131.
[8] TANG J, LIU D, JIN X, et al. BAFN: bi-direction attention based fusion network for multimodal sentiment analysis [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 33(4): 1966-1978.
[9] WU Y, LIN Z, ZHAO Y, et al. A text-centered shared-private framework via cross-modal prediction for multimodal sentiment analysis [C]// Findings of the Association for Computational Linguistics. [S.l.]: ACL, 2021: 4730-4738.
[10] KIM K, PARK S. AOBERT: all-modalities-in-one BERT for multimodal sentiment analysis [J]. Information Fusion, 2023, 92: 37-45. doi: 10.1016/j.inffus.2022.11.022
[11] HAN W, CHEN H, GELBUKH A, et al. Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis [C]// Proceedings of the 2021 International Conference on Multimodal Interaction. Montréal: ACM, 2021: 6-15.
[12] LI Z, GUO Q, PAN Y, et al. Multi-level correlation mining framework with self-supervised label generation for multimodal sentiment analysis [J]. Information Fusion, 2023, 99: 101891. doi: 10.1016/j.inffus.2023.101891
[13] MORENCY L P, MIHALCEA R, DOSHI P. Towards multimodal sentiment analysis: harvesting opinions from the web [C]// Proceedings of the 13th International Conference on Multimodal Interfaces. Alicante: ACM, 2011: 169-176.
[14] ZADEH A, LIANG P P, PORIA S, et al. Multi-attention recurrent network for human communication comprehension [C]// Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans: AAAI Press, 2018: 5642-5649.
[15] PORIA S, CHATURVEDI I, CAMBRIA E, et al. Convolutional MKL based multimodal emotion recognition and sentiment analysis [C]// IEEE 16th International Conference on Data Mining. Barcelona: IEEE, 2016: 439-448.
[16] ALAM F, RICCARDI G. Predicting personality traits using multimodal information [C]// Proceedings of the 2014 ACM Multimedia Workshop on Computational Personality Recognition. Orlando: ACM, 2014: 15-18.
[17] CAI G, XIA B. Convolutional neural networks for multimedia sentiment analysis [C]// Natural Language Processing and Chinese Computing: 4th CCF Conference. Nanchang: Springer, 2015: 159-167.
[18] GKOUMAS D, LI Q, LIOMA C, et al. What makes the difference? An empirical comparison of fusion strategies for multimodal language analysis [J]. Information Fusion, 2021, 66: 184-197. doi: 10.1016/j.inffus.2020.09.005
[19] LIN T, WANG Y, LIU X, et al. A survey of transformers [J]. AI Open, 2022, 3: 111-132. doi: 10.1016/j.aiopen.2022.10.001
[20] TSAI Y H H, BAI S, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences [C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence: ACL, 2019: 6558-6569.
[21] CHEN C, HONG H, GUO J, et al. Inter-intra modal representation augmentation with trimodal collaborative disentanglement network for multimodal sentiment analysis [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, 31: 1476-1488. doi: 10.1109/TASLP.2023.3263801
[22] YUAN Z, LI W, XU H, et al. Transformer-based feature reconstruction network for robust multimodal sentiment analysis [C]// Proceedings of the 29th ACM International Conference on Multimedia. Chengdu: ACM, 2021: 4400-4407.
[23] MA L, YAO Y, LIANG T, et al. Multi-scale cooperative multimodal transformers for multimodal sentiment analysis in videos [EB/OL]. (2022-06-17)[2025-05-28]. https://arxiv.org/pdf/2206.07981.
[24] YU W, XU H, YUAN Z, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis [C]// Proceedings of the AAAI Conference on Artificial Intelligence. [S.l.]: AAAI Press, 2021: 10790-10797.
[25] WANG D, GUO X, TIAN Y, et al. TETFN: a text enhanced transformer fusion network for multimodal sentiment analysis [J]. Pattern Recognition, 2023, 136: 109259. doi: 10.1016/j.patcog.2022.109259
[26] MELAS-KYRIAZI L. Do you even need attention? A stack of feed-forward layers does surprisingly well on ImageNet [EB/OL]. (2021-05-06)[2025-05-28]. https://arxiv.org/pdf/2105.02723.
[27] TOLSTIKHIN I O, HOULSBY N, KOLESNIKOV A, et al. MLP-Mixer: an all-MLP architecture for vision [J]. Advances in Neural Information Processing Systems, 2021, 34: 24261-24272.
[28] TOUVRON H, BOJANOWSKI P, CARON M, et al. ResMLP: feedforward networks for image classification with data-efficient training [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(4): 5314-5321.
[29] NIE Y, LI L, GAN Z, et al. MLP architectures for vision-and-language modeling: an empirical study [EB/OL]. (2021-12-08)[2025-05-28]. https://arxiv.org/pdf/2112.04453.
[30] LIN H, ZHANG P, LING J, et al. PS-Mixer: a polar-vector and strength-vector mixer model for multimodal sentiment analysis [J]. Information Processing and Management, 2023, 60(2): 103229. doi: 10.1016/j.ipm.2022.103229
[31] SUN H, WANG H, LIU J, et al. CubeMLP: an MLP-based model for multimodal sentiment analysis and depression estimation [C]// Proceedings of the 30th ACM International Conference on Multimedia. Lisboa: ACM, 2022: 3722-3729.
[32] BAIRAVEL S, KRISHNAMURTHY M. Novel OGBEE-based feature selection and feature-level fusion with MLP neural network for social media multimodal sentiment analysis [J]. Soft Computing, 2020, 24(24): 18431-18445. doi: 10.1007/s00500-020-05049-6
[33] KE P, JI H, LIU S, et al. SentiLARE: sentiment-aware language representation learning with linguistic knowledge [EB/OL]. (2020-09-24)[2025-05-28]. https://arxiv.org/pdf/1911.02493.
[34] LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach [EB/OL]. (2019-07-26)[2025-05-28]. https://arxiv.org/pdf/1907.11692.
[35] DEGOTTEX G, KANE J, DRUGMAN T, et al. COVAREP: a collaborative voice analysis repository for speech technologies [C]// IEEE International Conference on Acoustics, Speech and Signal Processing. Florence: IEEE, 2014: 960-964.
[36] ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos [EB/OL]. (2016-08-11)[2025-05-28]. https://arxiv.org/pdf/1606.06259.
[37] ZADEH A A B, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph [C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne: ACL, 2018: 2236-2246.
[38] CHEONG J H, JOLLY E, XIE T, et al. Py-Feat: Python facial expression analysis toolbox [J]. Affective Science, 2023, 4(4): 781-796. doi: 10.1007/s42761-023-00191-4
[39] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Advances in Neural Information Processing Systems. California: Curran Associates Inc., 2017: 5998-6008.
[40] ZADEH A, CHEN M, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis [EB/OL]. (2017-07-23)[2025-05-28]. https://arxiv.org/pdf/1707.07250.
[41] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors [EB/OL]. (2018-05-31)[2025-05-28]. https://arxiv.org/pdf/1806.00064.
[42] ZADEH A, LIANG P P, MAZUMDER N, et al. Memory fusion network for multi-view sequential learning [C]// Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans: AAAI Press, 2018: 5634-5641.
[43] RAHMAN W, HASAN M K, LEE S, et al. Integrating multimodal information in large pretrained transformers [C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. [S.l.]: ACL, 2020: 2359-2369.
[44] YANG B, SHAO B, WU L, et al. Multimodal sentiment analysis with unidirectional modality translation [J]. Neurocomputing, 2022, 467: 130-137. doi: 10.1016/j.neucom.2021.09.041
[45] LEI Y, YANG D, LI M, et al. Text-oriented modality reinforcement network for multimodal sentiment analysis from unaligned multimodal sequences [C]// CAAI International Conference on Artificial Intelligence. Singapore: Springer, 2023: 189-200.
[46] WANG Y, HE J, WANG D, et al. Multimodal transformer with adaptive modality weighting for multimodal sentiment analysis [J]. Neurocomputing, 2024, 572: 127181. doi: 10.1016/j.neucom.2023.127181
[47] LIU W, CAO S, ZHANG S. Multimodal consistency-specificity fusion based on information bottleneck for sentiment analysis [J]. Journal of King Saud University-Computer and Information Sciences, 2024, 36(2): 101943. doi: 10.1016/j.jksuci.2024.101943
[48] ZENG Y, LI Z, CHEN Z, et al. A feature-based restoration dynamic interaction network for multimodal sentiment analysis [J]. Engineering Applications of Artificial Intelligence, 2024, 127: 107335. doi: 10.1016/j.engappai.2023.107335