Journal of Zhejiang University (Engineering Science)  2025, Vol. 59 Issue (6): 1110-1118    DOI: 10.3785/j.issn.1008-973X.2025.06.002
Computer Technology
Visual induced motion sickness estimation model based on attention mechanism
Yongqing CAI, Cheng HAN*, Wei QUAN, Wudi CHEN
School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China
Abstract:

A visual induced motion sickness (VIMS) estimation model based on the attention mechanism was proposed to accurately assess the degree of motion sickness induced by visual content when users experience virtual products. The model was built on the Transformer architecture, with self-attention established over both the temporal and the spatial sequences to capture the relationships between temporal and spatial features. Using optical flow information and user attention information, two sub-networks, a motion flow and an attention flow, were designed to form a two-stream network structure. The motion flow sub-network extracted the motion features of the visual content, and the attention flow sub-network extracted important information, such as objects and textures, from the regions that users attend to. A late fusion strategy was used to combine the outputs of the two sub-networks. Experiments on public video datasets showed that the attention flow sub-network and the Transformer architecture worked together through their attention mechanisms to improve model accuracy. The VIMS model achieved the best results in F1 score, accuracy and precision, with values of 0.8468, 89.19% and 92.28%, respectively, improving on existing methods.

Key words: virtual reality    visual induced motion sickness    attention mechanism    deep learning    Transformer architecture
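
The self-attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' network: it assumes identity query/key/value projections and a single head, and the function names and the `frames` input are hypothetical. The same routine applies whether the tokens are per-frame features (temporal sequence) or per-patch features (spatial sequence).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    tokens: list of equal-length feature vectors, one per frame
    (temporal sequence) or per image patch (spatial sequence).
    Assumption: no learned projections, single head.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in tokens:
        # Similarity of this token to every token in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        weights = softmax(scores)
        # Weighted sum of the value vectors (here the tokens themselves).
        mixed = [sum(w * value[i] for w, value in zip(weights, tokens))
                 for i in range(dim)]
        outputs.append(mixed)
    return outputs


# Hypothetical per-frame features for a three-frame clip.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(frames)
```

Each output vector is a convex combination of the input tokens, so every token can draw on information from the whole sequence. A trained Transformer would add learned projections, multiple heads and positional encodings on top of this.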
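Received: 2024-11-25    Published: 2025-05-30
CLC:  TP 391
Funding: Scientific Research Project of the Education Department of Jilin Province (JJKH20250531BS).
Corresponding author: Cheng HAN     E-mail: 1364392394@qq.com; hancheng@cust.edu.cn
About the author: Yongqing CAI (born 1999), male, Ph.D. candidate, engaged in research on virtual reality technology. orcid.org/0000-0003-0005-545. E-mail: 1364392394@qq.com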
Cite this article:

Yongqing CAI, Cheng HAN, Wei QUAN, Wudi CHEN. Visual induced motion sickness estimation model based on attention mechanism[J]. Journal of Zhejiang University (Engineering Science), 2025, 59(6): 1110-1118.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.06.002
https://www.zjujournals.com/eng/CN/Y2025/V59/I6/1110

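Fig. 1  Overall framework of the attention-based visual induced motion sickness estimation model
Fig. 2  Extraction of the attention region
Fig. 3  Flowchart of the Transformer-based sub-network
Fig. 4  Composition of the mixed dataset and examples from the training and test sets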
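Configuration | Parameter
CPU | Intel Core i9 13900K
GPU | NVIDIA GeForce RTX 2080 Ti
Operating system | Ubuntu
General-purpose parallel computing architecture | CUDA 10.0, cuDNN 7.6.1
Deep learning framework | PyTorch 1.2
Development environment | Anaconda 3, Python 3.7
Table 1  Experimental environment configuration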
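Method | Input features | Model | F1 | A/% | P/%
Lee et al. [24] | motion flow + disparity flow + saliency flow | 3D CNN | 0.6494 | 74.36 | 81.87
Du et al. [25] | motion flow + disparity flow + saliency flow | 3D CNN + attention | 0.6898 | 78.38 | 83.34
Quan et al. [26] | motion flow + appearance flow | 3D CNN + attention | 0.8167 | 86.49 | 88.91
This study | motion flow + attention flow | Transformer | 0.8468 | 89.19 | 92.28
Table 2  Comparison of experimental results of different VIMS estimation methods (A: accuracy; P: precision)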
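
For readers unfamiliar with the three reported metrics, the following sketch shows how F1, accuracy (A) and precision (P) are conventionally computed from confusion-matrix counts. It assumes a binary formulation; how the paper averages over sickness levels is not stated on this page, so the function and the example counts are illustrative only.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion counts.

    tp/fp/fn/tn: true positives, false positives, false negatives,
    true negatives. Assumption: binary classification; a multi-class
    evaluation would average these per class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, not taken from the paper.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

Because F1 combines precision with recall, a method can rank differently on F1 than on precision alone, which is why the tables report the metrics side by side.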
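Method | F1 | A/% | P/%
Appearance flow network, ResNet architecture | 0.5166 | 54.05 | 70.79
Attention flow network, ResNet architecture | 0.6198 | 62.16 | 62.10
Motion flow network, ResNet architecture | 0.6652 | 72.97 | 86.49
Appearance flow network, Transformer architecture | 0.7488 | 78.38 | 72.36
Attention flow network, Transformer architecture | 0.7925 | 83.78 | 89.10
Motion flow network, Transformer architecture | 0.8240 | 86.48 | 90.99
Table 3  Comparison of experimental results of different sub-networks under the ResNet and Transformer architectures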
Fig. 5  Accuracy and loss curves of different sub-network models
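Field of view | F1 | A/% | P/%
Full field of view (360°×180°) | 0.7488 | 78.38 | 72.36
Attention field of view (180°×100°) | 0.7925 | 83.78 | 89.10
HMD field of view (100°×90°) | 0.7321 | 75.67 | 71.31
Table 4  Comparison of experimental results for different fields of view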
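
To make the three fields of view concrete, the sketch below converts an angular field of view into a pixel crop of an equirectangular 360°×180° frame. It assumes a crop centred on the frame and a simple linear angle-to-pixel mapping rather than a perspective reprojection; the page does not say how the paper positions the attention region, and the function name and frame size are hypothetical.

```python
def centered_fov_crop(width, height, fov_h_deg, fov_v_deg):
    """Return (left, top, right, bottom) of a centred field-of-view crop.

    width, height: size in pixels of an equirectangular frame that
    spans 360 degrees horizontally and 180 degrees vertically.
    Assumption: degrees map linearly to pixels and the crop is centred.
    """
    if not (0 < fov_h_deg <= 360 and 0 < fov_v_deg <= 180):
        raise ValueError("field of view must lie within 360 x 180 degrees")
    crop_w = round(width * fov_h_deg / 360)
    crop_h = round(height * fov_v_deg / 180)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


# Hypothetical 3840x1920 frame with the three settings from Table 4.
full_box = centered_fov_crop(3840, 1920, 360, 180)
attention_box = centered_fov_crop(3840, 1920, 180, 100)
hmd_box = centered_fov_crop(3840, 1920, 100, 90)
```

In practice the crop centre would follow the viewer's head orientation instead of staying fixed at the frame centre.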
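Method | F1 | A/% | P/%
Appearance flow + attention flow, ResNet architecture | 0.6198 | 62.16 | 62.10
Appearance flow + motion flow, ResNet architecture | 0.7429 | 81.08 | 84.83
Attention flow + motion flow, ResNet architecture | 0.7665 | 81.08 | 87.16
Appearance flow + attention flow, Transformer architecture | 0.8050 | 83.78 | 77.61
Appearance flow + motion flow, Transformer architecture | 0.8200 | 86.48 | 89.12
Attention flow + motion flow, Transformer architecture | 0.8468 | 89.19 | 92.27
Table 5  Comparison of experimental results of different fusion networks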
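
The fusion networks compared in Table 5 combine the outputs of two sub-networks. One common form of late fusion, shown here as a sketch only, averages the class probabilities predicted by each stream. The weighting scheme, function names and example logits are assumptions; the exact fusion rule used in the paper is not given on this page.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(motion_logits, attention_logits, motion_weight=0.5):
    """Fuse two streams by a weighted average of class probabilities.

    Assumption: each stream outputs one logit per sickness class and
    the streams are trained separately. Returns (fused probabilities,
    index of the predicted class).
    """
    if len(motion_logits) != len(attention_logits):
        raise ValueError("both streams must score the same classes")
    motion_probs = softmax(motion_logits)
    attention_probs = softmax(attention_logits)
    fused = [motion_weight * m + (1.0 - motion_weight) * a
             for m, a in zip(motion_probs, attention_probs)]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


# Hypothetical logits for a two-class example.
fused_probs, label = late_fusion([2.0, 0.0], [0.0, 1.0])
```

Fusing at the output level keeps the two sub-networks independent, so either stream can be retrained or replaced without touching the other.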
1 SOUCHET A D, LOURDEAUX D, PAGANI A, et al. A narrative review of immersive virtual reality’s ergonomics and risks at the workplace: cybersickness, visual fatigue, muscular fatigue, acute stress, and mental overload[J]. Virtual Reality, 2023, 27(1): 19–50.
2 VON MAMMEN S, KNOTE A, EDENHOFER S. Cyber sick but still having fun [C]// Proceedings of the 22nd ACM Conference on Virtual Reality Software and Technology. Munich: ACM, 2016: 325–326.
3 SHEN Z, SUN F, WANG Y, et al. Research progress in physiological evaluation and treatment of visually induced motion sickness in virtual reality[J]. Acta Academiae Medicinae Sinicae, 2023, 45(6): 980–986.
4 NG A K T, CHAN L K Y, LAU H Y K. A study of cyber-sickness and sensory conflict theory using a motion-coupled virtual reality system[J]. Displays, 2020, 61: 101922.
5 KIM H G, LIM H T, LEE S, et al. VRSA net: VR sickness assessment considering exceptional motion for 360° VR video[J]. IEEE Transactions on Image Processing, 2019, 28(4): 1646–1660.
6 KENNEDY R S, LANE N E, BERBAUM K S, et al. Simulator sickness questionnaire: an enhanced method for quantifying simulator sickness[J]. The International Journal of Aviation Psychology, 1993, 3(3): 203–220.
7 BRUCK S, WATTERS P A. Estimating cybersickness of simulated motion using the simulator sickness questionnaire (SSQ): a controlled study [C]// Proceedings of the International Conference on Computer Graphics. Tianjin: IEEE, 2009: 486–488.
8 ARAFAT I M, FERDOUS S M S, QUARLES J. The effects of cybersickness on persons with multiple sclerosis [C]// Proceedings of the 22nd ACM Conference on Virtual Reality Software and Technology. Munich: ACM, 2016: 51–59.
9 SEVINC V, BERKMAN M I. Psychometric evaluation of simulator sickness questionnaire and its variants as a measure of cybersickness in consumer virtual environments[J]. Applied Ergonomics, 2020, 82: 102958.
10 ISLAM R, DESAI K, QUARLES J. Towards forecasting the onset of cybersickness by fusing physiological, head-tracking and eye-tracking with multimodal deep fusion network [C]// Proceedings of the IEEE International Symposium on Mixed and Augmented Reality. Singapore: IEEE, 2022: 121–130.
11 SEONG S, PARK J. Tracking motion sickness in dynamic VR environments with EDA signals[J]. International Journal of Industrial Ergonomics, 2024, 99: 103543.
12 YANG A H X, KASABOV N K, CAKMAK Y O. Prediction and detection of virtual reality induced cybersickness: a spiking neural network approach using spatiotemporal EEG brain data and heart rate variability[J]. Brain Informatics, 2023, 10(1): 15.
13 LI R, WANG Y, YIN H, et al. A deep cybersickness predictor through kinematic data with encoded physiological representation [C]// Proceedings of the IEEE International Symposium on Mixed and Augmented Reality. Sydney: IEEE, 2023: 1132–1141.
14 ISLAM R, DESAI K, QUARLES J. VR sickness prediction from integrated HMD’s sensors using multimodal deep fusion network [EB/OL]. (2021-08-14) [2024-10-22]. https://arxiv.org/abs/2108.06437.
15 SHIMADA S, PANNATTEE P, IKEI Y, et al. High-frequency cybersickness prediction using deep learning techniques with eye-related indices[J]. IEEE Access, 2023, 11: 95825–95839.
16 YAO S H, FAN C L, HSU C H. Towards quality-of-experience models for watching 360° videos in head-mounted virtual reality [C]// Proceedings of the Eleventh International Conference on Quality of Multimedia Experience. Berlin: IEEE, 2019: 1–3.
17 QUAN W, LI L, HAN C, et al. Objective evaluation of VR sickness and analysis of its relationship with VR presence [C]// Proceedings of the International Conference on Intelligent Computing. Singapore: Springer, 2024: 416–427.
18 CAO Z, KOPPER R. Real-time viewport-aware optical flow estimation in 360-degree videos for visually-induced motion sickness mitigation [C]// Proceedings of the 25th Symposium on Virtual and Augmented Reality. Rio Grande: ACM, 2024: 210–218.
19 BALA P, DIONÍSIO D, NISI V, et al. Visually induced motion sickness in 360° videos: comparing and combining visual optimization techniques [C]// Proceedings of the IEEE International Symposium on Mixed and Augmented Reality Adjunct. Munich: IEEE, 2018: 244–249.
20 KIM J, KIM W, AHN S, et al. Virtual reality sickness predictor: analysis of visual-vestibular conflict and VR contents [C]// Proceedings of the Tenth International Conference on Quality of Multimedia Experience. Cagliari: IEEE, 2018: 1–6.
21 LEE J, KIM W, KIM J, et al. A study on virtual reality sickness and visual attention [C]// Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. Tokyo: IEEE, 2021: 1465–1469.
22 ZHAO J, TRAN K T P, CHALMERS A, et al. Deep learning-based simulator sickness estimation from 3D motion [C]// Proceedings of the IEEE International Symposium on Mixed and Augmented Reality. Sydney: IEEE, 2023: 39–48.
23 LU Z, YU M, JIANG G, et al. Prediction of motion sickness degree of stereoscopic panoramic videos based on content perception and binocular characteristics[J]. Digital Signal Processing, 2023, 132: 103787.
24 LEE T M, YOON J C, LEE I K. Motion sickness prediction in stereoscopic videos using 3D convolutional neural networks[J]. IEEE Transactions on Visualization and Computer Graphics, 2019, 25(5): 1919–1927.
25 DU M, CUI H, WANG Y, et al. Learning from deep stereoscopic attention for simulator sickness prediction[J]. IEEE Transactions on Visualization and Computer Graphics, 2023, 29(2): 1415–1423.
26 QUAN Wei, CAI Yongqing, WANG Chao, et al. VR sickness estimation model based on 3D-ResNet two-stream network[J]. Journal of Zhejiang University: Engineering Science, 2023, 57(7): 1345–1353.
27 DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale [EB/OL]. (2021-06-03) [2024-10-22]. https://arxiv.org/abs/2010.11929.
28 FREMEREY S, SINGLA A, MESEBERG K, et al. AVtrack360: an open dataset and software recording people’s head rotations watching 360° videos on an HMD [C]// Proceedings of the 9th ACM Multimedia Systems Conference. Amsterdam: ACM, 2018: 403–408.
29 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. California: ACM, 2017: 6000–6010.
30 QUAN Wei, WANG Chao, GENG Xuena, et al. Research on VR experience comfort based on motion perception[J]. Journal of System Simulation, 2023, 35(1): 169–177.
31 KUO P C, CHUANG L C, LIN D Y, et al. VR sickness assessment with perception prior and hybrid temporal features [C]// Proceedings of the 25th International Conference on Pattern Recognition. Milan: IEEE, 2021: 5558–5564.
32 PORCINO T, RODRIGUES E O, SILVA A, et al. Using the gameplay and user data to predict and identify causes of cybersickness manifestation in virtual reality games [C]// Proceedings of the IEEE 8th International Conference on Serious Games and Applications for Health. Vancouver: IEEE, 2020: 1–8.
33 YILDIRIM C. A review of deep learning approaches to EEG-based classification of cybersickness in virtual reality [C]// Proceedings of the IEEE International Conference on Artificial Intelligence and Virtual Reality. Utrecht: IEEE, 2020: 351–357.
34 LI Y, LIU A, DING L. Machine learning assessment of visually induced motion sickness levels based on multiple biosignals[J]. Biomedical Signal Processing and Control, 2019, 49: 202–211.