Speech-evoked EEG decoding based on Multi-scale Attention Temporal Encoding Network
Zihao YAO, Hairong JIA*, Yarong LI, Guijun CHEN
College of Electronic and Information Engineering, Taiyuan University of Technology, Taiyuan 030024, China
Abstract: A Multi-scale Attention Temporal Encoding Network (MATE-Net) was proposed to address two obstacles in decoding elicited covert speech (whispered and imagined) from EEG: the signal features are complex, and covert speech data are difficult to acquire. The model was trained on the relatively abundant overt speech data and then applied to covert speech decoding tasks. An Inception-based multi-receptive-field module extracted multi-scale features from the input signals, and a bidirectional GRU captured contextual dependencies in both directions to improve the representation of temporal dynamics. To ease the training of the deep network, residual connections were added to keep gradients stable during backpropagation. A multi-head attention mechanism was introduced to capture both local and global temporal dependencies, strengthening the representation of salient features in the sequence. Experimental results showed that the model performed well in overt speech decoding: in five-fold cross-validation, the average accuracy on the test set was 74.30%, and the Spearman and Pearson correlation coefficients were 0.884 and 0.942, respectively. The pre-trained MATE-Net was then applied to whispered and imagined speech tasks, where it enabled effective reconstruction of speech spectrograms.
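The abstract reports Pearson and Spearman correlation coefficients but does not say how they were computed (for example, per frequency bin, per frame, or over the whole spectrogram). The following is only a minimal sketch of the two standard definitions, assuming a target and a reconstructed spectrogram that have each been flattened into a plain list of values; the function names `pearson`, `average_ranks`, and `spearman` are illustrative and do not come from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical flattened spectrogram values (illustrative numbers, not from the paper).
target = [0.1, 0.4, 0.35, 0.8, 0.9]
reconstructed = [0.2, 0.3, 0.5, 0.7, 1.5]
print(round(pearson(target, reconstructed), 3))
print(round(spearman(target, reconstructed), 3))
```

Pearson measures linear agreement between the two value sets, while Spearman depends only on their ordering, so a reconstruction that preserves the relative energy pattern but distorts its scale nonlinearly can score high on Spearman and lower on Pearson.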
Received: 21 April 2025
Published: 19 March 2026
Fund: National Natural Science Foundation of China (62201377); Shanxi Province Basic Research Program (202403021211098); Shanxi Province Graduate Innovation Project (RC2400005582).
Corresponding author: Hairong JIA
E-mail: 1078349047@qq.com; helenjia722@163.com
Keywords: brain-computer interface, electroencephalography (EEG), overt speech, whispered speech, imagined speech