Journal of Zhejiang University (Engineering Science)  2026, Vol. 60, Issue 4: 896-905    DOI: 10.3785/j.issn.1008-973X.2026.04.021
Electronics and Information Engineering
Speech-evoked EEG decoding based on Multi-scale Attention Temporal Encoding Network
Zihao YAO, Hairong JIA*, Yarong LI, Guijun CHEN
College of Electronic and Information Engineering, Taiyuan University of Technology, Taiyuan 030024, China
Abstract:

A Multi-scale Attention Temporal Encoding Network (MATE-Net) was proposed to address the complexity of speech-evoked EEG features and the difficulty of acquiring covert speech (whispered and imagined) data. The relatively abundant overt speech data were leveraged to train the model, which was then applied to covert speech decoding tasks. An Inception-based multi-receptive-field module was used to extract multi-scale features from the input signals, while a bidirectional GRU captured contextual dependencies and improved the representation of temporal dynamics. To ease the training of the deep network, residual connections were added to keep gradients stable during backpropagation. A multi-head attention mechanism was further introduced to capture both local and global temporal dependencies and to strengthen the representation of salient features in the sequence. Experimental results showed that the model performed well in overt speech decoding: in five-fold cross-validation it reached an average test-set accuracy of 74.30%, with Spearman and Pearson correlation coefficients of 0.884 and 0.942, respectively. The pre-trained MATE-Net was successfully transferred to whispered and imagined speech tasks, enabling effective reconstruction of speech spectrograms.

Key words: brain-computer interface; electroencephalography (EEG); overt speech; whispered speech; imagined speech
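The abstract names four building blocks (Inception-style multi-scale convolutions, a bidirectional GRU, residual connections, and multi-head attention). As a minimal illustration of the first idea only, the sketch below passes a single-channel signal through several 1-D convolutions with different receptive fields and stacks the branch outputs; the averaging kernels and the 256 Hz toy sine are assumptions for illustration, not the authors' learned weights or data.

```python
import numpy as np

def multiscale_block(x, kernel_sizes=(3, 5, 7)):
    """Inception-style multi-receptive-field block for a 1-D signal.

    Each branch convolves the input with a kernel of a different width
    (here: simple averaging kernels as placeholders for learned weights),
    and the branch outputs are stacked along a feature axis so that
    features at several temporal scales coexist in the output.
    """
    branches = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k                       # placeholder weights
        branches.append(np.convolve(x, kernel, mode="same"))
    return np.stack(branches, axis=0)                 # (n_scales, n_samples)

# Toy single-channel "EEG" segment: 1 s of a 10 Hz rhythm at 256 Hz.
eeg = np.sin(2 * np.pi * 10 * np.linspace(0, 1, 256))
features = multiscale_block(eeg)
print(features.shape)  # (3, 256)
```

In the real model each branch would use learned convolution weights and multiple channels; the point here is only the parallel-branches-then-concatenate structure.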
Received: 2025-04-21    Published: 2026-03-19
CLC:  TN 911.7  
Funding: National Natural Science Foundation of China (62201377); Shanxi Basic Research Program (202403021211098); Shanxi Graduate Innovation Project (RC2400005582).
Corresponding author: Hairong JIA     E-mail: 1078349047@qq.com; helenjia722@163.com
About the author: Zihao YAO (2000—), male, master's student, working on speech signal processing. orcid.org/0009-0008-2621-0766. E-mail: 1078349047@qq.com

Cite this article:


Zihao YAO, Hairong JIA, Yarong LI, Guijun CHEN. Speech-evoked EEG decoding based on Multi-scale Attention Temporal Encoding Network [J]. Journal of Zhejiang University (Engineering Science), 2026, 60(4): 896-905.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.04.021        https://www.zjujournals.com/eng/CN/Y2026/V60/I4/896

Fig. 1  Overall pipeline of speech-evoked EEG decoding
Fig. 2  Architecture of the Multi-scale Attention Temporal Encoding Network (MATE-Net)
Item       Configuration
OS         Windows 11
CPU        Intel Core i9-14900KF
Clock      3.20 GHz
RAM        128 GB
GPU        NVIDIA RTX 4090D
VRAM       24 GB
IDE        PyCharm
Language   Python 3.10
Table 1  Hardware and software configuration of the experimental environment
Fold     Acc_train/%   Acc_test/%   MSE     S_cos   ρ       r
Fold 1   76.38         74.33        0.319   0.975   0.890   0.945
Fold 2   76.32         74.00        0.344   0.975   0.880   0.937
Fold 3   76.31         74.14        0.334   0.975   0.885   0.941
Fold 4   76.35         74.43        0.314   0.975   0.884   0.943
Fold 5   76.31         74.60        0.314   0.975   0.882   0.943
Mean     76.33         74.30        0.325   0.975   0.884   0.942
Table 2  Results of five-fold cross-validation
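Table 2 reports both Spearman's ρ and Pearson's r between the reconstructed and reference spectra: ρ measures monotonic agreement, r measures linear agreement. A self-contained sketch of both metrics on toy data (not the paper's spectrograms; the double-argsort ranking shortcut is an assumption that only holds for tie-free values):

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation: linear agreement between two 1-D arrays."""
    a = np.asarray(a, float) - np.mean(a)
    b = np.asarray(b, float) - np.mean(b)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def spearman_rho(a, b):
    """Spearman correlation: Pearson correlation of the ranks
    (this simple double-argsort ranking assumes no tied values)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson_r(rank(a), rank(b))

# Toy check: a perfectly monotonic but nonlinear relation.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3
print(round(spearman_rho(x, y), 3))  # 1.0   (monotonic -> perfect rank agreement)
print(round(pearson_r(x, y), 3))     # 0.943 (nonlinearity lowers linear agreement)
```

The gap between the two numbers is why the paper reports both: a reconstruction can track the target's ups and downs (high ρ) while still deviating from it in amplitude (lower r).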
Model                                  Acc/%   ρ       r
Model 1 (Inception only)               53.46   0.609   0.634
Model 2 (Inception + residual)         63.90   0.743   0.795
Model 3 (Inception + residual + GRU)   68.59   0.829   0.893
Model 4 (MATE-Net)                     74.30   0.884   0.942
Table 3  Ablation results for the key modules of MATE-Net
Model                  Acc/%   ρ       r
LDA                    56.61   0.645   0.661
Logistic Regression    55.39   0.640   0.658
Decision Tree          61.25   0.726   0.765
ShallowConvNet [27]    55.15   0.493   0.459
DeepConvNet [27]       57.49   0.648   0.703
EEGNet [27]            58.31   0.541   0.570
EEG-ITNet [28]         54.66   0.532   0.542
EEG-Transformer [29]   59.46   0.683   0.713
EEG-TCNet [30]         61.80   0.712   0.738
EEGformer [31]         64.39   0.813   0.872
MATE-Net               74.30   0.884   0.942
Table 4  Performance comparison of MATE-Net with mainstream decoding methods
Fig. 3  Comparison of original and reconstructed spectrograms
Fig. 4  Word-level comparison of original and reconstructed mel spectrograms
Fig. 5  Comparison of original and reconstructed speech waveforms
Model           Acc (P1/P2/P3)/%        ρ (P1/P2/P3)            r (P1/P2/P3)
LDA             32.27 / 33.60 / 33.49   0.478 / 0.457 / 0.487   0.472 / 0.439 / 0.448
Decision Tree   45.55 / 46.92 / 45.01   0.581 / 0.593 / 0.571   0.582 / 0.602 / 0.570
EEGformer       54.34 / 53.50 / 54.79   0.781 / 0.783 / 0.766   0.782 / 0.789 / 0.770
MATE-Net        68.72 / 67.80 / 66.99   0.937 / 0.905 / 0.922   0.938 / 0.907 / 0.924
Table 5  Cross-dataset comparison results
Model       Mean r (whispered/imagined)   Median r (whispered/imagined)
EEGformer   0.313 / 0.366                 0.352 / 0.391
MATE-Net    0.376 / 0.449                 0.437 / 0.467
Table 6  Correlation between overt and covert speech
Fig. 6  Spectrogram and waveform comparison between overt and covert speech
1 HILARI K, NEEDLE J J, HARRISON K L What are the important factors in health-related quality of life for people with aphasia? a systematic review[J]. Archives of Physical Medicine and Rehabilitation, 2012, 93 (1): S86- S95
doi: 10.1016/j.apmr.2011.05.028
2 JELLINGER K A The spectrum of cognitive dysfunction in amyotrophic lateral sclerosis: an update[J]. International Journal of Molecular Sciences, 2023, 24 (19): 14647
doi: 10.3390/ijms241914647
3 LIU Jinzhen, YE Fangfang, XIONG Hui Recognition of multi-class motor imagery EEG signals based on convolutional neural network[J]. Journal of Zhejiang University: Engineering Science, 2021, 55 (11): 2054- 2066 (in Chinese)
4 TANG J Effect of brain-computer interface training on functional recovery after stroke[J]. Theoretical and Natural Science, 2023, 21 (1): 75- 79
doi: 10.54254/2753-8818/21/20230821
5 LUO S, ANGRICK M, COOGAN C, et al Stable decoding from a speech BCI enables control for an individual with ALS without recalibration for 3 months[J]. Advanced Science, 2023, 10 (35): e2304853
doi: 10.1002/advs.202304853
6 ANUMANCHIPALLI G K, CHARTIER J, CHANG E F Speech synthesis from neural decoding of spoken sentences[J]. Nature, 2019, 568 (7753): 493- 498
doi: 10.1038/s41586-019-1119-1
7 LI M, LIAO S, PUN S H, et al. Effects of EEG analysis window location on classifying spoken mandarin monosyllables [C]// 11th International IEEE/EMBS Conference on Neural Engineering. Baltimore: IEEE, 2023: 1–4.
8 MELINDA M, JUWONO F H, ENRIKO I K A, et al Application of continuous wavelet transform and support vector machine for autism spectrum disorder electroencephalography signal classification[J]. Radioelectronic and Computer Systems, 2023, (3): 73- 90
doi: 10.32620/reks.2023.3.07
9 LIU H, ONG Y S, YU Z, et al Scalable Gaussian process classification with additive noise for non-Gaussian likelihoods[J]. IEEE Transactions on Cybernetics, 2022, 52 (7): 5842- 5854
doi: 10.1109/TCYB.2020.3043355
10 ALENAZI F S, EL HINDI K, ASSADHAN B Complement-class harmonized Naïve Bayes classifier[J]. Applied Sciences, 2023, 13 (8): 4852
doi: 10.3390/app13084852
11 ABDULGHANI M M, WALTERS W L, ABED K H Imagined speech classification using EEG and deep learning[J]. Bioengineering, 2023, 10 (6): 649
doi: 10.3390/bioengineering10060649
12 QI H, GAO N. Research on the classification algorithm of imaginary speech EEG signals based on twin neural network [C]// 7th International Conference on Signal and Image Processing. Suzhou: IEEE, 2022: 211–216.
13 GASPARINI F, CAZZANIGA E, SAIBENE A. Inner speech recognition through electroencephalographic signals [EB/OL]. (2022–10–11) [2025–04–21]. https://arxiv.org/abs/2210.06472.
14 PARK H J, LEE B Multiclass classification of imagined speech EEG using noise-assisted multivariate empirical mode decomposition and multireceptive field convolutional neural network[J]. Frontiers in Human Neuroscience, 2023, 17: 1186594
doi: 10.3389/fnhum.2023.1186594
15 VORONTSOVA D, MENSHIKOV I, ZUBOV A, et al Silent EEG-speech recognition using convolutional and recurrent neural network with 85% accuracy of 9 words classification[J]. Sensors, 2021, 21 (20): 6744
doi: 10.3390/s21206744
16 CHEN X, WANG R, KHALILIAN-GOURTANI A, et al A neural speech decoding framework leveraging deep learning and speech synthesis[J]. Nature Machine Intelligence, 2024, 6 (4): 467- 480
doi: 10.1038/s42256-024-00824-8
17 MARTIN S, BRUNNER P, ITURRATE I, et al Word pair classification during imagined speech using direct brain recordings[J]. Scientific Reports, 2016, 6: 25803
doi: 10.1038/srep25803
18 KOMEIJI S, MITSUHASHI T, IIMURA Y, et al Feasibility of decoding covert speech in ECoG with a Transformer trained on overt speech[J]. Scientific Reports, 2024, 14: 11491
doi: 10.1038/s41598-024-62230-9
19 CANOLTY R T, EDWARDS E, DALAL S S, et al High gamma power is phase-locked to theta oscillations in human neocortex[J]. Science, 2006, 313 (5793): 1626- 1628
doi: 10.1126/science.1128115
20 ABDI H, WILLIAMS L J Principal component analysis[J]. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2 (4): 433- 459
doi: 10.1002/wics.101
21 SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions [C]// IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 1–9.
22 HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770–778.
23 DEY R, SALEM F M. Gate-variants of gated recurrent unit (GRU) neural networks [C]// IEEE 60th International Midwest Symposium on Circuits and Systems. Boston: IEEE, 2017: 1597–1600.
24 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Advances in Neural Information Processing Systems 30. Long Beach: Curran Associates, 2017: 5998–6008.
25 SRIVASTAVA N, HINTON G E, KRIZHEVSKY A, et al Dropout: a simple way to prevent neural networks from overfitting[J]. The Journal of Machine Learning Research, 2014, 15 (1): 1929- 1958
26 ANGRICK M, OTTENHOFF M C, DIENER L, et al Real-time synthesis of imagined speech processes from minimally invasive recordings of neural activity[J]. Communications Biology, 2021, 4: 1055
doi: 10.1038/s42003-021-02578-0
27 LAWHERN V J, SOLON A J, WAYTOWICH N R, et al EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces[J]. Journal of Neural Engineering, 2018, 15 (5): 056013
doi: 10.1088/1741-2552/aace8c
28 SALAMI A, ANDREU-PEREZ J, GILLMEISTER H EEG-ITNet: an explainable inception temporal convolutional network for motor imagery classification[J]. IEEE Access, 2022, 10: 36672- 36685
doi: 10.1109/ACCESS.2022.3161489
29 LEE Y E, LEE S H. EEG-transformer: self-attention from transformer architecture for decoding EEG of imagined speech [C]// 10th International Winter Conference on Brain-Computer Interface. Gangwon-do: IEEE, 2022: 1–4.
30 INGOLFSSON T M, HERSCHE M, WANG X, et al. EEG-TCNet: an accurate temporal convolutional network for embedded motor-imagery brain-machine interfaces [C]// IEEE International Conference on Systems, Man, and Cybernetics. Toronto: IEEE, 2020: 2958–2965.
31 WAN Z, LI M, LIU S, et al EEGformer: a transformer-based brain activity classification method using EEG signal[J]. Frontiers in Neuroscience, 2023, 17: 1148855
doi: 10.3389/fnins.2023.1148855