Journal of Zhejiang University (Engineering Science)  2026, Vol. 60, Issue 4: 887-895    DOI: 10.3785/j.issn.1008-973X.2026.04.020
Electronics and Information Engineering
Auditory attention decoding analysis based on band-specific spatial-temporal frequency fusion
Chunli WANG(),Yuxin GAO,Jinxu LI
School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730000, China
Full text: PDF (2409 KB)   HTML
Abstract:

A band-specific spatial-temporal frequency feature fusion framework (SSF-ConvLSTM) was proposed to address the problem that auditory attention detection (AAD) methods ignore the band-specific contributions of EEG signals. The framework systematically evaluated the differentiated contributions of the $\delta$ (1–4 Hz), $\theta$ (4–8 Hz), $\alpha$ (8–13 Hz), $\beta$ (13–30 Hz), and $\gamma$ (30–50 Hz) frequency bands, enabling quantitative screening of key bands and modeling of their dynamic coupling. First, brain topographic maps were used to reveal the spatial weight distribution of neural activity in each frequency band and to screen the bands most closely related to target speech encoding. Second, the SSF-ConvLSTM model was constructed: convolutional layers extracted band-wise spatial features, and a long short-term memory (LSTM) network captured the time-varying dynamics of attention, enabling joint decoding of spatial-temporal dynamic features across frequency bands. The algorithm was validated on the public KUL and DTU datasets. The results showed that, as frequency increased, the frontal- and temporal-lobe weights related to auditory attention decoding peaked in the $\alpha$ band and then gradually decreased toward the $\gamma$ band. On the KUL dataset, the $\alpha$ band achieved the best decoding accuracy of 93.38% with a 5 s decision window, 9.78 percentage points higher than the baseline model; on the DTU dataset, the $\alpha$-band decoding accuracy was significantly improved by 5.5 percentage points over the baseline model. This study confirms the key role of band-specific features in AAD and provides a theoretical basis for developing band-spatial-temporal coupled brain-computer interfaces based on feature optimization.
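The five-band split at the core of the framework can be illustrated with a minimal FFT-based decomposition. This is a sketch only: the paper's actual filtering method is not detailed on this page, and the sampling rate below is an assumption for the example.

```python
import numpy as np

# Canonical EEG bands evaluated in the paper (Hz).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 50)}

def band_signals(eeg, fs):
    """Split a 1-D EEG trace into the five canonical bands using an
    ideal (brick-wall) frequency-domain filter."""
    spectrum = np.fft.rfft(eeg)
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    return {name: np.fft.irfft(spectrum * ((freqs >= lo) & (freqs < hi)),
                               n=len(eeg))
            for name, (lo, hi) in BANDS.items()}

# Sanity check: a pure 10 Hz tone should land almost entirely in alpha.
fs = 128                       # assumed sampling rate, not from the paper
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 10 * t)
powers = {name: float(np.mean(sig ** 2))
          for name, sig in band_signals(x, fs).items()}
```

In practice a zero-phase FIR/IIR band-pass filter would replace the brick-wall mask; the sketch only demonstrates the per-band feature extraction idea that the topographic-map analysis builds on.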

Key words: electroencephalogram (EEG); auditory attention detection (AAD); band analysis; decoding accuracy; cocktail party effect
Received: 2025-04-26    Published: 2026-03-19
CLC:  TP 312  
Foundation items: Lanzhou Jiaotong University–Tianjin University Joint Innovation Fund (LH2023002); Natural Science Foundation of Tianjin (21JCZXJC00190).
About the author: WANG Chunli (b. 1981), female, associate professor; research interest: EEG-based auditory attention detection. orcid.org/0009-0000-8095-4485. E-mail: gyx12172105@163.com

Cite this article:


Chunli WANG, Yuxin GAO, Jinxu LI. Auditory attention decoding analysis based on band-specific spatial-temporal frequency fusion[J]. Journal of Zhejiang University (Engineering Science), 2026, 60(4): 887-895.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.04.020        https://www.zjujournals.com/eng/CN/Y2026/V60/I4/887

Fig. 1  SSF-ConvLSTM framework
Dataset | N  | Language | Stimulus direction     | t_s/min | T/h
KUL     | 16 | Flemish  | 90° left and 90° right | 48      | 12.8
DTU     | 18 | Danish   | 60° left and 60° right | 50      | 15.0
Table 1  Dataset details
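The totals in Table 1 follow directly from the number of subjects N and the per-subject recording time t_s; a one-line arithmetic check:

```python
# Total recording time T (hours) = N subjects * t_s minutes / 60.
datasets = {"KUL": (16, 48), "DTU": (18, 50)}
totals = {name: n * ts / 60 for name, (n, ts) in datasets.items()}
# totals -> {'KUL': 12.8, 'DTU': 15.0}, matching the T/h column
```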
Layer               | d_in       | d_out      | n
3-D convolution     | 5×32×32×1  | 5×32×32×32 | 320
Batch normalization | 5×32×32×32 | 5×32×32×32 | 128
ConvLSTM            | 5×32×32×32 | 32×32×32   | 73856
Flatten             | 32×32×32   | 32768      | 0
Fully connected     | 32768      | 512        | 16777728
Batch normalization | 512        | 512        | 2048
Fully connected     | 512        | 32         | 16416
Softmax             | 32         | 2          | 66
Table 2  Detailed module configuration parameters
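The parameter counts n in Table 2 can be reproduced with standard formulas. Two assumptions are inferred from the counts rather than stated on this page: the convolution kernels are 3×3 (3×3×1 for the 3-D layer), and the batch-norm counts include the non-trainable running mean/variance buffers alongside the learnable scale and shift.

```python
# Reproduce the parameter counts n reported in Table 2.

def conv3d_params(c_in, c_out, kernel):
    """Weights + biases of a 3-D convolution (assumed 3x3x1 kernel)."""
    k = 1
    for d in kernel:
        k *= d
    return c_in * k * c_out + c_out

def convlstm_params(c_in, c_hidden, k=3):
    """Four gates sharing one k x k convolution over [input; hidden]."""
    return 4 * ((c_in + c_hidden) * k * k * c_hidden + c_hidden)

def batchnorm_params(c):
    """Scale + shift + running mean + running variance per channel."""
    return 4 * c

def linear_params(d_in, d_out):
    """Weights + biases of a fully connected layer."""
    return d_in * d_out + d_out

counts = [
    conv3d_params(1, 32, (3, 3, 1)),   # 3-D convolution:  320
    batchnorm_params(32),              # batch norm:       128
    convlstm_params(32, 32),           # ConvLSTM:         73856
    0,                                 # flatten:          0
    linear_params(32 * 32 * 32, 512),  # fully connected:  16777728
    batchnorm_params(512),             # batch norm:       2048
    linear_params(512, 32),            # fully connected:  16416
    linear_params(32, 2),              # softmax layer:    66
]
```

That every row matches under these assumptions is a useful cross-check that the ConvLSTM collapses the five band channels into a single 32×32×32 spatial feature map before classification.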
Fig. 2  Brain topographic map distributions on the KUL dataset
Fig. 3  Brain topographic map distributions on the DTU dataset
Fig. 4  Decoding accuracy of different frequency bands on the KUL dataset
Fig. 5  Decoding accuracy of different frequency bands on the DTU dataset
Fig. 6  Decoding accuracy trends across frequency bands
Fig. 7  Analysis of $\alpha$-band results
1 HAN C, O’SULLIVAN J, LUO Y, et al Speaker-independent auditory attention decoding without access to clean speech sources[J]. Science Advances, 2019, 5 (5): eaav6134
doi: 10.1126/sciadv.aav6134
2 MONESI M J, ACCOU B, MONTOYA-MARTINEZ J, et al. An LSTM based architecture to relate speech stimulus to EEG [C]// ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Barcelona: IEEE, 2020: 941–945.
3 CAI S, SU E, XIE L, et al EEG-based auditory attention detection via frequency and channel neural attention[J]. IEEE Transactions on Human-Machine Systems, 2021, 52 (2): 256- 266
4 HWANG H J, KIM S, CHOI S, et al EEG-based brain-computer interfaces: a thorough literature survey[J]. International Journal of Human-Computer Interaction, 2013, 29 (12): 814- 826
doi: 10.1080/10447318.2013.780869
5 CEOLINI E, HJORTKJÆR J, WONG D D E, et al Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception[J]. NeuroImage, 2020, 223: 117282
doi: 10.1016/j.neuroimage.2020.117282
6 DE TAILLEZ T, KOLLMEIER B, MEYER B T Machine learning for decoding listeners’ attention from electroencephalography evoked by continuous speech[J]. European Journal of Neuroscience, 2020, 51 (5): 1234- 1241
doi: 10.1111/ejn.13790
7 THORNTON M, MANDIC D, REICHENBACH T Robust decoding of the speech envelope from EEG recordings through deep neural networks[J]. Journal of Neural Engineering, 2022, 19 (4): 046007
doi: 10.1088/1741-2552/ac7976
8 SOMERS B, VERSCHUEREN E, FRANCART T Neural tracking of the speech envelope in cochlear implant users[J]. Journal of Neural Engineering, 2019, 16 (1): 016003
doi: 10.1088/1741-2552/aae6b9
9 WANG L, WU E X, CHEN F EEG-based auditory attention decoding using speech-level-based segmented computational models[J]. Journal of Neural Engineering, 2021, 18 (4): 046066
doi: 10.1088/1741-2552/abfeba
10 XU Z, BAI Y, ZHAO R, et al Auditory attention decoding from EEG-based Mandarin speech envelope reconstruction[J]. Hearing Research, 2022, 422: 108552
doi: 10.1016/j.heares.2022.108552
11 ZHU H, CAI S, JIANG Y, et al. EEG-derived voice signature for attended speaker detection [EB/OL]. (2023−08−29) [2025−12−25]. https://doi.org/10.48550/arXiv.2308.14774.
12 王春丽, 李金絮, 高玉鑫, 等 一种基于时空频多维特征的短时窗口脑电听觉注意解码网络[J]. 电子与信息学报, 2025, 47 (3): 814- 824
WANG Chunli, LI Jinxu, GAO Yuxin, et al A short-time window ElectroEncephaloGram auditory attention decoding network based on multi-dimensional characteristics of temporal-spatial-frequency[J]. Journal of Electronics and Information Technology, 2025, 47 (3): 814- 824
doi: 10.11999/JEIT240867
13 CAI S, SUN P, SCHULTZ T, et al. Low-latency auditory spatial attention detection based on spectro-spatial features from EEG [C]// 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Mexico: IEEE, 2021: 5812-5815.
14 XIE Z, WEI J, LU W, et al. EEG-based fast auditory attention detection in real-life scenarios using time-frequency attention mechanism [C]// 2024 IEEE International Conference on Acoustics, Speech and Signal Processing. Seoul: IEEE, 2024: 1741–1745.
15 JIANG Z, AN X, LIU S, et al Neural oscillations reflect the individual differences in the temporal perception of audiovisual speech[J]. Cerebral Cortex, 2023, 33 (20): 10575- 10583
doi: 10.1093/cercor/bhad304
16 FRIESE U, DAUME J, GÖSCHL F, et al Oscillatory brain activity during multisensory attention reflects activation, disinhibition, and cognitive control[J]. Scientific Reports, 2016, 6: 32775
doi: 10.1038/srep32775
17 POPOV T, KASTNER S, JENSEN O FEF-controlled alpha delay activity precedes stimulus-induced gamma-band activity in visual cortex[J]. The Journal of Neuroscience, 2017, 37 (15): 4117- 4127
doi: 10.1523/JNEUROSCI.3015-16.2017
18 LIU Y J, YU M, ZHAO G, et al Real-time movie-induced discrete emotion recognition from EEG signals[J]. IEEE Transactions on Affective Computing, 2018, 9 (4): 550- 562
doi: 10.1109/TAFFC.2017.2660485
19 KAUSAR T, LU Y, ASGHAR M A, et al Auditory-GAN: deep learning framework for improved auditory spatial attention detection[J]. PeerJ Computer Science, 2024, 10: e2394
doi: 10.7717/peerj-cs.2394
20 DAS N, FRANCART T, BERTRAND A Auditory attention detection dataset KULeuven[J]. Zenodo, 2020, 25 (5): 402-412
21 GEIRNAERT S, FRANCART T, BERTRAND A Fast EEG-based decoding of the directional focus of auditory attention using common spatial patterns[J]. IEEE Transactions on Bio-Medical Engineering, 2021, 68 (5): 1557- 1568
doi: 10.1109/TBME.2020.3033446
22 FUGLSANG S A, DAU T, HJORTKJÆR J Noise-robust cortical tracking of attended speech in real-world acoustic scenes[J]. NeuroImage, 2017, 156: 435- 444
doi: 10.1016/j.neuroimage.2017.04.026
23 FUGLSANG S A, WONG D D E, HJORTKJAER J. EEG audio dataset for auditory attention decoding [EB/OL]. [2025−12−25]. https://doi.org/10.5281/zenodo.1199011.
24 CICCARELLI G, NOLAN M, PERRICONE J, et al Comparison of two-talker attention decoding from EEG with nonlinear neural networks and linear methods[J]. Scientific Reports, 2019, 9: 11538
doi: 10.1038/s41598-019-47795-0
25 ZHANG Z, ZHANG G, DANG J, et al. EEG-based short-time auditory attention detection using multi-task deep learning [C]// Interspeech 2020. ISCA: 2020: 2517-2521.
26 VANDECAPPELLE S, DECKERS L, DAS N, et al EEG-based detection of the locus of auditory attention with convolutional neural networks[J]. eLife, 2021, 10: e56481
doi: 10.7554/eLife.56481
27 FU Z, WANG B, WU X, et al. Auditory attention decoding from EEG using convolutional recurrent neural network [C]// 29th European Signal Processing Conference. Dublin: IEEE, 2021: 970–974.
28 ZION GOLUMBIC E M, DING N, BICKEL S, et al Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party”[J]. Neuron, 2013, 77 (5): 980- 991
doi: 10.1016/j.neuron.2012.12.037
29 MIZOKUCHI K, TANAKA T, SATO T G, et al Alpha band modulation caused by selective attention to music enables EEG classification[J]. Cognitive Neurodynamics, 2024, 18 (3): 1005- 1020
doi: 10.1007/s11571-023-09955-x
30 SU E, CAI S, LI P, et al. Auditory attention detection with EEG channel attention [C]// 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Mexico: IEEE, 2021: 5804−5807.