Journal of ZheJiang University (Engineering Science)  2026, Vol. 60 Issue (4): 887-895    DOI: 10.3785/j.issn.1008-973X.2026.04.020
Auditory attention decoding analysis based on band-specific spatial-temporal frequency fusion
Chunli WANG(),Yuxin GAO,Jinxu LI
School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730000, China

Abstract  

A spatial-temporal frequency feature fusion framework (SSF-ConvLSTM) was proposed to address the problem that existing auditory attention detection (AAD) methods ignore the band-specific contributions of EEG signals. The framework systematically evaluated the differentiated contributions of the $\delta $ (1–4 Hz), $\theta $ (4–8 Hz), $\alpha $ (8–13 Hz), $\beta $ (13–30 Hz), and $\gamma $ (30–50 Hz) frequency bands, enabling quantitative screening of key bands and dynamic coupling modeling. First, the spatial weight distribution of neural activity in each band was revealed through brain topographic maps, and the bands closely related to target speech encoding were screened out. Second, the SSF-ConvLSTM model was constructed: convolutional layers extracted the spatial features of each band, and a long short-term memory (LSTM) network modeled the time-varying dynamics of attention, enabling joint decoding of spatial-temporal dynamic features across bands. The algorithm was verified on the public KUL and DTU datasets. The results showed that, as frequency increased, the weights of the frontal and temporal lobes related to auditory attention decoding peaked in the $\alpha $ band and then gradually decreased toward the $\gamma $ band. On the KUL dataset, the low-frequency $\alpha $ band achieved the best decoding accuracy, 93.38%, with a 5 s decision window, 9.78 percentage points higher than the baseline model; on the DTU dataset, the decoding accuracy of the $\alpha $ band was significantly improved, by 5.5 percentage points, over the baseline model. This study confirms the key role of band-specific features in AAD, providing a theoretical basis for developing band-spatial-temporal coupled brain-computer interfaces based on feature optimization.
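The band decomposition described above can be illustrated with a short preprocessing sketch. This is a minimal example under assumed settings, fourth-order zero-phase Butterworth band-pass filters and a 128 Hz sampling rate; the paper's exact filter design is not specified on this page, and the function name split_bands is illustrative:

```python
# Split multi-channel EEG into the five canonical bands used by SSF-ConvLSTM.
# Assumed settings: 4th-order Butterworth, zero-phase (filtfilt), fs = 128 Hz.
import numpy as np
from scipy.signal import butter, filtfilt

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 50)}

def split_bands(eeg: np.ndarray, fs: float = 128.0) -> dict:
    """eeg: (channels, samples) array -> {band name: filtered copy}."""
    nyq = fs / 2.0
    out = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(4, [lo / nyq, hi / nyq], btype="band")
        out[name] = filtfilt(b, a, eeg, axis=-1)  # zero-phase, no group delay
    return out

# Example: a 64-channel, 5 s window at 128 Hz
bands = split_bands(np.random.randn(64, 5 * 128))
print({name: x.shape for name, x in bands.items()})
```

Each filtered band would then be projected to a topographic image, consistent with the 5×32×32×1 model input listed in Tab.2 below.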



Key words: electroencephalogram (EEG); auditory attention detection (AAD); band analysis; decoding accuracy; cocktail party effect
Received: 26 April 2025      Published: 19 March 2026
CLC:  TP 312  
Fund: Lanzhou Jiaotong University-Tianjin University Joint Innovation Fund (LH2023002); Natural Science Foundation of Tianjin (21JCZXJC00190).
Cite this article:

Chunli WANG,Yuxin GAO,Jinxu LI. Auditory attention decoding analysis based on band-specific spatial-temporal frequency fusion. Journal of ZheJiang University (Engineering Science), 2026, 60(4): 887-895.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.04.020     OR     https://www.zjujournals.com/eng/Y2026/V60/I4/887


Fig.1 Framework of SSF-ConvLSTM
Dataset | N | Speech stimulus | Direction | ts/min | T/h
KUL | 16 | Flemish | 90° left and 90° right | 48 | 12.8
DTU | 18 | Danish | 60° left and 60° right | 50 | 15.0
Tab.1 Dataset details (N: number of subjects; ts: recording duration per subject; T: total recording duration)
Layer | din | dout | n
3D convolution | 5×32×32×1 | 5×32×32×32 | 320
Batch normalization | 5×32×32×32 | 5×32×32×32 | 128
ConvLSTM | 5×32×32×32 | 32×32×32 | 73856
Flatten | 32×32×32 | 32768 | 0
Fully connected | 32768 | 512 | 16777728
Batch normalization | 512 | 512 | 2048
Fully connected | 512 | 32 | 16416
Softmax | 32 | 2 | 66
Tab.2 Detailed module configuration (din/dout: input/output dimensions; n: number of parameters)
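The layer shapes and parameter counts in Tab.2 pin the architecture down closely. Below is an illustrative PyTorch reconstruction, not the authors' released code: the class names are mine, the ConvLSTM cell is the standard single-convolution four-gate formulation with 32 hidden channels and a 3×3 kernel (chosen so its parameter count, 4×(32+32)×32×3×3 + 4×32 = 73856, matches the table), and the recurrence is stepped over the five-band axis, as implied by the 5×32×32×32 → 32×32×32 shape change:

```python
# Illustrative reconstruction of the Tab.2 stack; shapes and parameter
# counts match the table, activation functions are assumed (ReLU).
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell: all four gates from one convolution."""
    def __init__(self, in_ch: int, hid_ch: int, kernel: int = 3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch,
                               kernel, padding=kernel // 2)  # 73856 params

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class SSFConvLSTM(nn.Module):
    """Input: (batch, 1, 5 bands, 32, 32) band-wise topographic maps."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        # (1,3,3) kernel: spatial conv within each band; 1*3*3*32+32 = 320
        self.conv = nn.Conv3d(1, 32, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.bn3d = nn.BatchNorm3d(32)            # 128 incl. running stats
        self.convlstm = ConvLSTMCell(32, 32)
        self.fc1 = nn.Linear(32 * 32 * 32, 512)   # 32768*512+512 = 16777728
        self.bn1d = nn.BatchNorm1d(512)           # 2048 incl. running stats
        self.fc2 = nn.Linear(512, 32)             # 512*32+32 = 16416
        self.out = nn.Linear(32, n_classes)       # 32*2+2 = 66, then softmax

    def forward(self, x):
        x = torch.relu(self.bn3d(self.conv(x)))   # (B, 32, 5, 32, 32)
        h = x.new_zeros(x.size(0), 32, 32, 32)
        c = torch.zeros_like(h)
        for t in range(x.size(2)):                # recur over the 5 bands
            h, c = self.convlstm(x[:, :, t], (h, c))
        z = h.flatten(1)                          # (B, 32768)
        z = torch.relu(self.bn1d(self.fc1(z)))
        z = torch.relu(self.fc2(z))
        return torch.softmax(self.out(z), dim=1)  # attended-speaker probabilities

model = SSFConvLSTM()
probs = model(torch.randn(2, 1, 5, 32, 32))       # -> (2, 2)
print(sum(p.numel() for p in model.parameters() if p.requires_grad))
```

Note that the two batch-normalization rows in Tab.2 (128 and 2048) count the running mean/variance buffers as well as the learnable scale and shift; the print above counts learnable parameters only.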
Fig.2 Brain topographic map distribution on KUL dataset
Fig.3 Brain topographic map distribution on DTU dataset
Fig.4 Decoding accuracy plots of different frequency bands in KUL dataset
Fig.5 Decoding accuracy plots of different frequency bands in DTU dataset
Fig.6 Trend of decoding accuracy in different frequency bands
Fig.7 Result analysis diagram of $\alpha $ band