Journal of ZheJiang University (Engineering Science)  2026, Vol. 60 Issue (4): 887-895    DOI: 10.3785/j.issn.1008-973X.2026.04.020
Auditory attention decoding analysis based on band-specific spatial-temporal frequency fusion
Chunli WANG(),Yuxin GAO,Jinxu LI
School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730000, China

Abstract  

A spatial-temporal frequency feature fusion framework (SSF-ConvLSTM) was proposed to address the problem that existing auditory attention detection (AAD) methods ignore the band-specific contributions of EEG signals. The framework systematically evaluated the differentiated contributions of the $\delta $ (1–4 Hz), $\theta $ (4–8 Hz), $\alpha $ (8–13 Hz), $\beta $ (13–30 Hz), and $\gamma $ (30–50 Hz) frequency bands, enabling quantitative screening of key bands and dynamic coupling modeling. First, the spatial weight distribution of neural activity in each band was revealed through brain topographic maps, and the bands closely related to target speech encoding were screened out. Second, the SSF-ConvLSTM model was constructed: convolutional layers extracted the spatial features of each band, and a long short-term memory (LSTM) network modeled the time-varying dynamics of attention, enabling joint decoding of spatial-temporal dynamic features across bands. The algorithm was verified on the public KUL and DTU datasets. The results showed that, as frequency increased, the weights of the frontal and temporal lobes related to auditory attention decoding peaked in the $\alpha $ band and then gradually decreased toward the $\gamma $ band. On the KUL dataset, the low-frequency $\alpha $ band achieved the best decoding accuracy, 93.38%, with a 5 s decision window, 9.78 percentage points higher than the baseline model; on the DTU dataset, the decoding accuracy of the $\alpha $ band was significantly improved, by 5.5 percentage points, over the baseline model. This study confirms the key role of band-specific features in AAD, providing a theoretical basis for developing band-spatial-temporal coupled brain-computer interfaces based on feature optimization.
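The band decomposition described above can be illustrated with a short preprocessing sketch. This is a minimal example under assumed settings, fourth-order zero-phase Butterworth band-pass filters and a 128 Hz sampling rate; the paper's exact filter design is not specified on this page, and the function name split_bands is illustrative:

```python
# Split multi-channel EEG into the five canonical bands used by SSF-ConvLSTM.
# Assumed settings: 4th-order Butterworth, zero-phase (filtfilt), fs = 128 Hz.
import numpy as np
from scipy.signal import butter, filtfilt

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 50)}

def split_bands(eeg: np.ndarray, fs: float = 128.0) -> dict:
    """eeg: (channels, samples) array -> {band name: filtered copy}."""
    nyq = fs / 2.0
    out = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(4, [lo / nyq, hi / nyq], btype="band")
        out[name] = filtfilt(b, a, eeg, axis=-1)  # zero-phase, no group delay
    return out

# Example: a 64-channel, 5 s window at 128 Hz
bands = split_bands(np.random.randn(64, 5 * 128))
print({name: x.shape for name, x in bands.items()})
```

Each filtered band would then be projected to a topographic image, consistent with the 5×32×32×1 model input listed in Tab.2 below.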



Key words: electroencephalogram (EEG); auditory attention detection (AAD); band analysis; decoding accuracy; cocktail party effect
Received: 26 April 2025      Published: 19 March 2026
CLC:  TP 312  
Fund: Lanzhou Jiaotong University-Tianjin University Joint Innovation Fund (LH2023002); Natural Science Foundation of Tianjin (21JCZXJC00190).
Cite this article:

Chunli WANG,Yuxin GAO,Jinxu LI. Auditory attention decoding analysis based on band-specific spatial-temporal frequency fusion. Journal of ZheJiang University (Engineering Science), 2026, 60(4): 887-895.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.04.020     OR     https://www.zjujournals.com/eng/Y2026/V60/I4/887


Fig.1 Framework of SSF-ConvLSTM
Dataset | N | Speech stimulus | Direction | ts/min | T/h
KUL | 16 | Flemish | 90° left and 90° right | 48 | 12.8
DTU | 18 | Danish | 60° left and 60° right | 50 | 15.0
Tab.1 Dataset details (N: number of subjects; ts: recording duration per subject; T: total recording duration)
Layer | din | dout | n
3D convolution | 5×32×32×1 | 5×32×32×32 | 320
Batch normalization | 5×32×32×32 | 5×32×32×32 | 128
ConvLSTM | 5×32×32×32 | 32×32×32 | 73856
Flatten | 32×32×32 | 32768 | 0
Fully connected | 32768 | 512 | 16777728
Batch normalization | 512 | 512 | 2048
Fully connected | 512 | 32 | 16416
Softmax | 32 | 2 | 66
Tab.2 Detailed module configuration (din/dout: input/output dimensions; n: number of parameters)
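The layer shapes and parameter counts in Tab.2 pin the architecture down closely. Below is an illustrative PyTorch reconstruction, not the authors' released code: the class names are mine, the ConvLSTM cell is the standard single-convolution four-gate formulation with 32 hidden channels and a 3×3 kernel (chosen so its parameter count, 4×(32+32)×32×3×3 + 4×32 = 73856, matches the table), and the recurrence is stepped over the five-band axis, as implied by the 5×32×32×32 → 32×32×32 shape change:

```python
# Illustrative reconstruction of the Tab.2 stack; shapes and parameter
# counts match the table, activation functions are assumed (ReLU).
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell: all four gates from one convolution."""
    def __init__(self, in_ch: int, hid_ch: int, kernel: int = 3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch,
                               kernel, padding=kernel // 2)  # 73856 params

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class SSFConvLSTM(nn.Module):
    """Input: (batch, 1, 5 bands, 32, 32) band-wise topographic maps."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        # (1,3,3) kernel: spatial conv within each band; 1*3*3*32+32 = 320
        self.conv = nn.Conv3d(1, 32, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.bn3d = nn.BatchNorm3d(32)            # 128 incl. running stats
        self.convlstm = ConvLSTMCell(32, 32)
        self.fc1 = nn.Linear(32 * 32 * 32, 512)   # 32768*512+512 = 16777728
        self.bn1d = nn.BatchNorm1d(512)           # 2048 incl. running stats
        self.fc2 = nn.Linear(512, 32)             # 512*32+32 = 16416
        self.out = nn.Linear(32, n_classes)       # 32*2+2 = 66, then softmax

    def forward(self, x):
        x = torch.relu(self.bn3d(self.conv(x)))   # (B, 32, 5, 32, 32)
        h = x.new_zeros(x.size(0), 32, 32, 32)
        c = torch.zeros_like(h)
        for t in range(x.size(2)):                # recur over the 5 bands
            h, c = self.convlstm(x[:, :, t], (h, c))
        z = h.flatten(1)                          # (B, 32768)
        z = torch.relu(self.bn1d(self.fc1(z)))
        z = torch.relu(self.fc2(z))
        return torch.softmax(self.out(z), dim=1)  # attended-speaker probabilities

model = SSFConvLSTM()
probs = model(torch.randn(2, 1, 5, 32, 32))       # -> (2, 2)
print(sum(p.numel() for p in model.parameters() if p.requires_grad))
```

Note that the two batch-normalization rows in Tab.2 (128 and 2048) count the running mean/variance buffers as well as the learnable scale and shift; the print above counts learnable parameters only.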
Fig.2 Brain topographic map distribution on KUL dataset
Fig.3 Brain topographic map distribution on DTU dataset
Fig.4 Decoding accuracy plots of different frequency bands in KUL dataset
Fig.5 Decoding accuracy plots of different frequency bands in DTU dataset
Fig.6 Trend of decoding accuracy in different frequency bands
Fig.7 Result analysis diagram of $\alpha $ band