Auditory attention decoding analysis based on band-specific spatial-temporal-frequency fusion
Chunli WANG, Yuxin GAO, Jinxu LI
School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730000, China
Abstract A spatial-temporal-frequency feature fusion framework (SSF-ConvLSTM) was proposed to address the problem that auditory attention detection (AAD) methods ignore the band-specific contributions of EEG signals. The framework systematically evaluated the differentiated contributions of the $\delta$ (1–4 Hz), $\theta$ (4–8 Hz), $\alpha$ (8–13 Hz), $\beta$ (13–30 Hz), and $\gamma$ (30–50 Hz) bands, achieving quantitative screening of the key bands and dynamic coupling modeling. First, brain topographic maps were used to reveal the spatial weight distribution of neural activity in each band, and the bands closely related to target-speech encoding were screened out. Second, the SSF-ConvLSTM model was constructed: convolutional layers extracted the spatial features of each band, and a long short-term memory (LSTM) network was integrated to model the time-varying dynamics of attention, enabling joint decoding of spatial-temporal dynamic features across bands. The algorithm was validated on the public KUL and DTU datasets. The results showed that, as frequency increased, the frontal- and temporal-lobe weights related to auditory attention decoding peaked in the $\alpha$ band and then gradually declined toward the $\gamma$ band. On the KUL dataset, the low-frequency $\alpha$ band achieved the best decoding accuracy of 93.38% in the 5 s decision window, 9.78 percentage points higher than the baseline model; on the DTU dataset, the $\alpha$-band decoding accuracy was 5.5 percentage points higher than the baseline. These results confirmed the key role of band-specific features in AAD and provided a theoretical basis for developing a new type of band-spatial-temporal coupled brain-computer interface based on feature optimization.
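The first stage of the pipeline described above is decomposing each EEG segment into the five canonical bands before any spatial or temporal modeling. The following is a minimal sketch of that band decomposition, not the authors' exact preprocessing: the sampling rate (128 Hz), filter order, and the use of zero-phase Butterworth filtering are illustrative assumptions.

```python
# Hedged sketch: split a multichannel EEG segment into the five bands
# used in the paper (delta 1-4, theta 4-8, alpha 8-13, beta 13-30,
# gamma 30-50 Hz). Filter design choices here are assumptions, not the
# paper's reported preprocessing.
import numpy as np
from scipy.signal import butter, sosfiltfilt

BANDS = {
    "delta": (1, 4),
    "theta": (4, 8),
    "alpha": (8, 13),
    "beta": (13, 30),
    "gamma": (30, 50),
}

def band_decompose(eeg, fs=128, order=4):
    """eeg: (channels, samples) array -> dict mapping band name to a
    band-pass-filtered copy of the same shape."""
    out = {}
    for name, (lo, hi) in BANDS.items():
        # Design a Butterworth band-pass filter in second-order sections.
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        # Zero-phase filtering avoids shifting the attention dynamics in time.
        out[name] = sosfiltfilt(sos, eeg, axis=-1)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    segment = rng.standard_normal((64, 5 * 128))  # 64 channels, 5 s window
    bands = band_decompose(segment)
    print(sorted(bands), bands["alpha"].shape)
```

Each band-filtered copy keeps the original (channels, samples) shape, so the five copies can be stacked along a new axis and fed to per-band convolutional layers, as the framework's band-wise spatial feature extraction requires.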
Received: 26 April 2025
Published: 19 March 2026
Fund: Lanzhou Jiaotong University–Tianjin University Joint University Innovation Fund (LH2023002); Natural Science Foundation of Tianjin (21JCZXJC00190).
Keywords:
electroencephalography (EEG),
auditory attention detection (AAD),
frequency band analysis,
decoding accuracy,
cocktail party effect