Front. Inform. Technol. Electron. Eng.  2014, Vol. 15 Issue (10): 903-916    DOI: 10.1631/jzus.C1400002
    
Mismatched feature detection with finer granularity for emotional speaker recognition
Li Chen, Ying-chun Yang, Zhao-hui Wu
College of Computer Science & Technology, Zhejiang University, Hangzhou 310027, China

Abstract  The shapes of speakers’ vocal organs change with their emotional state, which shifts the emotional acoustic space of short-time features away from the neutral acoustic space and thereby degrades speaker recognition performance. Features that deviate greatly from the neutral acoustic space are considered mismatched features, and they negatively affect speaker recognition systems. Emotion variation deforms features differently for different phonemes, so it is reasonable to build a finer model that detects mismatched features under each phoneme. However, given the difficulty of phoneme recognition, three sorts of acoustic class recognition are proposed to replace phoneme recognition: phoneme classes, Gaussian mixture model (GMM) tokenizer, and probabilistic GMM tokenizer. We propose feature pruning and feature regulation methods to process the mismatched features and improve speaker recognition performance. In the feature regulation method, the transformation matrix that regulates the mismatched features is trained by maximizing the between-class distance and minimizing the within-class distance. Experiments on the Mandarin affective speech corpus (MASC) show that our feature pruning and feature regulation methods increase the identification rate (IR) by 3.64% and 6.77%, respectively, compared with the baseline GMM-UBM (universal background model) algorithm. Corresponding IR increases of 2.09% and 3.32% are obtained when our methods are applied to the state-of-the-art i-vector algorithm.
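The two GMM-based acoustic classes named in the abstract can be illustrated with a minimal sketch. It assumes the usual reading of a GMM tokenizer: each frame is labelled with the index of its best-scoring Gaussian component, while the probabilistic variant keeps the full posterior over components. The function names, the diagonal-covariance form, and the array shapes are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def component_log_likelihoods(frames, weights, means, variances):
    """Return log(w_k * N(x_t | mu_k, diag(var_k))) as a (T, K) array.

    frames: (T, D), weights: (K,), means and variances: (K, D).
    Diagonal covariances are an assumption made for this sketch.
    """
    frames = np.asarray(frames, dtype=float)
    diff = frames[:, None, :] - means[None, :, :]                # (T, K, D)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (K,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)    # (T, K)
    return np.log(weights)[None, :] - 0.5 * (log_norm[None, :] + mahal)

def gmm_tokenize(frames, weights, means, variances):
    """Hard tokenizer: index of the best-scoring component for each frame."""
    scores = component_log_likelihoods(frames, weights, means, variances)
    return np.argmax(scores, axis=1)

def gmm_posteriors(frames, weights, means, variances):
    """Probabilistic tokenizer: posterior over components for each frame."""
    scores = component_log_likelihoods(frames, weights, means, variances)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(scores)
    return post / post.sum(axis=1, keepdims=True)
```

In this reading, the hard token selects which class-specific mismatch detector handles a frame, and the posterior gives a soft membership in each acoustic class.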

Key words: Emotional speaker recognition, Mismatched feature detection, Feature regulation
Received: 05 January 2014      Published: 09 October 2014
CLC:  TP391.4  
Cite this article:

Li Chen, Ying-chun Yang, Zhao-hui Wu. Mismatched feature detection with finer granularity for emotional speaker recognition. Front. Inform. Technol. Electron. Eng., 2014, 15(10): 903-916.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/jzus.C1400002     OR     http://www.zjujournals.com/xueshu/fitee/Y2014/V15/I10/903


Fine-grained mismatched feature detection and regulation for emotional speaker recognition

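Objective: When a speaker's emotional state changes, the vocal organs deform, so the distribution of some speech features shifts away from their distribution under neutral conditions. These shifted features, called "mismatched features", sharply degrade speaker recognition performance and must be removed or regulated to improve emotional speaker recognition systems.
Innovation: Because the change in the mismatched feature distribution differs from phoneme to phoneme, fine-grained mismatched feature detection models and regulation methods are proposed over three kinds of acoustic classes: phoneme classes, GMM tokenizer, and probabilistic GMM tokenizer.
Methods: Manifold analysis is used to observe the distribution of mismatched features, leading to the conclusion that the farther a feature lies from the neutral feature space, the weaker its ability to discriminate between speakers. A feature whose speaker-discriminating ability falls below a threshold is detected as a mismatched feature (Fig. 1). For acoustic classes given by phoneme classes and the GMM tokenizer, a support vector machine is used to build the reliable-versus-mismatched feature detection model; for acoustic classes given by the probabilistic GMM tokenizer, a fuzzy support vector machine is used. To ensure that regulated features approach the true neutral case without losing speaker characteristics, regulation maps the mismatched feature space onto the reliable feature space while ensuring that the distance between the transformed mismatched feature space and the reliable feature spaces of other speakers does not decrease.
Conclusions: Emotion changes the distribution of part of a speaker's speech features, turning them into mismatched features. Fine-grained detection and regulation over the three acoustic classes handles these features effectively and improves recognition performance. The best-performing method, mismatched feature regulation under the probabilistic GMM tokenizer, raises the identification rate of the baseline GMM-UBM algorithm by 6.77% and that of the i-vector algorithm by 3.32%.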
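Feature pruning with per-class detectors can be sketched as follows. This is a hedged illustration only: each acoustic class is given a linear decision function (the hypothetical `class_weights` and `class_biases`) standing in for a trained SVM, and a frame is kept when the detector of its own class scores it as reliable. The sign convention (score >= 0 means reliable) and all names are assumptions rather than the authors' implementation.

```python
import numpy as np

def prune_mismatched(frames, tokens, class_weights, class_biases):
    """Drop frames that their class-specific detector marks as mismatched.

    frames: (T, D) features; tokens: (T,) acoustic-class index per frame;
    class_weights: (K, D) and class_biases: (K,) are stand-ins for one
    trained linear detector per acoustic class (hypothetical parameters).
    Returns the kept frames and the boolean reliability mask.
    """
    frames = np.asarray(frames, dtype=float)
    tokens = np.asarray(tokens)
    # Score every frame with the detector belonging to its own class.
    scores = np.sum(frames * class_weights[tokens], axis=1) + class_biases[tokens]
    reliable = scores >= 0.0
    return frames[reliable], reliable
```

Only the frames that survive pruning would then be scored against the speaker models; the regulation method instead transforms the rejected frames rather than discarding them.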

Key words: Emotional speaker recognition, Fuzzy support vector machine, Mismatched feature detection, Feature regulation