Front. Inform. Technol. Electron. Eng.  2014, Vol. 15 Issue (10): 903-916    DOI: 10.1631/jzus.C1400002
    
Mismatched feature detection with finer granularity for emotional speaker recognition
Li Chen, Ying-chun Yang, Zhao-hui Wu
College of Computer Science & Technology, Zhejiang University, Hangzhou 310027, China
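Summary:
Objective: When a speaker's emotional state changes, the vocal organs deform, so the distributions of some speech features shift relative to the neutral condition. These shifted features, called "mismatched features", sharply degrade speaker recognition performance and must be pruned or regulated to improve emotional speaker recognition systems.
Innovation: Because the distribution changes of mismatched features differ from phoneme to phoneme, finer-grained mismatched feature detection models and regulation methods are proposed over three kinds of acoustic classes: phoneme classes, the GMM tokenizer, and the probabilistic GMM tokenizer.
Method: Manifold analysis is used to observe the distribution of mismatched features, leading to the conclusion that the farther a feature lies from the neutral feature space, the weaker its ability to discriminate speakers. A feature whose speaker-discriminating ability falls below a threshold is detected as mismatched (Fig. 1). For acoustic classes represented by phoneme classes and the GMM tokenizer, a support vector machine (SVM) is used to build the reliable/mismatched feature detection model; for acoustic classes represented by the probabilistic GMM tokenizer, a fuzzy SVM is used. So that the regulated features approach the true neutral case without losing speaker characteristics, the mismatched feature space is mapped to the reliable feature space under the constraint that the distance between the transformed mismatched feature space and the reliable feature spaces of other speakers does not decrease.
Conclusions: Emotion changes the distribution of some of a speaker's speech features, turning them into mismatched features. Finer-grained detection and regulation over the three acoustic classes handles these features effectively and improves recognition performance. The best-performing method, mismatched feature regulation under the probabilistic GMM tokenizer, raises the identification rate of the baseline GMM-UBM algorithm by 6.77% and that of the i-vector algorithm by 3.32%.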
Key words: Emotional speaker recognition    Fuzzy support vector machine    Mismatched feature detection    Feature regulation
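The difference between the GMM tokenizer and the probabilistic GMM tokenizer can be illustrated with a minimal sketch. It assumes, as is common for GMM tokenization, that the hard tokenizer labels each frame with its most probable Gaussian component while the probabilistic variant keeps the full posterior vector. The toy model parameters and the function names `posteriors` and `gmm_tokenize` are hypothetical and are not taken from the paper.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Soft token: posterior probability of each GMM component given x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(lg - top) for lg in logs]
    total = sum(exps)
    return [e / total for e in exps]


def gmm_tokenize(x, weights, means, variances):
    """Hard token: index of the most probable component for frame x."""
    post = posteriors(x, weights, means, variances)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical two-component model over 2-D frames (illustration only).
WEIGHTS = [0.5, 0.5]
MEANS = [[0.0, 0.0], [4.0, 4.0]]
VARIANCES = [[1.0, 1.0], [1.0, 1.0]]

if __name__ == "__main__":
    for frame in ([0.5, 0.2], [2.0, 2.0], [3.5, 4.2]):
        print(frame,
              gmm_tokenize(frame, WEIGHTS, MEANS, VARIANCES),
              posteriors(frame, WEIGHTS, MEANS, VARIANCES))
```

A frame lying between two components receives an arbitrary hard label but an informative soft one. One plausible reading, offered here as an assumption, is that such soft memberships are what make a fuzzy SVM a natural fit for the probabilistic tokenizer.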
Abstract: The shape of a speaker's vocal organs changes with emotional state, so the emotional acoustic space of short-time features deviates from the neutral acoustic space and speaker recognition performance degrades. Features that deviate greatly from the neutral acoustic space are considered mismatched features, and they negatively affect speaker recognition systems. Emotion variation deforms features differently for different phonemes, so it is reasonable to build a finer model that detects mismatched features under each phoneme. However, because phoneme recognition is difficult, three kinds of acoustic class recognition are proposed to replace it: phoneme classes, the Gaussian mixture model (GMM) tokenizer, and the probabilistic GMM tokenizer. We propose feature pruning and feature regulation methods to process the mismatched features and improve speaker recognition performance. For feature regulation, the transformation matrix that regulates the mismatched features is trained by maximizing the between-class distance and minimizing the within-class distance. Experiments on the Mandarin affective speech corpus (MASC) show that our feature pruning and feature regulation methods increase the identification rate (IR) by 3.64% and 6.77%, respectively, compared with the baseline GMM-UBM (universal background model) algorithm. Corresponding IR increases of 2.09% and 3.32% are obtained when our methods are applied to the state-of-the-art i-vector algorithm.
Key words: Emotional speaker recognition    Mismatched feature detection    Feature regulation
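The feature regulation criterion described in the abstract (large between-class distance, small within-class distance) can be made concrete with a small scoring sketch. This is not the paper's training procedure, which the abstract does not specify. It only evaluates a candidate affine transform by the mean squared distance of the regulated features to other speakers' reliable means minus their mean squared distance to the speaker's own reliable mean. All names and the toy data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def apply_affine(A, b, x):
    """Return A x + b, with the matrix A given as a list of rows."""
    return [
        sum(aij * xj for aij, xj in zip(row, x)) + bi
        for row, bi in zip(A, b)
    ]


def regulation_score(A, b, mismatched, own_mean, other_means):
    """Between-class minus within-class mean squared distance.

    Assumption: classes are speakers, each summarized by the mean of
    its reliable features. Higher scores mean the regulated features
    sit closer to their own speaker and farther from the others.
    """
    moved = [apply_affine(A, b, x) for x in mismatched]
    within = sum(sq_dist(y, own_mean) for y in moved) / len(moved)
    between = sum(
        sq_dist(y, m) for y in moved for m in other_means
    ) / (len(moved) * len(other_means))
    return between - within


# Toy data: the speaker's mismatched frames have drifted toward another speaker.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
OWN_MEAN = [0.0, 0.0]
OTHER_MEANS = [[4.0, 0.0]]
MISMATCHED = [[2.0, 1.0], [2.0, -1.0]]

if __name__ == "__main__":
    before = regulation_score(IDENTITY, [0.0, 0.0], MISMATCHED, OWN_MEAN, OTHER_MEANS)
    after = regulation_score(IDENTITY, [-2.0, 0.0], MISMATCHED, OWN_MEAN, OTHER_MEANS)
    print(before, after)
```

A plain difference like this is unbounded, so it is only suitable for comparing a fixed set of candidate transforms. An actual training objective would need a constraint or a ratio form to stay well posed.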
Received: 2014-01-05    Published: 2014-10-09
CLC:  TP391.4  

Cite this article:

Li Chen, Ying-chun Yang, Zhao-hui Wu. Mismatched feature detection with finer granularity for emotional speaker recognition. Front. Inform. Technol. Electron. Eng., 2014, 15(10): 903-916.

Link to this article:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/jzus.C1400002
http://www.zjujournals.com/xueshu/fitee/CN/Y2014/V15/I10/903
