Emotional speaker recognition based on similar neighbor phenomenon

doi:10.3785/j.issn.1008-973X.2012.10.009

2012, Vol. 46, Issue (10): 1790-1795

CHEN Li, YANG Ying-chun

College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Abstract

Drawing on phonetics research, the assumption that speakers who sound similar in the neutral condition also sound similar when their emotional state changes is proposed, referred to as the Similar Neighbor Phenomenon. Qualitative and quantitative analyses were conducted to verify the assumption: for the same phonetic event, the “neighbors” of a speaker's neutral model and emotional model are almost the same. To address the mismatch between the distribution of acoustic features under emotional states and the neutral speaker model, an emotional model synthesis method is proposed. The method learns neutral-to-emotion transformation rules from the development corpus and applies them to the evaluation corpus to construct each speaker's emotional model from his or her neutral one. Following the Similar Neighbor Phenomenon, a speaker's neighbors in the neutral condition are selected by KL distance, and the emotional models are then constructed by the neighbors-based transformation method and the shift-based transformation method. Experiments on the MASC corpus show an identification rate (IR) increase of 2.81% over the GMM-UBM algorithm and of 1.3% over the emotional attribute projection (EAP) algorithm.
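The neighbor selection and model construction summarized in the abstract might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each speaker is reduced to a single diagonal-covariance Gaussian (the paper works with GMMs), the KL distance is taken as the symmetrized closed-form KL divergence, and "shift-based transformation" is interpreted here as adding the neighbors' average neutral-to-emotional mean offset to the target speaker's neutral mean. All function names and the toy data are hypothetical.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def symmetric_kl(a, b):
    """Symmetrized KL, used here as the 'KL distance' (an assumption)."""
    return (kl_diag_gauss(a["mu"], a["var"], b["mu"], b["var"])
            + kl_diag_gauss(b["mu"], b["var"], a["mu"], a["var"]))


def select_neighbors(target_neutral, dev_speakers, k):
    """Return the k development speakers whose neutral models are closest."""
    ranked = sorted(dev_speakers,
                    key=lambda s: symmetric_kl(target_neutral, s["neutral"]))
    return ranked[:k]


def synthesize_emotional_mean(target_neutral, neighbors):
    """Shift the target's neutral mean by the neighbors' average offset."""
    dim = len(target_neutral["mu"])
    shift = [0.0] * dim
    for s in neighbors:
        for d in range(dim):
            offset = s["emotional"]["mu"][d] - s["neutral"]["mu"][d]
            shift[d] += offset / len(neighbors)
    return [m + sh for m, sh in zip(target_neutral["mu"], shift)]


# Toy development corpus: neutral and emotional models are both available.
dev_speakers = [
    {"name": "A",
     "neutral": {"mu": [0.0, 0.0], "var": [1.0, 1.0]},
     "emotional": {"mu": [1.0, 0.5], "var": [1.0, 1.0]}},
    {"name": "B",
     "neutral": {"mu": [0.2, -0.1], "var": [1.0, 1.0]},
     "emotional": {"mu": [1.0, 0.5], "var": [1.0, 1.0]}},
    {"name": "C",
     "neutral": {"mu": [5.0, 5.0], "var": [1.0, 1.0]},
     "emotional": {"mu": [5.0, 7.0], "var": [1.0, 1.0]}},
]

# Evaluation speaker: only the neutral model is known.
target_neutral = {"mu": [0.1, 0.0], "var": [1.0, 1.0]}

neighbors = select_neighbors(target_neutral, dev_speakers, k=2)
emotional_mu = synthesize_emotional_mean(target_neutral, neighbors)
print([s["name"] for s in neighbors], emotional_mu)
```

In this toy setup the distant speaker C is excluded, so its large neutral-to-emotional offset does not influence the synthesized model; that is the intuition behind restricting the transformation to similar neighbors.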

Published: 01 October 2012

CLC:

TP 271

Cite this article:

CHEN Li, YANG Ying-chun. Emotional speaker recognition based on similar neighbor phenomenon. J4, 2012, 46(10): 1790-1795.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2012.10.009 OR http://www.zjujournals.com/eng/Y2012/V46/I10/1790

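Emotional speaker recognition based on the similar neighbor phenomenon

Drawing on phonetics research, the hypothesis that speakers who sound similar in the neutral state also sound similar in emotional states, termed the similar neighbor phenomenon, is proposed and verified through quantitative and qualitative analysis: for the same phonetic content, the “neighbors” of the corresponding Gaussian components in a speaker's neutral model and emotional model are largely the same. To address the mismatch between the distribution of short-time speech features under emotional change and the neutral speech model, a speaker emotional model synthesis method is proposed, in which the neutral-to-emotion transformation rules learned on the development set are transferred to the evaluation set and each speaker's emotional model is synthesized from his or her neutral model. Based on the similar neighbor phenomenon, several similar neighbors of the speaker in the neutral state are selected by KL distance, and the speaker's emotional model is synthesized with a neighbor-based method and a neighbor-transformation-based method. Experiments on the MASC corpus show that the recognition accuracy of the method is 2.81% higher than that of the traditional GMM-UBM algorithm and 1.3% higher than that of the emotional attribute projection (EAP) method.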

