Emotional speaker recognition based on similar neighbor phenomenon
CHEN Li, YANG Ying-chun
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Abstract Drawing on phonetics research, the assumption was proposed that speakers who sound similar in the neutral condition also sound similar when their emotional state changes; this is termed the Similar Neighbor Phenomenon. Qualitative and quantitative analyses were conducted to verify the assumption: under the identical phonetic event, the “neighbors” of the same speaker's neutral model and emotional model are almost the same. An emotional model synthesis method was proposed to address the mismatch between the distribution of acoustic features under emotional states and the neutral speaker model. The method learns neutral-emotion transformation rules from the development corpus and applies them to the evaluation corpus to construct each speaker's emotional model from his/her neutral one. Following the Similar Neighbor Phenomenon, a speaker's neighbors in the neutral condition were selected by the Kullback-Leibler (KL) distance, and the emotional models were constructed with the neighbors-based transformation method and the shift-based transformation method. Experiments on the MASC corpus showed an identification rate (IR) increase of 2.81% over the GMM-UBM algorithm and 1.3% over the emotional attribute projection (EAP) algorithm.
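The neighbor selection and model synthesis steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each speaker is summarized by a single diagonal-covariance Gaussian rather than a full GMM, uses a symmetric KL divergence as the "KL distance", and reads the shift-based transformation as adding the neighbors' average neutral-to-emotional mean offset to the target speaker's neutral mean. All function and variable names are hypothetical.

```python
import numpy as np


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetrized KL, used here as the 'KL distance' (an assumption)."""
    return kl_diag_gauss(mu_p, var_p, mu_q, var_q) + kl_diag_gauss(
        mu_q, var_q, mu_p, var_p
    )


def select_neighbors(mu, var, dev_mu, dev_var, k):
    """Indices of the k development speakers closest to the target
    speaker's neutral model, nearest first."""
    dists = np.array(
        [symmetric_kl(mu, var, m, v) for m, v in zip(dev_mu, dev_var)]
    )
    return np.argsort(dists)[:k]


def synthesize_emotional_mean(mu, dev_neutral_mu, dev_emotional_mu, neighbors):
    """Assumed shift-based synthesis: add the neighbors' average
    neutral-to-emotional mean offset to the target's neutral mean."""
    shift = (dev_emotional_mu[neighbors] - dev_neutral_mu[neighbors]).mean(axis=0)
    return mu + shift


if __name__ == "__main__":
    # Toy development corpus: 3 speakers, 2-dimensional features.
    dev_neutral_mu = np.array([[0.0, 0.0], [5.0, 5.0], [0.5, 0.0]])
    dev_var = np.ones((3, 2))
    dev_emotional_mu = dev_neutral_mu + np.array(
        [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
    )

    # Evaluation speaker with only a neutral model available.
    target_mu, target_var = np.array([0.1, 0.0]), np.ones(2)

    nb = select_neighbors(target_mu, target_var, dev_neutral_mu, dev_var, k=2)
    print("neighbors:", nb)
    print(
        "synthesized emotional mean:",
        synthesize_emotional_mean(target_mu, dev_neutral_mu, dev_emotional_mu, nb),
    )
```

With full GMMs the KL divergence has no closed form and would have to be approximated, and the transformation would be applied per Gaussian component; the single-Gaussian simplification is used here only to keep the idea visible.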
Published: 01 October 2012
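Chinese title: 基于邻居相似现象的情感说话人识别 (Emotional speaker recognition based on the similar neighbor phenomenon)
Chinese abstract (translated): Based on phonetics research, the assumption is proposed that speakers who sound similar in the neutral state also sound similar in emotional states, termed the Similar Neighbor Phenomenon. The assumption was verified through quantitative and qualitative analysis: for the same phoneme content, the “neighbors” of corresponding Gaussian components in the same speaker's neutral and emotional models are largely the same. To address the mismatch between the distribution of short-time speech features under emotional change and the neutral speech model, a speaker emotional model synthesis method is proposed, which transfers the neutral-emotion transformation rules learned from the development corpus to the evaluation corpus and synthesizes each speaker's emotional model from the neutral model. Starting from the Similar Neighbor Phenomenon, several similar neighbors of the speaker in the neutral state are selected by KL distance, and the speaker's emotional model is synthesized with the neighbor-based method and the neighbor-based transformation method. Experiments on the MASC corpus show that the identification rate is 2.81% higher than that of the conventional GMM-UBM algorithm and 1.3% higher than that of the emotional attribute projection (EAP) method.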