Emotional speaker recognition based on similar neighbor phenomenon

2012, Vol. 46, Issue (10): 1790-1795

DOI: 10.3785/j.issn.1008-973X.2012.10.009

Computer Technology, Telecommunication Technology

CHEN Li, YANG Ying-chun

College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China


Abstract:

Motivated by research in phonetics, this paper proposes the Similar Neighbor Phenomenon: the hypothesis that speakers who sound alike in the neutral state also sound alike in emotional states. Quantitative and qualitative analyses support the hypothesis: for the same phonetic content, the "neighbors" of corresponding Gaussian components in a speaker's neutral model and emotional model are largely the same. To address the mismatch between the distribution of short-time speech features under emotion and the neutral speaker model, an emotional model synthesis method is proposed. The method learns neutral-to-emotion transformation rules from a development corpus and applies them to the evaluation corpus, synthesizing each speaker's emotional model from his or her neutral model. Following the Similar Neighbor Phenomenon, several similar neighbors of the speaker in the neutral state are selected by KL distance, and the speaker's emotional model is then constructed with a neighbors-based transformation method and a shift-based transformation method. Experiments on the MASC corpus show that the identification rate (IR) is 2.81% higher than that of the conventional GMM-UBM algorithm and 1.3% higher than that of the emotion attribute projection (EAP) method.
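The abstract gives no formulas, so the following is only a minimal sketch of the general idea of neighbor selection and model synthesis, not the authors' algorithm. It assumes each speaker is reduced to a single diagonal-covariance Gaussian (the KL divergence between full GMMs has no closed form and would need an approximation), uses a symmetrised KL distance, and synthesizes only the mean by adding the average neutral-to-emotion shift of the selected neighbors. All function names, speaker labels, and numbers are hypothetical.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def symmetric_kl(a, b):
    """Symmetrised KL distance between two (mean, variance) pairs."""
    return kl_diag_gauss(a[0], a[1], b[0], b[1]) + kl_diag_gauss(b[0], b[1], a[0], a[1])


def nearest_neighbors(target, candidates, k):
    """Names of the k candidate speakers closest to the target in the neutral state."""
    ranked = sorted(candidates, key=lambda name: symmetric_kl(target, candidates[name]))
    return ranked[:k]


def synthesize_emotional_mean(target_mean, neighbors, neutral, emotional):
    """Add the neighbors' average neutral-to-emotion mean shift to the target's neutral mean."""
    shift = [
        sum(emotional[n][0][d] - neutral[n][0][d] for n in neighbors) / len(neighbors)
        for d in range(len(target_mean))
    ]
    return [m + s for m, s in zip(target_mean, shift)]


# Hypothetical development-set speakers, one (mean, variance) pair each.
dev_neutral = {
    "spk_a": ([0.0, 0.0], [1.0, 1.0]),
    "spk_b": ([0.5, 0.0], [1.0, 1.0]),
    "spk_c": ([5.0, 5.0], [1.0, 1.0]),
}
dev_emotional = {
    "spk_a": ([1.0, 0.5], [1.0, 1.0]),
    "spk_b": ([1.5, 1.0], [1.0, 1.0]),
    "spk_c": ([5.0, 9.0], [1.0, 1.0]),
}

# Hypothetical evaluation speaker for whom only neutral speech is available.
test_neutral = ([0.2, 0.0], [1.0, 1.0])

neighbors = nearest_neighbors(test_neutral, dev_neutral, k=2)
emotional_mean = synthesize_emotional_mean(
    test_neutral[0], neighbors, dev_neutral, dev_emotional
)
print(neighbors, emotional_mean)
```

In this toy setup the distant speaker `spk_c` is excluded, so its large emotional shift does not affect the synthesized mean. A fuller treatment would operate per Gaussian component of an adapted GMM rather than on one Gaussian per speaker.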

Published: 2012-10-01

TP 271

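Funding:

National Natural Science Foundation of China (60970080); National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2009ZX01039-002-001-04).

Corresponding author: YANG Ying-chun, female, associate professor. E-mail: yyc@zju.edu.cn

About the author: CHEN Li (b. 1987), male, Ph.D. candidate, working on speech research and machine learning. E-mail: stchenli@zju.edu.cn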


Cite this article:

CHEN Li, YANG Ying-chun. Emotional speaker recognition based on similar neighbor phenomenon [J]. J4, 2012, 46(10): 1790-1795.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2012.10.009 or http://www.zjujournals.com/eng/CN/Y2012/V46/I10/1790

