Speech emotion recognition based on information cell
SUN Ling-yun1, HE Bo-wei1, LIU Zheng2, YANG Zhi-yuan1
1. Modern Industrial Design Institute, Zhejiang University, Hangzhou 310027, China; 2. School of Design, China Academy of Art, Hangzhou 310024, China
Abstract:

Information cells were applied to speech emotion recognition to address the high space complexity of speech emotion recognition classifiers. Based on the information cell mixture model, a single-layered information cell (IC-S) algorithm and a speaker-emotion dual-layered information cell (IC-D) algorithm were proposed. Cross-validation experiments were conducted on the CASIA (Chinese) and SAVEE (English) emotion corpora, with the F-score used as the indicator of recognition performance. Results show that the IC-S algorithm has advantages in both time and space complexity over common algorithms such as SVM. The IC-D algorithm achieves recognition accuracy similar to that of SVM while significantly reducing the storage space of the model, making it suitable for scenarios with few or relatively fixed speakers.
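The page carries only the abstract, so the following Python sketch (using numpy and scikit-learn) is purely illustrative: it shows one plausible way to build a prototype-based, "information cell" style emotion classifier and compare it against an SVM baseline with cross-validated macro F-scores, as the abstract describes. The class and function names (SimpleCellModel, cell_classify, compare_on_corpus), the use of k-means prototypes, and the Gaussian density over prototype distance are assumptions made for this sketch, not the authors' IC-S or IC-D algorithms.

# Illustrative sketch only: a simplified prototype-based "information cell"
# classifier in the spirit of the abstract, plus a macro-F1 comparison against
# an SVM baseline. The real IC-S/IC-D algorithms (built on information cell
# mixture models) differ in detail; all names here are hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class SimpleCellModel:
    """One emotion class modeled by a few prototype 'cells'.

    Each cell keeps a prototype vector plus a Gaussian density over the
    distance to that prototype (an assumption made for this sketch).
    """

    def __init__(self, n_cells=3):
        self.n_cells = n_cells

    def fit(self, X):
        km = KMeans(n_clusters=self.n_cells, n_init=10, random_state=0).fit(X)
        self.prototypes_ = km.cluster_centers_
        self.sigmas_, self.weights_ = [], []
        for j in range(self.n_cells):
            # Spread of distances to the prototype, and the cell's mixture weight.
            d = np.linalg.norm(X[km.labels_ == j] - self.prototypes_[j], axis=1)
            self.sigmas_.append(d.std() + 1e-6)
            self.weights_.append((km.labels_ == j).mean())
        return self

    def membership(self, X):
        """Membership of each sample in this class (higher = closer)."""
        score = np.zeros(len(X))
        for p, s, w in zip(self.prototypes_, self.sigmas_, self.weights_):
            d = np.linalg.norm(X - p, axis=1)
            score += w * np.exp(-0.5 * (d / s) ** 2)
        return score


def cell_classify(X_train, y_train, X_test, n_cells=3):
    # One cell model per emotion class; predict the class with maximal membership.
    classes = np.unique(y_train)
    models = {c: SimpleCellModel(n_cells).fit(X_train[y_train == c]) for c in classes}
    scores = np.column_stack([models[c].membership(X_test) for c in classes])
    return classes[scores.argmax(axis=1)]


def compare_on_corpus(X, y, n_splits=5):
    """Cross-validated macro F-score for the cell sketch vs. an SVM baseline."""
    f1_cell, f1_svm = [], []
    for tr, te in StratifiedKFold(n_splits, shuffle=True, random_state=0).split(X, y):
        scaler = StandardScaler().fit(X[tr])
        Xtr, Xte = scaler.transform(X[tr]), scaler.transform(X[te])
        f1_cell.append(f1_score(y[te], cell_classify(Xtr, y[tr], Xte), average="macro"))
        svm = SVC(kernel="rbf", C=1.0).fit(Xtr, y[tr])
        f1_svm.append(f1_score(y[te], svm.predict(Xte), average="macro"))
    return np.mean(f1_cell), np.mean(f1_svm)

In this simplified form, each emotion class is stored as only a handful of prototype vectors, whereas a trained SVM keeps many support vectors; this mirrors the storage-space argument made in the abstract, without claiming to reproduce the paper's results.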

Published: 2015-06-01
CLC number: TP 391.42
Funding:

Supported by the National Natural Science Foundation of China (No. 61004116) and the Zhejiang Provincial Natural Science Foundation of China (No. LY13E050005).

Corresponding author: LIU Zheng, male, associate professor. E-mail: aliu6@126.com
About the author: SUN Ling-yun (1981-), male, associate professor, Ph.D.; his research focuses on information and interaction design. E-mail: sunly@zju.edu.cn

Cite this article:

SUN Ling-yun, HE Bo-wei, LIU Zheng, YANG Zhi-yuan. Speech emotion recognition based on information cell [J]. Journal of Zhejiang University (Engineering Science), 2015, 49(6). DOI: 10.3785/j.issn.1008-973X.2015.06.001.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2015.06.001        http://www.zjujournals.com/eng/CN/Y2015/V49/I6/1001
