Front. Inform. Technol. Electron. Eng.  2015, Vol. 16 Issue (5): 358-366    DOI: 10.1631/FITEE.1400323
    
Speech emotion recognition with unsupervised feature learning
Zheng-wei Huang, Wen-tao Xue, Qi-rong Mao
Department of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China

Abstract  Emotion-related features are critical to the performance of a speech emotion recognition (SER) system. Such features are generally difficult to design because the ground truth is ambiguous. In this paper, we apply several unsupervised feature learning algorithms (including K-means clustering, the sparse auto-encoder, and sparse restricted Boltzmann machines), which show promise for learning task-related features from unlabeled data, to speech emotion recognition. We then evaluate the proposed approach and analyze in detail the effect of two important factors in the model setup: the content window size and the number of hidden layer nodes. Experimental results show that larger content windows and more hidden nodes contribute to higher performance. We also show that a two-layer network does not clearly improve performance compared with a single-layer network.

Key words: Speech emotion recognition, Unsupervised feature learning, Neural network, Affective computing
Received: 16 September 2014      Published: 05 May 2015
CLC:  TP391.4  
Cite this article:

Zheng-wei Huang, Wen-tao Xue, Qi-rong Mao. Speech emotion recognition with unsupervised feature learning. Front. Inform. Technol. Electron. Eng., 2015, 16(5): 358-366.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/FITEE.1400323     OR     http://www.zjujournals.com/xueshu/fitee/Y2015/V16/I5/358


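Speech emotion recognition based on unsupervised feature learning

Objective: Speech emotion recognition is one of the key technologies in human-computer interaction, and good emotion features have a great impact on the performance of a speech emotion recognition system. Current speech emotion features are mainly extracted by hand-crafted methods, and researchers have not reached a consensus on whether these features characterize emotion well or whether an optimal emotion feature set exists. Further in-depth study of speech emotion feature extraction is therefore needed.
Innovation: A data-driven unsupervised emotion feature learning method is proposed. It automatically learns emotion-related feature mapping functions from unlabeled speech data for use in speech emotion feature extraction.
Methods: Three unsupervised learning algorithms (K-means clustering, the sparse auto-encoder, and the sparse restricted Boltzmann machine) are used to learn task-related feature extractors from a number of unlabeled speech patches. Features are then extracted from each whole speech sample (convolution and pooling), and finally a linear support vector machine is trained to recognize unseen samples. The hyperparameters of the model (patch size and number of hidden nodes) are also selected.
Conclusions: Compared with traditional raw features, the learned features are sparse to some degree and are somewhat robust to speaker variation and other disturbing factors. Experimental results show that larger patches and more hidden nodes help improve system performance (Figs. 4 and 5).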
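
The pipeline summarized above (learn feature extractors from unlabeled speech patches, apply them across a whole utterance, pool the responses) can be illustrated with a minimal, dependency-free Python sketch of the K-means variant. It is not the authors' implementation. The function names, the toy Gaussian "frames", the soft "triangle" activation, and the use of max pooling are all assumptions made for illustration; the summary only states that convolution and pooling are used. The final linear support vector machine is left out to keep the sketch short.

```python
import math
import random

def extract_patches(frames, window):
    """Stack `window` consecutive frames into one flat patch vector."""
    return [
        [v for frame in frames[i:i + window] for v in frame]
        for i in range(len(frames) - window + 1)
    ]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(patches, k, iters=10, seed=0):
    """Plain K-means; the centroids act as the learned feature extractors."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(patches, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in patches:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def encode(patch, centroids):
    """Assumed soft encoding: max(0, mean distance - distance to each centroid)."""
    dists = [math.sqrt(sq_dist(patch, c)) for c in centroids]
    mean = sum(dists) / len(dists)
    return [max(0.0, mean - d) for d in dists]

def utterance_features(frames, centroids, window):
    """Encode every window position (convolution), then max-pool over time."""
    codes = [encode(p, centroids) for p in extract_patches(frames, window)]
    return [max(col) for col in zip(*codes)]

# Toy stand-ins for real acoustic frames (hypothetical data, 4 values per frame).
rng = random.Random(1)
unlabeled = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
WINDOW, K = 3, 5  # patch (content window) size and number of hidden nodes

centroids = kmeans(extract_patches(unlabeled, WINDOW), K)
utterance = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(20)]
features = utterance_features(utterance, centroids, WINDOW)
print(len(features))  # one pooled value per learned feature: 5
```

In this sketch `WINDOW` and `K` play the role of the two hyperparameters the summary discusses: the patch size and the number of hidden nodes. The pooled vector has a fixed length regardless of utterance duration, which is what would let a linear classifier be trained on utterances of different lengths.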

Key words: Speech emotion recognition, Unsupervised feature learning, Neural network, Affective computing