Prediction of emotional dimensions PAD for emotional speech recognition

doi:10.3785/j.issn.1008-973X.2019.10.022

Journal of ZheJiang University (Engineering Science)

2019, Vol. 53, Issue 10: 2041-2048

Communication technology

Ying SUN, Yan-xiang HU, Xue-ying ZHANG*, Shu-fei DUAN

College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China

Abstract

Existing emotional features analyze emotion only from the signal perspective and cannot directly reflect the emotional state. To address this, the continuous emotional dimensions PAD (pleasure, arousal, dominance) were introduced into emotion recognition. Experimental samples covering three emotions (sadness, anger and happiness) were taken from the TYUT2.0 database and the Berlin speech database, and emotional features (prosodic features, formants, MFCC and nonlinear features) were extracted. To obtain objective and accurate PAD values, grey relational analysis (GRA) was used to select the main features that affect P, A and D. Principal component analysis (PCA) was then applied to these main features, and the resulting principal components were used as the input of a least squares support vector machine (LSSVM) to predict P, A and D. The emotional features, the PAD dimensions, and their fusion were each used for emotion recognition with a support vector machine. The experimental results show that the prediction method improves the prediction accuracy of P, A and D to a certain extent, and that the predicted values can effectively identify emotions and complement the emotional features in emotion recognition.
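The GRA-PCA-LSSVM chain described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the min-max normalisation, the distinguishing coefficient `rho=0.5`, the RBF kernel, and the values of `gamma`, `sigma`, `n_keep` and `n_components` are all assumptions, since the abstract does not state them. One such model would be fitted separately for each of P, A and D.

```python
import numpy as np

def grey_relational_grades(X, y, rho=0.5):
    """Grey relational grade of each feature column of X with respect to y."""
    def minmax(a):
        span = a.max(axis=0) - a.min(axis=0)
        return (a - a.min(axis=0)) / np.where(span == 0, 1.0, span)
    # Normalise features and target so their sequences are comparable.
    delta = np.abs(minmax(X) - minmax(y.reshape(-1, 1)))
    dmin, dmax = delta.min(), delta.max()
    # Grey relational coefficients, averaged over samples (assumes dmax > 0).
    coeff = (dmin + rho * dmax) / (delta + rho * dmax)
    return coeff.mean(axis=0)

def pca_fit(X, n_components):
    """Return the training mean and the leading principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T

def pca_transform(X, mean, components):
    return (X - mean) @ components

def rbf_kernel(A, B, sigma):
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def lssvm_fit(Z, y, gamma=10.0, sigma=1.0):
    """Solve the LSSVM regression system [[0, 1^T], [1, K + I/gamma]]."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(Z, Z, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvm_predict(Z_new, Z_train, b, alpha, sigma=1.0):
    return rbf_kernel(Z_new, Z_train, sigma) @ alpha + b

def fit_predict_dimension(X_train, y_train, X_test,
                          n_keep=4, n_components=2, gamma=10.0, sigma=1.0):
    """Predict one PAD dimension: GRA selection, then PCA, then LSSVM."""
    grades = grey_relational_grades(X_train, y_train)
    keep = np.argsort(grades)[::-1][:n_keep]  # highest-grade features first
    mean, comps = pca_fit(X_train[:, keep], n_components)
    Z_train = pca_transform(X_train[:, keep], mean, comps)
    Z_test = pca_transform(X_test[:, keep], mean, comps)
    b, alpha = lssvm_fit(Z_train, y_train, gamma, sigma)
    return lssvm_predict(Z_test, Z_train, b, alpha, sigma), keep
```

In this sketch the feature ranking and the PCA projection are fitted on the training split only and then reused on the test split, so that no test information leaks into the selection step.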

Key words: speech emotion recognition; PAD dimensions; least squares support vector machine (LSSVM); grey relational analysis (GRA); principal component analysis (PCA)

Received: 22 August 2018 Published: 30 September 2019

CLC: TN 912

Corresponding Authors: Xue-ying ZHANG E-mail: tyutsy@163.com;tyzhangxy@163.com

Cite this article:

Ying SUN, Yan-xiang HU, Xue-ying ZHANG, Shu-fei DUAN. Prediction of emotional dimensions PAD for emotional speech recognition. Journal of ZheJiang University (Engineering Science), 2019, 53(10): 2041-2048.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2019.10.022 OR http://www.zjujournals.com/eng/Y2019/V53/I10/2041

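Prediction of the PAD emotional dimensions for emotional speech recognition

Existing emotional features analyze emotion only from the signal perspective and cannot intuitively reflect the emotional state; to address this problem, it is proposed to introduce the continuous emotional dimensions PAD into emotion recognition. The experimental samples are three emotions (sadness, anger and happiness) selected from the TYUT2.0 database and the Berlin speech database, from which emotional features (prosodic features, formants, MFCC and nonlinear features) are extracted. To obtain objective and accurate PAD dimensions, grey relational analysis (GRA) is used to select the main features affecting P, A and D, principal component analysis (PCA) is used to extract the principal components of these main features, and the principal components serve as the input of a least squares support vector machine (LSSVM) that predicts P, A and D. The emotional features, the PAD dimensions, and their fusion are each used for emotion recognition with a support vector machine. The experimental results show that the prediction method improves the prediction accuracy of P, A and D to a certain extent, that the predicted values can effectively identify emotions, and that they complement the emotional features in emotion recognition.

Key words: speech emotion recognition; PAD dimensions; least squares support vector machine (LSSVM); grey relational analysis (GRA); principal component analysis (PCA)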

Fig.1 PAD three-dimensional emotion model

Fig.2 Flow chart of GRA-PCA-LSSVM model to predict P, A, D

Fig.3 PAD spatial emotional distribution

Tab.1 Emotional speech characteristics

Fig.4 MAE error chart of PAD prediction based on different feature dimensions

Tab.2 Feature dimensions of GRA-PCA

Tab.3 Comparison of prediction results of four kinds of regression models in two kinds of databases

Tab.4 Comparison of recognition rate between PAD dimension and FPFMN feature

