Prediction of emotional dimensions PAD for emotional speech recognition

doi:10.3785/j.issn.1008-973X.2019.10.022

Journal of ZheJiang University (Engineering Science)

2019, Vol. 53, Issue (10): 2041-2048

Communication Technology

Ying SUN, Yan-xiang HU, Xue-ying ZHANG*, Shu-fei DUAN

College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China


Abstract:

Existing emotional features analyze emotion only from the signal perspective and cannot directly reflect the emotional state. To address this, the continuous emotional dimensions PAD (pleasure, arousal, dominance) were introduced into emotion recognition. The experimental samples covered three emotions (sadness, anger and happiness) from the TYUT2.0 database and the Berlin speech database, from which emotional features (prosodic features, formants, MFCC and nonlinear features) were extracted. To obtain objective and accurate PAD values, grey relational analysis (GRA) was used to select the main features affecting P, A and D. Principal component analysis (PCA) was then used to extract the principal components of those features, which served as the input of a least squares support vector machine (LSSVM) to predict P, A and D. Emotion recognition was performed with a support vector machine on the emotional features, on the PAD dimensions, and on their fusion. The experimental results show that the prediction method improves the prediction accuracy of P, A and D to a certain extent, that the predicted values can effectively identify emotions, and that they complement the emotional features in emotion recognition.
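The GRA feature-selection step described in the abstract can be illustrated with a small sketch. It assumes min-max normalization of every sequence and the commonly used distinguishing coefficient of 0.5; the function names, the normalization choice, and the toy numbers are assumptions made for illustration and are not taken from the paper.

```python
def min_max(seq):
    """Scale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0 for _ in seq]
    return [(v - lo) / (hi - lo) for v in seq]


def grey_relational_grades(reference, features, rho=0.5):
    """Return one grey relational grade per feature sequence.

    reference: annotated values of one dimension (for example P) per sample.
    features:  list of feature sequences, each aligned with `reference`.
    rho:       distinguishing coefficient (0.5 is a common default).
    """
    ref = min_max(reference)
    # Absolute differences between the reference and each feature sequence.
    deltas = [[abs(r - c) for r, c in zip(ref, min_max(feat))]
              for feat in features]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0.0:
        # Every feature matches the reference exactly after normalization.
        return [1.0 for _ in features]
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


def rank_features(reference, features, rho=0.5):
    """Indices of the features sorted from highest to lowest grade."""
    grades = grey_relational_grades(reference, features, rho)
    return sorted(range(len(features)), key=lambda i: grades[i], reverse=True)


# Hypothetical toy data: four samples, one target dimension, two features.
target = [1.0, 2.0, 3.0, 4.0]
candidate_features = [[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
print(rank_features(target, candidate_features))
```

Features with the highest grades would be kept as the "main features" for a given dimension; how many to keep is a tuning choice that this sketch leaves open.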

Key words: speech emotion recognition; PAD dimensions; least squares support vector machine (LSSVM); grey relational analysis (GRA); principal component analysis (PCA)
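The LSSVM regression named in the abstract and key words can be sketched in a few lines. This is a minimal, dependency-free illustration that assumes an RBF kernel and the standard LSSVM linear system; the hyperparameter names `gamma` and `sigma`, the helper names, and the toy data are assumptions for illustration, not the authors' implementation. In the pipeline described above, the inputs would be the PCA components of the GRA-selected features.

```python
import math


def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_lssvm(X, y, gamma=100.0, sigma=1.0):
    """Fit LSSVM regression; returns (bias, alphas).

    Solves [[0, 1^T], [1, K + I/gamma]] [bias; alphas] = [0; y].
    """
    n = len(X)
    system = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization on the diagonal
        system.append(row)
    solution = solve_linear_system(system, [0.0] + list(y))
    return solution[0], solution[1:]


def predict_lssvm(X_train, bias, alphas, x, sigma=1.0):
    """Predict the target for one feature vector x."""
    return bias + sum(a * rbf_kernel(x, xi, sigma)
                      for a, xi in zip(alphas, X_train))


def mean_absolute_error(y_true, y_pred):
    """MAE, the error measure named in the Fig. 4 caption."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: one-dimensional inputs and one target dimension.
X_demo = [[0.0], [1.0], [2.0], [3.0]]
y_demo = [0.2, 0.9, 2.1, 2.8]
bias_demo, alphas_demo = fit_lssvm(X_demo, y_demo)
preds_demo = [predict_lssvm(X_demo, bias_demo, alphas_demo, x) for x in X_demo]
print(mean_absolute_error(y_demo, preds_demo))
```

One such regressor would be fitted per dimension (P, A and D). The linear solver is written out only to keep the sketch self-contained; a numerical library would normally be used instead.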

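Received: 2018-08-22 Published: 2019-09-30

CLC: TN 912

Corresponding author: Xue-ying ZHANG E-mail: tyutsy@163.com; tyzhangxy@163.com

About the author: Ying SUN (born 1981), female, lecturer, engaged in research on emotional speech recognition and affective computing. orcid.org/0000-0003-3926-062X. E-mail: tyutsy@163.com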


Cite this article:

Ying SUN, Yan-xiang HU, Xue-ying ZHANG, Shu-fei DUAN. Prediction of emotional dimensions PAD for emotional speech recognition[J]. Journal of ZheJiang University (Engineering Science), 2019, 53(10): 2041-2048.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2019.10.022 or http://www.zjujournals.com/eng/CN/Y2019/V53/I10/2041

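Fig. 1 PAD three-dimensional emotion model

Fig. 2 Flowchart of P, A and D prediction with the GRA-PCA-LSSVM model

Fig. 3 Distribution of emotions in PAD space

Table 1 Emotional speech features

Fig. 4 MAE trends of PAD prediction for different feature dimensions

Table 2 GRA-PCA feature dimensions

Table 3 Comparison of prediction results of four regression models on two databases

Table 4 Comparison of recognition rates between PAD dimensions and FPFMN features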

