Journal of Zhejiang University (Engineering Science)  2019, Vol. 53 Issue (10): 2041-2048    DOI: 10.3785/j.issn.1008-973X.2019.10.022
Communication Technology
Prediction of emotional dimensions PAD for emotional speech recognition
Ying SUN, Yan-xiang HU, Xue-ying ZHANG*, Shu-fei DUAN
College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
Abstract:

Existing emotional features analyze emotion only from the signal perspective and cannot directly reflect the emotional state. To address this problem, the continuous emotional dimensions PAD (pleasure, arousal, dominance) were introduced into emotion recognition. Experimental samples covering three emotions (sadness, anger and happiness) were taken from the TYUT2.0 database and the Berlin emotional speech database (EMO-DB), and emotional features (prosodic features, formants, MFCC and nonlinear features) were extracted. To obtain objective and accurate PAD values, grey relational analysis (GRA) was used to select the main features that affect P, A and D. Principal component analysis (PCA) was then used to extract the principal components of these main features, and the principal components served as the input of a least squares support vector machine (LSSVM) that predicts P, A and D. Emotion recognition was performed with a support vector machine on the emotional features, on the PAD dimensions, and on their fusion. The experimental results show that the prediction method improves the prediction accuracy of P, A and D to a certain extent, that the predicted values can effectively identify emotions, and that they complement the emotional features in emotion recognition.

Key words: speech emotion recognition    PAD dimensions    least squares support vector machine (LSSVM)    grey relational analysis (GRA)    principal component analysis (PCA)
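
The GRA selection step named in the abstract can be illustrated with a minimal sketch. It assumes min-max normalization, a distinguishing coefficient `rho = 0.5`, and ranking by the mean grey relational coefficient; none of these settings is stated in this page, and the function name `grey_relational_grades` is hypothetical.

```python
import numpy as np

def grey_relational_grades(features, target, rho=0.5):
    """Return one grey relational grade per feature column.

    features: (n_samples, n_features) array of emotional features.
    target:   (n_samples,) array, e.g. annotated P, A or D values.
    rho:      distinguishing coefficient (0.5 is an assumed default).
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(target, dtype=float)

    def minmax(a):
        # Scale each column (or the 1-D target) to [0, 1].
        span = a.max(axis=0) - a.min(axis=0)
        return (a - a.min(axis=0)) / np.where(span == 0, 1.0, span)

    Xn, yn = minmax(X), minmax(y)
    # Absolute difference between each feature and the reference sequence.
    delta = np.abs(Xn - yn[:, None])
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:
        return np.ones(X.shape[1])
    # Grey relational coefficient, then its mean over samples (the grade).
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=0)

# Toy data: feature 0 follows the target exactly, feature 1 does not.
toy_target = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
toy_features = np.column_stack([2.0 * toy_target, [5.0, 1.0, 4.0, 2.0, 3.0]])
toy_grades = grey_relational_grades(toy_features, toy_target)
toy_ranking = np.argsort(toy_grades)[::-1]  # most related feature first
```

Features would then be kept in order of decreasing grade; the cut-off that yields the feature counts reported for P, A and D is not given here, so the sketch stops at the ranking.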
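Received: 2018-08-22    Published: 2019-09-30
CLC:  TN 912
Corresponding author: Xue-ying ZHANG     E-mail: tyutsy@163.com; tyzhangxy@163.com
About the author: Ying SUN (born 1981), female, lecturer, engaged in research on emotional speech recognition and affective computing. orcid.org/0000-0003-3926-062X. E-mail: tyutsy@163.com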
Cite this article:

Ying SUN, Yan-xiang HU, Xue-ying ZHANG, Shu-fei DUAN. Prediction of emotional dimensions PAD for emotional speech recognition[J]. Journal of Zhejiang University (Engineering Science), 2019, 53(10): 2041-2048.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2019.10.022
http://www.zjujournals.com/eng/CN/Y2019/V53/I10/2041

Figure 1  PAD three-dimensional emotion model
Figure 2  Flow chart of P, A and D prediction with the GRA-PCA-LSSVM model
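
The final stage of the flow in Figure 2, LSSVM regression, can be sketched as the solution of a single linear system. This is a minimal illustration of the standard LSSVM formulation with an RBF kernel; the kernel choice, the values of `gamma` and `sigma`, and the function names are assumptions, since this page does not state the authors' settings.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]  # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy regression problem standing in for "principal components -> P value".
X_toy = np.linspace(0.0, 1.0, 20)[:, None]
y_toy = np.sin(2.0 * np.pi * X_toy[:, 0])
b_toy, alpha_toy = lssvm_fit(X_toy, y_toy, gamma=1000.0, sigma=0.2)
y_fit = lssvm_predict(X_toy, b_toy, alpha_toy, X_toy, sigma=0.2)
```

Because the equality constraints replace the inequality constraints of a standard SVM, training reduces to one call to `np.linalg.solve`. In practice `gamma` and `sigma` would be tuned, for example by cross-validation.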
Figure 3  Distribution of emotions in PAD space
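| Feature category | Feature names |
| --- | --- |
| Prosodic features | Speech rate; mean zero-crossing rate; maximum, minimum and mean of energy and of its first-order difference; maximum, minimum and mean of fundamental frequency and of its first-order difference |
| Formants | Maximum, minimum, mean and variance of the first, second and third formants and of their first-order differences |
| MFCC | Skewness, kurtosis, mean, variance and median of the first 12 MFCC coefficients |
| Nonlinear features | Maximum, minimum, mean, median and variance of the Hurst exponent; maximum, minimum, mean, median and variance of the minimum delay time; maximum, minimum, mean, median and variance of the correlation dimension; maximum, minimum, mean, median and variance of the Kolmogorov entropy; mean, median and variance of the largest Lyapunov exponent |

Table 1  Emotional speech features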
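
Most entries in Table 1 are utterance-level statistics of a frame-level contour (energy, fundamental frequency, a formant track, an MFCC coefficient) and of its first-order difference. The sketch below shows how such statistics could be computed once the contour is available. Contour extraction itself is not shown, the function names are hypothetical, and the use of population moments with non-excess kurtosis is an assumption.

```python
import numpy as np

def contour_statistics(contour):
    """Max, min and mean of a contour and of its first-order difference."""
    x = np.asarray(contour, dtype=float)
    dx = np.diff(x)  # first-order difference between consecutive frames
    return {
        "max": x.max(), "min": x.min(), "mean": x.mean(),
        "d_max": dx.max(), "d_min": dx.min(), "d_mean": dx.mean(),
    }

def moment_statistics(values):
    """Skewness, kurtosis, mean, variance and median of one coefficient track.

    Uses population (biased) moments; kurtosis is not excess kurtosis.
    A constant track has zero standard deviation and is not handled here.
    """
    v = np.asarray(values, dtype=float)
    z = (v - v.mean()) / v.std()
    return {
        "skewness": float((z ** 3).mean()),
        "kurtosis": float((z ** 4).mean()),
        "mean": float(v.mean()),
        "variance": float(v.var()),
        "median": float(np.median(v)),
    }

# Hypothetical four-frame pitch contour in Hz.
pitch_stats = contour_statistics([100.0, 120.0, 110.0, 130.0])
track_stats = moment_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
```

Concatenating these dictionaries over all contours gives one fixed-length feature vector per utterance, which is the form the later selection and regression stages need.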
Figure 4  MAE trend of PAD prediction for different feature dimensions
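| Dimension | Feature dimensions after GRA | Feature dimensions after GRA-PCA |
| --- | --- | --- |
| P | 83 | 57 |
| A | 111 | 69 |
| D | 55 | 43 |

Table 2  Feature dimensions after GRA-PCA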
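
Table 2 shows PCA shrinking the GRA-selected feature sets further. A minimal sketch of such a reduction is given below. It assumes the number of components is chosen by a cumulative explained-variance threshold, which is a common rule but is not confirmed by this page; `pca_reduce` and `var_threshold` are hypothetical names.

```python
import numpy as np

def pca_reduce(X, var_threshold=0.95):
    """Project X onto the leading principal components.

    Keeps the smallest number of components whose cumulative explained
    variance reaches var_threshold (an assumed selection rule).
    Returns (scores, components, explained_ratio_of_kept_components).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)          # variance share per component
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(s))
    return Xc @ Vt[:k].T, Vt[:k], explained[:k]

# Toy matrix whose three columns are multiples of one another (rank 1).
t = np.arange(10.0)
X_rank1 = np.column_stack([t, 2.0 * t, 3.0 * t])
scores, components, ratio = pca_reduce(X_rank1, var_threshold=0.95)
```

On the toy matrix a single component carries all the variance, so three correlated features collapse to one. On real feature sets the retained dimension depends on the threshold, which would have to match the authors' criterion to reproduce the counts in Table 2.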
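| Dimension | Model | TYUT2.0 r | TYUT2.0 R² | TYUT2.0 MAE | EMO-DB r | EMO-DB R² | EMO-DB MAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| P | Model 1 | 0.53 | 0.28 | 0.89 | 0.59 | 0.33 | 0.87 |
| P | Model 2 | 0.48 | 0.22 | 0.94 | 0.46 | 0.18 | 0.91 |
| P | Model 3 | 0.48 | 0.23 | 0.92 | 0.46 | 0.19 | 0.94 |
| P | Model 4 | 0.44 | 0.20 | 0.95 | 0.45 | 0.16 | 0.93 |
| A | Model 1 | 0.73 | 0.53 | 0.40 | 0.74 | 0.52 | 0.34 |
| A | Model 2 | 0.70 | 0.49 | 0.43 | 0.68 | 0.38 | 0.40 |
| A | Model 3 | 0.69 | 0.45 | 0.43 | 0.69 | 0.41 | 0.38 |
| A | Model 4 | 0.68 | 0.44 | 0.45 | 0.67 | 0.33 | 0.41 |
| D | Model 1 | 0.69 | 0.46 | 0.74 | 0.96 | 0.92 | 0.27 |
| D | Model 2 | 0.63 | 0.40 | 0.76 | 0.96 | 0.92 | 0.28 |
| D | Model 3 | 0.59 | 0.35 | 0.78 | 0.96 | 0.90 | 0.31 |
| D | Model 4 | 0.59 | 0.34 | 0.80 | 0.96 | 0.91 | 0.29 |

Table 3  Comparison of the prediction results of the four regression models on the two databases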
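
The three columns reported per database in Table 3 are the correlation coefficient r, the coefficient of determination R², and the mean absolute error (MAE). The sketch below uses the standard textbook definitions; the exact formulas used by the authors are not given in this page, so treat the definitions, and the name `regression_metrics`, as assumptions.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (r, R^2, MAE) for paired lists of true and predicted values.

    Assumes both sequences have non-zero variance.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    r = cov / math.sqrt(ss_t * ss_p)          # Pearson correlation
    r2 = 1.0 - ss_res / ss_t                  # coefficient of determination
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r, r2, mae

# Predictions with a constant offset of 0.5: perfectly correlated but biased.
r_toy, r2_toy, mae_toy = regression_metrics([1.0, 2.0, 3.0, 4.0],
                                            [1.5, 2.5, 3.5, 4.5])
```

The toy case shows why the table reports all three: a constant bias leaves r at 1 while lowering R² and raising MAE, so r alone would overstate prediction quality.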
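| Emotion | TYUT2.0 Scheme 1 | TYUT2.0 Scheme 2 | TYUT2.0 Scheme 3 | EMO-DB Scheme 1 | EMO-DB Scheme 2 | EMO-DB Scheme 3 |
| --- | --- | --- | --- | --- | --- | --- |
| Sadness | 52.94 | 52.94 | 76.47 | 100 | 100 | 100 |
| Anger | 73.68 | 100 | 84.21 | 68.42 | 84.21 | 73.68 |
| Happiness | 47.06 | 47.06 | 58.82 | 52.94 | 52.94 | 64.71 |
| Average | 58.49 | 67.92 | 73.58 | 73.58 | 79.25 | 79.25 |

Table 4  Comparison of recognition rates (%) for PAD dimensions and FPFMN features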
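
The average row of Table 4 is not the plain mean of the three per-emotion rates (for TYUT2.0, Scheme 1, that mean would be 57.89). The figures are consistent with an overall rate weighted by test-set size, under the assumption of 17 sadness, 19 anger and 17 happiness test utterances. Those counts are inferred from the percentages and are not stated in this page; the sketch below only checks the arithmetic.

```python
def overall_rate(correct, total):
    """Overall recognition rate in percent: all hits over all test samples."""
    return 100.0 * sum(correct.values()) / sum(total.values())

# Test-set sizes INFERRED from the published percentages (assumption).
totals = {"sadness": 17, "anger": 19, "happiness": 17}
# Correct counts that reproduce the TYUT2.0, Scheme 1 column (assumption).
correct_scheme1 = {"sadness": 9, "anger": 14, "happiness": 8}

per_class = {
    emotion: round(100.0 * correct_scheme1[emotion] / totals[emotion], 2)
    for emotion in totals
}
weighted_average = round(overall_rate(correct_scheme1, totals), 2)
unweighted_average = round(sum(per_class.values()) / len(per_class), 2)
```

Under these assumed counts, `per_class` reproduces 52.94, 73.68 and 47.06, and `weighted_average` reproduces the reported 58.49, whereas `unweighted_average` gives 57.89.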