Journal of Zhejiang University (Science Edition)  2022, Vol. 49 Issue (2): 141-150    DOI: 10.3785/j.issn.1008-9497.2022.02.002
Intelligent Recognition and Visualization     
FSAGN: An expression recognition method based on independent selection of video key frames
Jintai ZHU1,2, Jihua YE1, Feng GUO1, Lu JIANG1, Aiwen JIANG1
1. School of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, China
2. Department of Information Engineering, Zibo Technician College, Zibo 255030, Shandong Province, China

Abstract  

Video datasets for facial expression recognition contain many frames that carry no expression-related information. A model trained on all frames therefore learns a large amount of irrelevant information, and its recognition performance degrades significantly; enabling the model to select the relevant key frames autonomously thus becomes the central problem. Most existing video expression recognition methods do not yet consider the different effects that key frames and non-key frames have on model training. This paper proposes FSAGN, a facial expression recognition model based on an attention mechanism and GhostNet. The model computes a weight for each frame with a self-attention mechanism and a frame selection loss, and then autonomously selects the key frames of a video sequence according to these weights. In addition, to reduce the number of model parameters and the training cost, the traditional feature extraction network is replaced with GhostNet, which has fewer trainable parameters, and is combined with the attention module. Experiments on the CK+ and AFEW datasets yield highest recognition rates of 99.64% and 52.31%, respectively, a competitive classification accuracy. The method is well suited to facial expression recognition on long video sequences in which expression features are unevenly distributed.
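To make the frame-weighting idea in the abstract concrete, the sketch below shows one way such a module can be implemented. It is a minimal illustration under stated assumptions, not the authors' released code: the module name FrameAttention, the feature dimension, and the top-k selection rule are all hypothetical, and the paper's own frame selection loss is not reproduced here.

```python
# Minimal sketch of self-attention frame weighting and key-frame selection.
# Hypothetical re-implementation of the idea described in the abstract,
# not the authors' code; names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class FrameAttention(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)  # one scalar score per frame

    def forward(self, feats: torch.Tensor):
        # feats: (batch, num_frames, feat_dim) per-frame backbone features
        weights = torch.softmax(self.score(feats).squeeze(-1), dim=1)  # (B, T)
        # weighted fusion of all frames into a single clip-level descriptor
        fused = torch.einsum("bt,btd->bd", weights, feats)
        return fused, weights

def select_key_frames(weights: torch.Tensor, k: int) -> torch.Tensor:
    # keep the k frames with the largest attention weights
    return weights.topk(k, dim=1).indices

# usage on dummy data: 2 clips, 16 frames, 960-d GhostNet-like features
atten = FrameAttention(960)
feats = torch.randn(2, 16, 960)
fused, w = atten(feats)
key_idx = select_key_frames(w, k=4)  # (2, 4) indices of selected key frames
```

Selecting the top-k frames by weight mirrors the paper's idea of choosing key frames autonomously once per-frame weights are available.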



Key words: facial expression recognition; attention model; key frame selection; GhostNet
Received: 21 June 2021      Published: 22 March 2022
CLC:  TP 391.41  
Corresponding Authors: Jihua YE     E-mail: 2545000505@qq.com;yjhwcl@163.com
Cite this article:

Jintai ZHU, Jihua YE, Feng GUO, Lu JIANG, Aiwen JIANG. FSAGN: An expression recognition method based on independent selection of video key frames. Journal of Zhejiang University (Science Edition), 2022, 49(2): 141-150.

URL:

https://www.zjujournals.com/sci/EN/Y2022/V49/I2/141


Fig.1 FSAGN model
Fig.2 Feature maps extracted by GhostNet from randomly selected AFEW frames
Fig.3 Model algorithm flow
Fig.4 A video clip in the AFEW dataset
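Since FSAGN replaces its backbone with GhostNet, a condensed Ghost module (after HAN et al. [14]) helps illustrate the "cheap operations" that keep the parameter count low: a small primary convolution produces intrinsic feature maps, and inexpensive depthwise convolutions generate the remaining "ghost" maps. The layer sizes and ratio below are illustrative assumptions, not the exact configuration used in the paper.

```python
# Condensed Ghost module following the structure described in [14];
# sizes and the expansion ratio are illustrative, not the paper's config.
import math
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, inp: int, oup: int, ratio: int = 2, dw_size: int = 3):
        super().__init__()
        self.oup = oup
        init_ch = math.ceil(oup / ratio)   # intrinsic feature maps
        new_ch = init_ch * (ratio - 1)     # "ghost" feature maps
        self.primary = nn.Sequential(      # ordinary pointwise convolution
            nn.Conv2d(inp, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(        # depthwise conv = cheap operation
            nn.Conv2d(init_ch, new_ch, dw_size, padding=dw_size // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(new_ch), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y1 = self.primary(x)
        y2 = self.cheap(y1)                # ghosts derived from intrinsic maps
        return torch.cat([y1, y2], dim=1)[:, :self.oup]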
Algorithm          Frames used            Recognition rate/%
LoMo [18]          all frames             92.00
CNN+Island [20]    two of the frames      94.35
RASNet [1]         last frame             96.28
FAN [4]            all frames             99.69
WMDCNN [19]        last frame             98.52
DTAGN [17]         last frame             97.25
XIE et al. [6]     three of the frames    97.83
Ours (FSAGN)       all frames             99.64
Table 1  Recognition rate of each method on the CK+ dataset
Algorithm         Training time         Recognition rate/%
CNN-RNN           51 h 26 min 48 s      45.43
VGG-LSTM          51 h 40 min 06 s      48.60
HoloNet [2]       42 h 05 min 18 s      44.57
DenseNet          42 h 11 min 30 s      51.44
FAN [4]           41 h 35 min 28 s      51.18
XIE et al. [6]    42 h 53 min 13 s      46.03
GRERN [5]         39 h 11 min 03 s      52.26
Ours (FSAGN)      29 h 40 min 50 s      52.31
Table 2  Recognition rate and training time of each method on the AFEW dataset
Frame selection    Feature extraction (GhostNet)    Inter-frame fusion    CK+/%    AFEW/%
×                  ×                                ×                     87.17    43.55
√                  ×                                ×                     87.14    43.45
×                  √                                ×                     99.11    51.05
√                  √                                ×                     99.59    52.24
×                  √                                √                     99.55    52.18
√                  √                                √                     99.64    52.31
Table 3  Effect comparison of the three modules (√ = module included, × = excluded)
λ1     λ2     CK+/%    AFEW/%
0.2    0.8    97.49    48.25
0.3    0.7    98.49    49.17
0.4    0.6    99.62    52.25
0.5    0.5    99.48    52.17
0.6    0.4    98.17    50.49
Table 4  Influence of different values of λ1 and λ2 on the model recognition rate
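Tables 4 through 6 tune scalar weights λ1 to λ4. As a hedged sketch of how such weights typically enter training, the fragment below combines a classification loss with a frame selection loss; the pairing of λ1 and λ2 with these two particular terms, and the form of the selection term, are assumptions for illustration (defaults follow the best row of Table 4), not the paper's exact objective.

```python
# Hypothetical weighted two-term objective; the lambda1/lambda2 pairing
# and the selection term are illustrative assumptions, not FSAGN's loss.
import torch
import torch.nn.functional as F

def total_loss(logits: torch.Tensor, labels: torch.Tensor,
               frame_weights: torch.Tensor, key_mask: torch.Tensor,
               lambda1: float = 0.4, lambda2: float = 0.6) -> torch.Tensor:
    # logits: (B, num_classes) clip-level predictions
    # frame_weights: (B, T) softmax attention weights over frames
    # key_mask: (B, T) 0/1 indicator of frames treated as key frames
    cls_term = F.cross_entropy(logits, labels)
    # illustrative selection term: concentrate attention mass on key frames
    key_mass = (frame_weights * key_mask).sum(dim=1).clamp_min(1e-8)
    sel_term = -key_mass.log().mean()
    return lambda1 * cls_term + lambda2 * sel_term
```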
λ3      CK+/%    AFEW/%
0.05    99.49    51.17
0.10    99.54    51.49
0.15    99.62    52.29
0.20    99.61    52.25
0.25    99.60    52.17
Table 5  Recognition rate of the model with different values of λ3
λ4     CK+/%    AFEW/%
0.5    99.62    52.21
0.6    99.61    52.26
0.7    99.64    52.31
0.8    99.58    51.98
0.9    99.50    51.05
Table 6  Recognition rate of the model with different values of λ4
Fig.5 The influence of parameter γ on model recognition effect
Fig.6 The influence of parameter δ on model recognition effect
Fig.7 Visualization of the video frame selection weights γ
[1] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI:10.1109/cvpr.2016.90
[2] YAO A B, CAI D Q, HU P, et al. HoloNet: Towards robust emotion recognition in the wild[C]//Proceedings of the 18th ACM International Conference on Multimodal Interaction. New York: Association for Computing Machinery, 2016: 472-478. DOI:10.1145/2993148.2997639
[3] LIU C H, TANG T H, LYU K, et al. Multi-feature based emotion recognition for video clips[C]//ACM International Conference on Multimodal Interaction. New York: Association for Computing Machinery, 2018: 630-634. DOI:10.1145/3242969.3264989
[4] MENG D B, PENG X J, WANG K, et al. Frame attention networks for facial expression recognition in videos[C]//2019 IEEE International Conference on Image Processing. Piscataway: IEEE, 2019: 3866-3870. DOI:10.1109/icip.2019.8803603
[5] GAO Q Q, ZENG H X, LI G, et al. Graph reasoning-based emotion recognition network[J]. IEEE Access, 2020, 9: 6488-6497. DOI:10.1109/ACCESS.2020.3048693
[6] XIE W C, CHEN W T, SHEN L L, et al. Surrogate network-based sparseness hyper-parameter optimization for deep expression recognition[J]. Pattern Recognition, 2020, 111: 107701. DOI:10.1016/j.patcog.2020.107701
[7] LUCEY P, COHN J F, KANADE T, et al. The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops. Piscataway: IEEE, 2010: 94-101. DOI:10.1109/CVPRW.2010.5543262
[8] DHALL A, GOECKE R, JOSHI J, et al. Emotion recognition in the wild challenge 2014: Baseline, data and protocol[C]//Proceedings of the 16th International Conference on Multimodal Interaction. Piscataway: IEEE, 2014: 461-466. DOI:10.1145/2663204.2666275
[9] OJALA T, PIETIKAINEN M, MAENPAA T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 971-987. DOI:10.1109/TPAMI.2002.1017623
[10] HUANG K Q, REN W Q, TAN T N. A review on image object classification and detection[J]. Chinese Journal of Computers, 2014, 37(6): 1225-1240. (in Chinese) DOI:10.3724/SP.J.1016.2014.01225
[11] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. DOI:10.1162/neco.1997.9.8.1735
[12] BARGAL S A, BARSOUM E, FERRER C C, et al. Emotion recognition in the wild from videos using images[C]//Proceedings of the 18th ACM International Conference on Multimodal Interaction. New York: Association for Computing Machinery, 2016: 433-436. DOI:10.1145/2993148.2997627
[13] ZHAO X Y, LIANG X D, LIU L Q, et al. Peak-piloted deep network for facial expression recognition[C]//European Conference on Computer Vision. Cham: Springer, 2016: 425-442. DOI:10.1007/978-3-319-46475-6_27
[14] HAN K, WANG Y H, TIAN Q, et al. GhostNet: More features from cheap operations[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 1577-1586. DOI:10.1109/cvpr42600.2020.00165
[15] WEI H, HUANG Y Y, ZHANG F, et al. Noise-tolerant paradigm for training face recognition CNNs[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 11887-11896. DOI:10.1109/cvpr.2019.01216
[16] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007. DOI:10.1109/iccv.2017.324
[17] JUNG H, LEE S, YIM J, et al. Joint fine-tuning in deep neural networks for facial expression recognition[C]//IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 2983-2991. DOI:10.1109/ICCV.2015.341
[18] SIKKA K, SHARMA G, BARTLETT M. LoMo: Latent ordinal model for facial analysis in videos[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 5580-5589. DOI:10.1109/cvpr.2016.602
[19] ZHANG H P, HUANG B, TIAN G H. Facial expression recognition based on deep convolution long short-term memory networks of double-channel weighted mixture[J]. Pattern Recognition Letters, 2020, 131: 128-134. DOI:10.1016/j.patrec.2019.12.013