Journal of ZheJiang University (Engineering Science)  2019, Vol. 53 Issue (12): 2365-2371    DOI: 10.3785/j.issn.1008-973X.2019.12.014
Computer Science and Artificial Intelligence     
Facial landmark localization based on cascaded hourglass network with residual features
Ai-dong XU¹, Wen-qi HUANG², Zhe MING², Wei-liang CHEN³, Roland HU³*, Hang YANG²
1. Electric Power Research Institute, Southern Power Grid, Guangzhou 510080, China
2. Digital Grid Research Institute, Southern Power Grid, Guangzhou 510080, China
3. College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China

Abstract  

To improve the accuracy of facial landmark localization, the authors studied the principles and defects of the fully convolutional network (FCN), which is widely used for this task. They discuss a side effect introduced by the kernel function used in FCN-based landmark localization: the evaluation criterion applied during training is inconsistent with the one applied during testing. The possibility and universality of this problem were first analyzed theoretically, and experiments were then designed to verify that the problem is widespread in practical scenarios. To address it, an hourglass network combining residual features was proposed for facial landmark localization. A cascaded structure of multiple hourglass networks was also proposed and compared with the classic stacked hourglass network. The experimental results show that the two-stage cascaded structure achieves landmark localization accuracy comparable to that of the four-stage stacked structure while greatly reducing the number of model parameters and the time complexity. The normalized mean error of the proposed method on the challenging subset of the 300-W database was 6.84%, which is better than the best previously reported result.
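The central claim above is that the heatmap loss minimized during training can disagree with the coordinate error measured at test time. A toy one-dimensional example makes this concrete. It is a sketch under stated assumptions, not the authors' theoretical analysis. It assumes the training target is a normalized Gaussian kernel over a single image row, that training minimizes pixel-wise cross-entropy against that target (Fig.1 relates cross-entropy loss to the normalized Euclidean loss), and that testing decodes the landmark as the argmax pixel. The grid width, kernel widths and both predictions are hypothetical values chosen only for the illustration.

```python
import math

def normalize(weights):
    """Scale non-negative weights so they sum to 1 (a discrete distribution)."""
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_row(center, sigma, width):
    """Unnormalized 1-D Gaussian kernel evaluated at integer pixel positions."""
    return [math.exp(-((x - center) ** 2) / (2 * sigma ** 2)) for x in range(width)]

def cross_entropy(target, pred):
    """Training-style criterion (assumed): -sum_i q_i * log p_i over all pixels."""
    return -sum(q * math.log(p) for q, p in zip(target, pred))

def argmax_error(pred, true_x):
    """Testing-style criterion: distance from the decoded peak to the true position."""
    peak = max(range(len(pred)), key=lambda i: pred[i])
    return abs(peak - true_x)

WIDTH, TRUE_X = 11, 5  # hypothetical grid size and landmark position
target = normalize(gaussian_row(TRUE_X, 1.0, WIDTH))

# Prediction A: sharp peak at the correct pixel, 0.01 of mass on every other pixel.
pred_a = [0.9 if x == TRUE_X else 0.01 for x in range(WIDTH)]
# Prediction B: broad Gaussian blob whose peak sits one pixel to the right.
pred_b = normalize(gaussian_row(TRUE_X + 1, 1.5, WIDTH))

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, "cross-entropy:", round(cross_entropy(target, pred), 3),
          "argmax error (px):", argmax_error(pred, TRUE_X))
```

Prediction B scores the lower cross-entropy even though its decoded peak is one pixel from the true landmark, while A decodes exactly. A lower training loss therefore does not by itself guarantee a lower localization error.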



Key words: facial landmark localization; fully convolutional network (FCN); residual feature; cascaded structure
Received: 05 November 2018      Published: 17 December 2019
CLC:  TP 391.4  
Corresponding author: Roland HU     E-mail: xuad@csg.cn; haoji_hu@zju.edu.cn
Cite this article:

Ai-dong XU, Wen-qi HUANG, Zhe MING, Wei-liang CHEN, Roland HU, Hang YANG. Facial landmark localization based on cascaded hourglass network with residual features. Journal of ZheJiang University (Engineering Science), 2019, 53(12): 2365-2371.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2019.12.014     OR     http://www.zjujournals.com/eng/Y2019/V53/I12/2365


Fig.1 Relationship between cross entropy loss and normalized Euclidean loss
Fig.2 Diagram of the hourglass network based on residual features
Fig.3 The structure of stacked hourglass network (SHN) and cascaded hourglass network
Tab.1 Testing protocol of different datasets
Dataset           k       m        n
300-W             3148    689      68
Menpo (frontal)   6679    12006    68
Menpo (profile)   2300    4253     39
Tab.2 Normalized mean error (NME, %) on the 300-W test set with 300-W training data only
Method        Common subset    Challenging subset    Full set
Ref. [7]      8.22             18.33                 10.20
SDM [6]       5.57             15.40                 7.50
LBF [9]       4.95             11.98                 6.32
CFSS [19]     4.73             9.98                  5.76
RAR [12]      4.12             8.35                  4.94
DCR [10]      4.07             8.29                  4.90
TR-DRN [21]   4.36             7.56                  4.99
DAN [20]      4.42             7.57                  5.03
CHN           4.22             7.97                  4.95
RF-CHN        4.18             7.39                  4.81
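The NME reported in Tables 2 and 3 is, in its usual form, the mean point-to-point Euclidean distance between predicted and ground-truth landmarks divided by a normalizing distance. The sketch below assumes that form. This page does not state which normalizing distance the authors use. The `inter_ocular_distance` helper is therefore a hypothetical choice (outer eye corners at 0-based indices 36 and 45 of the 68-point markup), not the paper's definition.

```python
import math

def nme(pred, gt, norm_dist):
    """Mean Euclidean landmark error divided by a normalizing distance.

    pred, gt: equal-length lists of (x, y) coordinates for one image.
    norm_dist: normalizing distance (its exact definition is an assumption).
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return total / (len(gt) * norm_dist)

def inter_ocular_distance(gt, left=36, right=45):
    """Hypothetical normalizer: distance between the outer eye corners,
    assuming 0-based indices of the 68-point annotation scheme."""
    return math.dist(gt[left], gt[right])

# Toy usage with made-up coordinates, not data from the paper.
gt = [(0.0, 0.0), (0.0, 0.0)]
pred = [(0.0, 0.0), (3.0, 4.0)]
print(nme(pred, gt, norm_dist=5.0))
```

A per-image value like this would then be averaged over the test images and multiplied by 100 to give a percentage comparable in form to the table entries.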
Tab.3 NME (%) on the 300-W test set using additional training data
Method            Common subset    Challenging subset    Full set
DAN-Menpo [20]    4.29             7.05                  4.83
CHN               4.11             6.98                  4.67
RF-CHN            4.03             6.84                  4.58
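The claim that 6.84% beats the best previous result can be checked against the table values themselves. The script below copies only the challenging-subset column of Tables 2 and 3 (no new data). It treats CHN and RF-CHN as the paper's own variants and computes RF-CHN's relative error reduction against the strongest remaining method. The helper names are illustrative.

```python
# Challenging-subset NME (%) copied from Tables 2 and 3.
TABLE2 = {
    "Ref. [7]": 18.33, "SDM": 15.40, "LBF": 11.98, "CFSS": 9.98, "RAR": 8.35,
    "DCR": 8.29, "TR-DRN": 7.56, "DAN": 7.57, "CHN": 7.97, "RF-CHN": 7.39,
}
TABLE3 = {"DAN-Menpo": 7.05, "CHN": 6.98, "RF-CHN": 6.84}
OURS = {"CHN", "RF-CHN"}  # assumed to be the paper's own variants

def relative_reduction(baseline, ours):
    """Percentage by which `ours` lowers the error relative to `baseline`."""
    return 100.0 * (baseline - ours) / baseline

def best_prior(table):
    """Name and NME of the lowest-error method that is not one of OURS."""
    name = min((k for k in table if k not in OURS), key=table.get)
    return name, table[name]

for label, table in [("Tab.2", TABLE2), ("Tab.3", TABLE3)]:
    name, value = best_prior(table)
    gain = relative_reduction(value, table["RF-CHN"])
    print(f"{label}: best prior {name} = {value:.2f}%, "
          f"RF-CHN = {table['RF-CHN']:.2f}% ({gain:.1f}% relative reduction)")
```

In Table 2 the strongest listed baseline on the challenging subset is TR-DRN (7.56%), narrowly ahead of DAN (7.57%). In Table 3 it is DAN-Menpo (7.05%).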
Fig.4 Cumulative error distribution (CED) curves on common and challenging test sets of 300-W database
Fig.5 CED curves on Menpo test dataset
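A cumulative error distribution (CED) curve, as in Fig.4 and Fig.5, plots, for each error threshold, the fraction of test images whose per-image NME does not exceed it. The minimal sketch below computes that curve and a trapezoidal area under it. The AUC summary, its cutoff `max_error` and the sample error values are assumptions for illustration. The figure captions here do not report an AUC.

```python
def ced(errors, thresholds):
    """Cumulative error distribution: for each threshold t, the fraction of
    test images whose per-image NME is <= t."""
    if not errors:
        raise ValueError("need at least one per-image error")
    n = len(errors)
    return [sum(1 for e in errors if e <= t) / n for t in thresholds]

def ced_auc(errors, max_error, steps=1000):
    """Area under the CED curve on [0, max_error], divided by max_error so the
    result lies in [0, 1]; computed with the trapezoidal rule."""
    thresholds = [max_error * i / steps for i in range(steps + 1)]
    ys = ced(errors, thresholds)
    return sum((ys[i] + ys[i + 1]) / 2 for i in range(steps)) / steps

# Hypothetical per-image NME values (fractions), not the paper's data.
errors = [0.02, 0.04, 0.06, 0.10]
print(ced(errors, [0.03, 0.05, 0.08]))  # [0.25, 0.5, 0.75]
print(round(ced_auc(errors, 0.08), 3))
```

A curve that rises earlier and higher means more images are localized with small error, which is how the CED plots compare methods beyond a single mean value.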
Fig.6 Comparison between ground-truth facial landmarks and the detection results of the proposed algorithm
Tab.4 NME (%) on the 300-W test set
Method    t    e       p/M
SHN       4    7.00    62.63
CHN       2    6.98    30.08
RF-CHN    2    6.84    34.34
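The parameter savings claimed in the abstract can be worked out from Table 4. The column headers are not defined on this page, so the sketch reads t as the number of hourglass stages, e as the challenging-subset NME in % (the CHN and RF-CHN values match Table 3), and p/M as the parameter count in millions. That reading is an assumption. The numbers are copied from the table; the function name is illustrative.

```python
# Values copied from Table 4, read (by assumption) as
# (stages t, challenging-subset NME e in %, parameters p in millions).
TABLE4 = {"SHN": (4, 7.00, 62.63), "CHN": (2, 6.98, 30.08), "RF-CHN": (2, 6.84, 34.34)}

def param_saving(baseline, model):
    """Percentage of the baseline's parameters removed by `model`."""
    p_base, p_model = TABLE4[baseline][2], TABLE4[model][2]
    return 100.0 * (p_base - p_model) / p_base

for name, (t, e, p) in TABLE4.items():
    print(f"{name}: {p / t:.2f} M parameters per stage on average, NME {e:.2f}%")
for name in ("CHN", "RF-CHN"):
    print(f"{name} uses {param_saving('SHN', name):.1f}% fewer parameters than SHN")
```

On these numbers, CHN uses about 52.0% fewer parameters than the four-stage SHN and RF-CHN about 45.2% fewer, while both reach a lower e. The per-stage figure is only an average and says nothing about how parameters are distributed inside a stage. The table does not report time complexity either.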
[1]   SHAN Shi-guang. Study on some key issues in face recognition [D]. Beijing: Institute of Computing Technology, Chinese Academy of Sciences, 2004.
[2]   LIU Wei-feng. A study on facial expression recognition [D]. Hefei: University of Science and Technology of China, 2007.
[3]   HASSNER T, HAREL S, PAZ E, et al. Effective face frontalization in unconstrained images [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 4295-4304.
[4]   COOTES T F, EDWARDS G J, TAYLOR C J. Active appearance models [C] // European Conference on Computer Vision. Freiburg: ECCV, 1998: 484-498.
[5]   COOTES T F, TAYLOR C J, COOPER D H, et al. Active shape models: their training and application [J]. Computer Vision and Image Understanding, 1995, 61(1): 38-59. doi: 10.1006/cviu.1995.1004
[6]   XIONG X, DE LA TORRE F. Supervised descent method and its applications to face alignment [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE, 2013: 532-539.
[7]   ZHU X, RAMANAN D. Face detection, pose estimation, and landmark localization in the wild [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE, 2012: 2879-2886.
[8]   HOTELLING H. Analysis of a complex of statistical variables into principal components [J]. Journal of Educational Psychology, 1933, 24(6): 417. doi: 10.1037/h0071325
[9]   REN S, CAO X, WEI Y, et al. Face alignment at 3000 FPS via regressing local binary features [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 1685-1692.
[10]   LAI H, XIAO S, PAN Y, et al. Deep recurrent regression for facial landmark detection [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(5): 1144-1157.
[11]   HONARI S, YOSINSKI J, VINCENT P, et al. Recombinator networks: learning coarse-to-fine feature aggregation [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 5743-5752.
[12]   XIAO S, FENG J, XING J, et al. Robust facial landmark detection via recurrent attentive-refinement networks [C] // European Conference on Computer Vision. Amsterdam: ECCV, 2016: 57-72.
[13]   BULAT A, TZIMIROPOULOS G. Two-stage convolutional part heatmap regression for the 1st 3D face alignment in the wild (3DFAW) challenge [C] // European Conference on Computer Vision. Amsterdam: ECCV, 2016: 616-624.
[14]   YANG J, LIU Q, ZHANG K. Stacked hourglass network for robust facial landmark localisation [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 2025-2033.
[15]   LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 3431-3440.
[16]   SAGONAS C, TZIMIROPOULOS G, ZAFEIRIOU S, et al. 300 faces in-the-wild challenge: the first facial landmark localization challenge [C] // Proceedings of the IEEE International Conference on Computer Vision Workshops. Sydney: ICCV, 2013: 397-403.
[17]   ZAFEIRIOU S, TRIGEORGIS G, CHRYSOS G, et al. The Menpo facial landmark localisation challenge: a step towards the solution [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 2116-2125.
[18]   JIA Y, SHELHAMER E, DONAHUE J, et al. Caffe: convolutional architecture for fast feature embedding [C] // Proceedings of the 22nd ACM International Conference on Multimedia. Orlando: ACM, 2014: 675-678.
[19]   ZHU S, LI C, CHEN C L, et al. Face alignment by coarse-to-fine shape searching [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 4998-5006.
[20]   KOWALSKI M, NARUNIEC J, TRZCINSKI T. Deep alignment network: a convolutional neural network for robust face alignment [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 2034-2043.
[21]   LV J, SHAO X, XING J, et al. A deep regression architecture with two-stage re-initialization for high performance facial landmark detection [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 3691-3700.
[22]   ZADEH A, BALTRUSAITIS T, MORENCY L P. Convolutional experts constrained local model for facial landmark detection [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 2051-2059.