1. Electric Power Research Institute, Southern Power Grid, Guangzhou 510080, China
2. Digital Grid Research Institute, Southern Power Grid, Guangzhou 510080, China
3. College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
The principles and defects of the fully convolutional network (FCN), which is widely used in facial landmark localization, were studied to improve localization accuracy. The side effect introduced by the kernel function in FCN was discussed, namely that the evaluation criteria used in training and in testing are inconsistent. The possibility and universality of this problem were first analyzed theoretically, and experiments were then designed to verify that it occurs in practice. To solve this problem, an hourglass network structure combining residual features was proposed for facial landmark localization, and a cascaded hourglass network structure was given. The experimental results show that the two-stage cascaded structure achieves accuracy comparable to that of the four-stage stacked structure, so the number of model parameters and the time complexity are greatly reduced. The normalized mean error of the proposed method on the challenging subset of the 300-W database was 6.84%, which is better than the previous best result.
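The inconsistency between the training and testing criteria can be illustrated with a toy example. The sketch below is not the experiment from the paper: it assumes a 1-D heatmap, a Gaussian kernel, a per-pixel binary cross entropy as the training loss, and the argmax position error as the testing criterion. All function names and values are hypothetical.

```python
import math

def gaussian_heatmap(length, center, sigma=1.0, peak=1.0):
    """1-D heatmap produced by a Gaussian kernel centred on a landmark."""
    return [peak * math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
            for i in range(length)]

def pixel_cross_entropy(pred, target, eps=1e-7):
    """Mean per-pixel binary cross entropy (assumed training criterion)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)

def localization_error(pred, true_index):
    """Distance from the heatmap argmax to the true landmark
    (assumed testing criterion, in pixels)."""
    argmax = max(range(len(pred)), key=pred.__getitem__)
    return abs(argmax - true_index)

true_index = 4
target = gaussian_heatmap(9, true_index)

# Prediction A: well-shaped response, but the peak is one pixel off.
shifted = gaussian_heatmap(9, true_index + 1, peak=0.9)

# Prediction B: weak, nearly flat response whose peak is correctly placed.
flat = [0.05] * 9
flat[true_index] = 0.06

for name, pred in (("shifted", shifted), ("flat", flat)):
    print(name,
          "cross entropy = %.3f" % pixel_cross_entropy(pred, target),
          "position error = %d px" % localization_error(pred, true_index))
```

In this toy setting the shifted prediction has the lower cross entropy but the larger position error, so the two criteria rank the predictions in opposite order.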
Fig.1 Relationship between cross entropy loss and normalized Euclidean loss
Fig.2 Diagram of hourglass network based on residual features
Fig.3 Structures of stacked hourglass network (SHN) and cascaded hourglass network (CHN)
Dataset            k       m       n
300-W              3148    689     68
Menpo (frontal)    6679    12006   68
Menpo (profile)    2300    4253    39
Tab.1 Testing protocol of different datasets
Unit: %
Method        Common subset   Challenging subset   Full set
Ref. [7]      8.22            18.33                10.20
SDM[6]        5.57            15.40                7.50
LBF[9]        4.95            11.98                6.32
CFSS[19]      4.73            9.98                 5.76
RAR[12]       4.12            8.35                 4.94
DCR[10]       4.07            8.29                 4.90
TR-DRN[21]    4.36            7.56                 4.99
DAN[20]       4.42            7.57                 5.03
CHN           4.22            7.97                 4.95
RF-CHN        4.18            7.39                 4.81
Tab.2 Normalized mean error (NME) on 300-W test dataset with 300-W training data only
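For reference, a minimal sketch of how an NME value is typically computed for one face. The normalizing distance (for example the distance between the eyes) is not stated in this section, so it is passed in as an assumption; the function name and the sample coordinates are hypothetical.

```python
import math

def nme(pred, truth, norm_dist):
    """Mean point-to-point error over all landmarks of one face,
    divided by a normalizing distance (assumed, e.g. an eye distance)."""
    errors = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(errors) / len(errors) / norm_dist

# Made-up example with three landmarks; the first two stand in for the eyes.
truth = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
pred = [(0.0, 1.0), (10.0, 0.0), (5.0, 7.0)]
eye_distance = math.dist(truth[0], truth[1])

value = nme(pred, truth, eye_distance)
print(f"NME = {value:.2%}")
```

The figure reported for a test set is then the average of this per-face value over all test images.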
Unit: %
Algorithm        Common subset   Challenging subset   Full set
DAN-Menpo[20]    4.29            7.05                 4.83
CHN              4.11            6.98                 4.67
RF-CHN           4.03            6.84                 4.58
Tab.3 NME on 300-W test dataset using additional training data
Fig.4 Cumulative error distribution (CED) curves on common and challenging test sets of 300-W database
Fig.5 CED curves on Menpo test dataset
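A CED curve gives, for each error threshold, the fraction of test images whose error does not exceed that threshold. The sketch below shows the computation on made-up per-image errors; the function name, the error values, and the thresholds are hypothetical and are not taken from the figures.

```python
def ced_curve(errors, thresholds):
    """Fraction of images with error at or below each threshold."""
    n = len(errors)
    return [sum(e <= t for e in errors) / n for t in thresholds]

# Made-up per-image normalized errors, for illustration only.
errors = [0.03, 0.04, 0.05, 0.07, 0.12]
thresholds = [0.02, 0.05, 0.08, 0.10]

curve = ced_curve(errors, thresholds)
for t, fraction in zip(thresholds, curve):
    print(f"error <= {t:.2f}: {fraction:.0%} of images")
```

A curve that rises faster and saturates higher indicates a more accurate method, which is how the curves in such figures are usually compared.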
Fig.6 Comparison of ground truth and facial landmark detection results of the proposed algorithm
Method    t    e       p/M
SHN       4    7.00    62.63
CHN       2    6.98    30.08
RF-CHN    2    6.84    34.34
Tab.4 NME on 300-W test dataset (%)
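The size of the parameter saving can be worked out directly from Tab.4. The sketch below assumes that t is the number of hourglass stages, e is the NME in percent, and p/M is the parameter count in millions; this reading of the column symbols is an assumption, and the helper name is hypothetical.

```python
# Rows copied from Tab.4: (t, e, p/M); column meanings are assumed as stated above.
rows = {
    "SHN": (4, 7.00, 62.63),
    "CHN": (2, 6.98, 30.08),
    "RF-CHN": (2, 6.84, 34.34),
}

def relative_reduction(baseline, value):
    """Fractional decrease of value with respect to baseline."""
    return (baseline - value) / baseline

baseline_params = rows["SHN"][2]
for name in ("CHN", "RF-CHN"):
    stages, error, params = rows[name]
    saving = relative_reduction(baseline_params, params)
    print(f"{name}: {stages} stages, e = {error:.2f}, "
          f"{saving:.1%} fewer parameters than SHN")
```

Under that reading, CHN uses about 52% fewer parameters than SHN and RF-CHN about 45% fewer, with no loss of accuracy on the values listed.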
[1] SHAN Shi-guang. Study on some key issues in face recognition [D]. Beijing: Institute of Computing Technology, Chinese Academy of Sciences, 2004.
[2] LIU Wei-feng. A study on facial expression recognition [D]. Hefei: University of Science and Technology of China, 2007.
[3] HASSNER T, HAREL S, PAZ E, et al. Effective face frontalization in unconstrained images [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 4295-4304.
[4] COOTES T F, EDWARDS G J, TAYLOR C J. Active appearance models [C] // European Conference on Computer Vision. Freiburg: ECCV, 1998: 484-498.
[5] COOTES T F, TAYLOR C J, COOPER D H, et al. Active shape models: their training and application [J]. Computer Vision and Image Understanding, 1995, 61(1): 38-59. doi: 10.1006/cviu.1995.1004
[6] XIONG X, TORRE F D L. Supervised descent method and its applications to face alignment [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE, 2013: 532-539.
[7] RAMANAN D. Face detection, pose estimation, and landmark localization in the wild [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE, 2012: 2879-2886.
[8] HOTELLING H. Analysis of a complex of statistical variables into principal components [J]. Journal of Educational Psychology, 1933, 24(6): 417. doi: 10.1037/h0071325
[9] REN S, CAO X, WEI Y, et al. Face alignment at 3000 FPS via regressing local binary features [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 1685-1692.
[10] LAI H, XIAO S, PAN Y, et al. Deep recurrent regression for facial landmark detection [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2015, 28(5): 1144-1157.
[11] HONARI S, YOSINSKI J, VINCENT P, et al. Recombinator networks: learning coarse-to-fine feature aggregation [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 5743-5752.
[12] XIAO S, FENG J, XING J, et al. Robust facial landmark detection via recurrent attentive-refinement networks [C] // European Conference on Computer Vision. Amsterdam: ECCV, 2016: 57-72.
[13] BULAT A, TZIMIROPOULOS G. Two-stage convolutional part heatmap regression for the 1st 3D face alignment in the wild (3DFAW) challenge [C] // European Conference on Computer Vision. Amsterdam: ECCV, 2016: 616-624.
[14] YANG J, LIU Q, ZHANG K. Stacked hourglass network for robust facial landmark localisation [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 2025-2033.
[15] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 3431-3440.
[16] SAGONAS C, TZIMIROPOULOS G, ZAFEIRIOU S, et al. 300 faces in-the-wild challenge: the first facial landmark localization challenge [C] // Proceedings of the IEEE International Conference on Computer Vision Workshops. Sydney: ICCV, 2013: 397-403.
[17] ZAFEIRIOU S, TRIGEORGIS G, CHRYSOS G, et al. The Menpo facial landmark localisation challenge: a step towards the solution [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 2116-2125.
[18] JIA Y Q, SHELHAMER E, et al. Caffe: convolutional architecture for fast feature embedding [C] // Proceedings of the 22nd ACM International Conference on Multimedia. Orlando: ACM, 2014: 675-678.
[19] ZHU S, LI C, CHEN C L, et al. Face alignment by coarse-to-fine shape searching [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 4998-5006.
[20] KOWALSKI M, NARUNIEC J, TRZCINSKI T. Deep alignment network: a convolutional neural network for robust face alignment [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 2034-2043.
[21] LV J, SHAO X, XING J, et al. A deep regression architecture with two-stage re-initialization for high performance facial landmark detection [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 3691-3700.
[22] ZADEH A, BALTRUSAITIS T, MORENCY L P. Convolutional experts constrained local model for facial landmark detection [C] // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 2051-2059.