Facial landmark localization based on cascaded hourglass network with residual features

doi:10.3785/j.issn.1008-973X.2019.12.014

Journal of ZheJiang University (Engineering Science)

2019, Vol. 53, Issue (12): 2365-2371

Computer Science and Artificial Intelligence

Facial landmark localization based on cascaded hourglass network with residual features

Ai-dong XU1, Wen-qi HUANG2, Zhe MING2, Wei-liang CHEN3, Roland HU3,*, Hang YANG2

1. Electric Power Research Institute, Southern Power Grid, Guangzhou 510080, China
2. Digital Grid Research Institute, Southern Power Grid, Guangzhou 510080, China
3. College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China


Abstract

To improve the accuracy of facial landmark localization, the principles and defects of the fully convolutional network (FCN), which is widely used for this task, were studied. A side effect introduced by the kernel function of the FCN was discussed, namely that the evaluation criteria are inconsistent between training and testing. The possibility and universality of this problem were first analyzed theoretically, and experiments were then designed to verify that it occurs in practice. To solve this problem, an hourglass network structure combining residual features was proposed for facial landmark localization, and a cascaded hourglass network structure was presented. The experimental results show that the two-stage cascaded structure achieves accuracy comparable to that of the four-stage stacked structure, which greatly reduces the number of model parameters and the time complexity. The normalized mean error (NME) of the proposed method on the challenging subset of the 300-W database was 6.84%, which is better than the previous best result.
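
The mismatch between the training and testing criteria can be illustrated with a toy example. The sketch below is not the experiment reported in the paper; it assumes a one-dimensional heatmap, a hand-picked soft target standing in for the kernel-smoothed label, per-pixel cross entropy as the training loss, and the normalized distance between the heatmap peak and the true position as the test error. All names and numbers are hypothetical.

```python
import math

def cross_entropy(target, pred):
    """Mean per-pixel binary cross entropy between a soft target and a prediction."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for t, p in zip(target, pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(target)

def localization_error(pred, true_index, norm):
    """Distance between the heatmap peak and the true landmark, divided by norm."""
    peak = max(range(len(pred)), key=pred.__getitem__)
    return abs(peak - true_index) / norm

# Assumed soft target: a kernel-like bump centred on the true landmark (index 2).
target = [0.1, 0.6, 1.0, 0.6, 0.1]

# Prediction A: peak at the right place, but poorly calibrated values.
pred_a = [0.05, 0.2, 0.3, 0.2, 0.05]
# Prediction B: values close to the target, but the peak is shifted by one pixel.
pred_b = [0.1, 0.95, 0.9, 0.6, 0.1]

for name, pred in (("A", pred_a), ("B", pred_b)):
    ce = cross_entropy(target, pred)
    err = localization_error(pred, true_index=2, norm=len(target))
    print(f"prediction {name}: cross entropy = {ce:.3f}, localization error = {err:.2f}")
```

In this toy case prediction B has the lower cross entropy yet the larger localization error, so a model ranked better by the training loss can be ranked worse by the test metric.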

Key words: facial landmark localization; fully convolutional network (FCN); residual feature; cascaded structure

Received: 05 November 2018 Published: 17 December 2019

CLC: TP 391.4

Corresponding Authors: Roland HU E-mail: xuad@csg.cn;haoji_hu@zju.edu.cn


Cite this article:

Ai-dong XU, Wen-qi HUANG, Zhe MING, Wei-liang CHEN, Roland HU, Hang YANG. Facial landmark localization based on cascaded hourglass network with residual features. Journal of ZheJiang University (Engineering Science), 2019, 53(12): 2365-2371.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2019.12.014 OR http://www.zjujournals.com/eng/Y2019/V53/I12/2365

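Facial landmark localization based on cascaded network and residual features

To further improve the accuracy of facial landmark localization, the principles and defects of the fully convolutional network (FCN) architecture, which is widely used for facial landmark localization, were investigated. The side effect introduced by the FCN kernel function in landmark localization, namely the inconsistency between the evaluation criteria used in training and in testing, was discussed. The possibility and universality of this problem were analyzed theoretically, and experiments were designed to verify how widely it occurs in practical scenarios. An hourglass network structure combining residual features was proposed and applied to facial landmark detection; a cascaded structure of multi-stage hourglass networks was proposed and compared with the classical stacked hourglass network. The experimental results show that the two-stage cascaded structure achieves landmark localization accuracy comparable to that of the four-stage stacked structure while greatly reducing the number of model parameters and the time complexity. The normalized mean error of the proposed method on the challenging subset of the 300-W database was 6.84%, which is better than the best existing methods.

Key words: facial landmark detection; fully convolutional network (FCN); residual feature; cascaded structure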

Fig.1 Relationship between cross entropy loss and normalized Euclidean loss

Fig.2 Diagram of hourglass network based on residual features

Fig.3 The structure of stacked hourglass network (SHN) and cascaded hourglass network

Tab.1 Testing protocol of different datasets

Tab.2 Normalized mean error (NME) on 300-W test dataset with 300-W training data only

Tab.3 NME on 300-W test dataset using additional training data

Fig.4 Cumulative error distribution (CED) curves on common and challenging test sets of 300-W database

Fig.5 CED curves on Menpo test dataset

Fig.6 Comparison of ground truth and detection results of facial landmarks by proposed algorithm

Tab.4 NME on 300-W test dataset (%)
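
The two metrics named in the captions above, normalized mean error (NME) and the cumulative error distribution (CED), can be sketched as follows. This is a minimal illustration rather than the paper's evaluation code: the normalizing distance is passed in as a parameter because the testing protocol (for example which inter-eye distance is used) is not given here, and the function names and sample coordinates are hypothetical.

```python
import math

def nme(pred, truth, norm_dist):
    """Mean Euclidean landmark error of one face, divided by a normalizing distance."""
    dists = [math.dist(p, g) for p, g in zip(pred, truth)]
    return sum(dists) / (len(dists) * norm_dist)

def ced(errors, thresholds):
    """Fraction of faces whose NME is at or below each threshold."""
    n = len(errors)
    return [sum(e <= t for e in errors) / n for t in thresholds]

# Hypothetical face with two landmarks and an assumed normalizing distance of 10 pixels.
truth = [(0.0, 0.0), (10.0, 0.0)]
pred = [(3.0, 4.0), (10.0, 0.0)]
face_error = nme(pred, truth, norm_dist=10.0)
print(f"NME = {face_error:.2%}")

# Hypothetical per-face NME values for a small test set.
errors = [0.02, 0.05, face_error]
curve = ced(errors, thresholds=[0.05, 0.10, 0.30])
print(curve)
```

Plotting the CED values against the thresholds gives a curve of the kind described in Fig.4 and Fig.5; a curve that rises faster indicates that more test faces fall under a given error level.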


