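Facial landmark localization based on cascaded hourglass network with residual features

doi:10.3785/j.issn.1008-973X.2019.12.014

Journal of Zhejiang University (Engineering Science), 2019, Vol. 53, Issue 12: 2365-2371

Computer Science and Artificial Intelligence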

Ai-dong XU (许爱东)1, Wen-qi HUANG (黄文琦)2, Zhe MING (明哲)2, Wei-liang CHEN (陈伟亮)3, Roland HU (胡浩基)3,*, Hang YANG (杨航)2

1. Electric Power Research Institute, Southern Power Grid, Guangzhou 510080, China
2. Digital Grid Research Institute, Southern Power Grid, Guangzhou 510080, China
3. College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China

Abstract:

To further improve the accuracy of facial landmark localization, the principles and defects of the fully convolutional network (FCN) architecture, which is widely used for facial landmark localization, were studied. A side effect introduced by the FCN kernel function in landmark localization was discussed, namely that the evaluation criteria used during training and testing are inconsistent. The possibility and generality of this problem were analyzed theoretically, and experiments were designed to verify that it arises widely in practical scenarios. An hourglass network structure combining residual features was proposed and applied to facial landmark detection. A cascaded structure of multi-stage hourglass networks was also proposed and compared with the classical stacked hourglass network. Experimental results show that the two-stage cascaded structure achieves landmark localization accuracy comparable to that of the four-stage stacked structure while greatly reducing the number of model parameters and the time complexity. The normalized mean error of the proposed method on the challenging subset of the 300-W database was 6.84%, better than the best previously reported result.

Key words: facial landmark localization; fully convolutional network (FCN); residual feature; cascaded structure
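
One way to picture the training/testing inconsistency described in the abstract is the toy sketch below. It is an illustration under stated assumptions, not the paper's experiment: it uses a 1-D heatmap, a Gaussian target, and mean squared error as a stand-in for the pixel-wise training loss (the paper's Figure 1 refers to a cross-entropy loss), and the names `gaussian_heatmap`, `mse`, and `decode` are hypothetical.

```python
import math

def gaussian_heatmap(center, length, sigma=1.0):
    """1-D Gaussian target with peak value 1 at `center` (assumed kernel)."""
    return [math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
            for i in range(length)]

def mse(pred, target):
    """Pixel-wise training criterion (stand-in for the paper's loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def decode(heatmap):
    """Test-time decoding: index of the strongest response."""
    return max(range(len(heatmap)), key=heatmap.__getitem__)

TRUE_POS, LENGTH = 5, 11
target = gaussian_heatmap(TRUE_POS, LENGTH)

# Prediction A: correct peak location, but a weak overall response.
pred_a = [0.2 * t for t in target]

# Prediction B: closer to the target almost everywhere,
# plus one spurious spike far away from the true landmark.
pred_b = [0.5 * t for t in target]
pred_b[0] += 0.6

loss_a, loss_b = mse(pred_a, target), mse(pred_b, target)
err_a = abs(decode(pred_a) - TRUE_POS)  # localization error in pixels
err_b = abs(decode(pred_b) - TRUE_POS)

print(f"A: loss={loss_a:.4f}, localization error={err_a}")
print(f"B: loss={loss_b:.4f}, localization error={err_b}")
```

In this toy case `pred_b` has the lower pixel-wise loss yet its peak lies 5 pixels from the true position, while `pred_a` has the higher loss and zero localization error, so ranking by the training criterion and ranking by the test criterion disagree.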

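Received: 2018-11-05; Published: 2019-12-17

CLC: TP 391.4

Funding: Science and Technology Project of China Southern Power Grid Co., Ltd. (ZBKJXM20170086)

Corresponding author: Roland HU (胡浩基). E-mail: xuad@csg.cn; haoji_hu@zju.edu.cn

About the author: Ai-dong XU (许爱东), born 1977, male, professor-level senior engineer, engaged in research on power grid information application technology. orcid.org/0000-0003-2091-817X. E-mail: xuad@csg.cn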

Cite this article:

Ai-dong XU, Wen-qi HUANG, Zhe MING, Wei-liang CHEN, Roland HU, Hang YANG. Facial landmark localization based on cascaded hourglass network with residual features. Journal of Zhejiang University (Engineering Science), 2019, 53(12): 2365-2371.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2019.12.014 or http://www.zjujournals.com/eng/CN/Y2019/V53/I12/2365

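Fig. 1 Relationship between cross-entropy loss and normalized Euclidean loss

Fig. 2 Schematic of the hourglass network based on residual features

Fig. 3 Stacked hourglass network (SHN) and cascaded hourglass network

Tab. 1 Evaluation criteria for different datasets

Tab. 2 Normalized mean error (NME) on the 300-W test set (trained on the 300-W training set only)

Tab. 3 NME on the 300-W test set when additional training data are used

Fig. 4 Cumulative error distribution (CED) curves on the common and challenging subsets of 300-W

Fig. 5 CED curves on the Menpo test set

Fig. 6 Comparison between the facial landmarks detected by the proposed algorithm and the ground truth

Tab. 4 NME on the 300-W test set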
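
The NME and CED quantities named in the captions above can be computed roughly as in the following sketch. It follows the common definitions of these metrics; the normalizing distance is left as a parameter because the choice made for each dataset (for example an inter-ocular distance or a face-box size) is specified in the paper's Table 1 and is not reproduced here. The function names `nme` and `ced` are hypothetical.

```python
import math

def nme(pred, truth, norm_dist):
    """Normalized mean error for one face.

    pred, truth: equally long sequences of (x, y) landmark coordinates.
    norm_dist:   normalizing distance (an assumption left to the caller).
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    if norm_dist <= 0:
        raise ValueError("norm_dist must be positive")
    total = sum(math.dist(p, t) for p, t in zip(pred, truth))
    return total / (len(pred) * norm_dist)

def ced(errors, thresholds):
    """Cumulative error distribution.

    Returns, for each threshold, the fraction of per-image errors
    that are less than or equal to it.
    """
    if not errors:
        raise ValueError("errors must be non-empty")
    n = len(errors)
    return [sum(e <= t for e in errors) / n for t in thresholds]

# Made-up coordinates, for illustration only (not data from the paper).
truth = [(0.0, 0.0), (10.0, 0.0)]
pred = [(3.0, 4.0), (10.0, 0.0)]
face_error = nme(pred, truth, norm_dist=10.0)

# Made-up per-image errors, for illustration only.
curve = ced([0.02, 0.05, 0.10], thresholds=[0.05, 0.08])
```

An NME reported as a percentage, such as the 6.84% in the abstract, corresponds to a value of this kind multiplied by 100 and averaged over the test images.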
