Journal of Zhejiang University (Engineering Science)  2021, Vol. 55 Issue (10): 1815-1824    DOI: 10.3785/j.issn.1008-973X.2021.10.003
Computer Technology
Static gesture real-time recognition method based on ShuffleNetv2-YOLOv3 model
Wen-bin XIN1, Hui-min HAO1, Ming-long BU1,2, Yuan LAN1, Jia-hai HUANG1,*, Xiao-yan XIONG1
1. School of Mechanical and Transportation Engineering, Taiyuan University of Technology, Taiyuan 030024, China
2. Harbin Electric Machinery Limited Company, Harbin 150040, China
Abstract:

An efficient static gesture real-time recognition method based on an integrated ShuffleNetv2-YOLOv3 network was proposed to reduce the model's demand on hardware computing power, targeting the limited computing resources and small storage space of mobile terminal platforms. The computational complexity of the model was reduced by replacing Darknet-53 with the lightweight ShuffleNetv2 as the backbone network. The CBAM attention module was introduced to strengthen the network's attention to spatial and channel features. The K-means clustering algorithm was used to regenerate the aspect ratios and the number of anchors, so that the regenerated anchor sizes located targets more precisely and improved the detection accuracy of the model. Experimental results showed that the average recognition accuracy of the proposed algorithm on gesture recognition was 99.2% at a recognition speed of 44 frames/s; the inference time for a single 416×416 image was 15 ms on the GPU and 58 ms on the CPU, and the model occupied 15.1 MB of memory. The method offers high recognition accuracy, fast recognition speed and a low memory footprint, which facilitates deployment on mobile terminals.

Key words: YOLOv3    lightweight ShuffleNetv2 network    CBAM attention mechanism    gesture recognition    mobile terminal
Received: 2020-09-24    Published: 2021-10-27
CLC:  TP 391
Funding: National Key Research and Development Program of China (2018YFB1308700); 2020 Shanxi Provincial Key Core Technology and Common Technology R&D Special Project (2020XXX009, 2020XXX001)
Corresponding author: Jia-hai HUANG    E-mail: huangjiahai@tyut.edu.cn
About the first author: Wen-bin XIN (born 1995), male, master's student, engaged in computer vision research. orcid.org/0000-0002-6891-8235. E-mail: 2878095493@qq.com
Cite this article:

Wen-bin XIN, Hui-min HAO, Ming-long BU, Yuan LAN, Jia-hai HUANG, Xiao-yan XIONG. Static gesture real-time recognition method based on ShuffleNetv2-YOLOv3 model. Journal of Zhejiang University (Engineering Science), 2021, 55(10): 1815-1824.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2021.10.003        https://www.zjujournals.com/eng/CN/Y2021/V55/I10/1815

Fig. 1  Overall structure of the YOLOv3 model
Fig. 2  Overall structure of the ShuffleNetv2-YOLOv3 model
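As a reading aid for Fig. 2, the integration can be pictured as the ShuffleNetv2 backbone exposing three feature maps (52×52, 26×26 and 13×13 for a 416×416 input, per Table 1) that feed YOLOv3-style prediction heads. The PyTorch sketch below only illustrates that wiring under stated assumptions: it uses the torchvision ShuffleNetv2-1.0× as a stand-in backbone, a placeholder class count, two anchors per scale (six in total, matching the six-anchor configuration of Table 5) and plain 1×1 heads; the CBAM modules and feature-fusion path of the actual model are omitted.

```python
# Illustrative wiring only (not the authors' released code): a ShuffleNetv2
# backbone exposing three feature scales that feed simple YOLOv3-style
# 1x1 prediction heads. CBAM and the feature-fusion path of Fig. 2 are omitted.
import torch
import torch.nn as nn
from torchvision.models import shufflenet_v2_x1_0


class ShuffleNetV2YOLOv3Sketch(nn.Module):
    def __init__(self, num_classes=10, anchors_per_scale=2):
        # num_classes is a placeholder; set it to the size of the gesture vocabulary.
        super().__init__()
        m = shufflenet_v2_x1_0()                       # torchvision stand-in backbone (1.0x width)
        self.stem = nn.Sequential(m.conv1, m.maxpool)  # 416 -> 208 -> 104
        self.stage2, self.stage3, self.stage4 = m.stage2, m.stage3, m.stage4
        out = anchors_per_scale * (5 + num_classes)    # (tx, ty, tw, th, obj) + class scores
        self.head52 = nn.Conv2d(116, out, 1)           # 52x52 scale, 116 channels at 1.0x width
        self.head26 = nn.Conv2d(232, out, 1)           # 26x26 scale
        self.head13 = nn.Conv2d(464, out, 1)           # 13x13 scale

    def forward(self, x):
        x = self.stem(x)
        c3 = self.stage2(x)     # stride 8
        c4 = self.stage3(c3)    # stride 16
        c5 = self.stage4(c4)    # stride 32
        return self.head52(c3), self.head26(c4), self.head13(c5)


if __name__ == "__main__":
    preds = ShuffleNetV2YOLOv3Sketch()(torch.randn(1, 3, 416, 416))
    print([tuple(p.shape) for p in preds])  # (1, 30, 52, 52), (1, 30, 26, 26), (1, 30, 13, 13)
```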
Layer        Os         Ks       S    R    Oc (0.5×)   Oc (1.0×)   Oc (1.5×)   Oc (2.0×)
Image        416×416                        3           3           3           3
Conv1        208×208    3×3      2    1    24          24          24          24
MaxPool      104×104    3×3      2    1    24          24          24          24
Stage2       52×52                2    1    48          116         176         244
Stage2       52×52                1    3    48          116         176         244
Stage3       26×26                2    1    96          232         352         488
Stage3       26×26                1    7    96          232         352         488
Stage4       13×13                2    1    192         464         704         976
Stage4       13×13                1    3    192         464         704         976
Conv5        13×13      1×1      1    1    1024        1024        1024        2048
GlobalPool   1×1        13×13
FC                                          1000        1000        1000        1000
Table 1  Network structure of ShuffleNetv2
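For quick reference, Table 1's stage settings can also be written down as plain data. The snippet below is just that restatement (not code from the paper): the per-stage unit counts (one stride-2 unit plus the repeated stride-1 units), the output channels for each width multiplier, and the halving of the feature-map size from the 416×416 input down to 13×13.

```python
# Table 1 restated as plain data (not code from the paper): unit counts per stage
# (one stride-2 unit plus the repeated stride-1 units) and output channels for
# each width multiplier, plus the feature-map sizes for a 416x416 input.
SHUFFLENETV2_STAGES = {
    # stage: (total units, {width multiplier: output channels})
    "stage2": (1 + 3, {0.5: 48,  1.0: 116, 1.5: 176, 2.0: 244}),
    "stage3": (1 + 7, {0.5: 96,  1.0: 232, 1.5: 352, 2.0: 488}),
    "stage4": (1 + 3, {0.5: 192, 1.0: 464, 1.5: 704, 2.0: 976}),
}


def feature_map_sizes(input_size=416):
    """Conv1, MaxPool and the first unit of each stage all halve the resolution."""
    sizes, s = [], input_size
    for name in ("conv1", "maxpool", "stage2", "stage3", "stage4"):
        s //= 2
        sizes.append((name, s))
    return sizes


if __name__ == "__main__":
    print(feature_map_sizes())
    # [('conv1', 208), ('maxpool', 104), ('stage2', 52), ('stage3', 26), ('stage4', 13)]
```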
Fig. 3  Flow chart of gesture recognition
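The flow in Fig. 3 reduces to a capture-resize-detect-display loop. Below is a minimal OpenCV sketch of such a loop; `run_detector` is a hypothetical stand-in for the trained ShuffleNetv2-YOLOv3 model (it returns no boxes here), and the plain resize glosses over letterboxing details.

```python
# Minimal capture-resize-detect-display loop (illustrative only).
# run_detector is a hypothetical stand-in for the trained ShuffleNetv2-YOLOv3
# model; here it returns no detections so the loop stays runnable on its own.
import cv2


def run_detector(img_416):
    """Placeholder detector: should return (x1, y1, x2, y2, label, score) boxes
    in 416x416 pixel coordinates."""
    return []


def main(camera_index=0, input_size=416):
    cap = cv2.VideoCapture(camera_index)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        img = cv2.resize(frame, (input_size, input_size))   # plain resize; letterboxing omitted
        for x1, y1, x2, y2, label, score in run_detector(img):
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(img, f"{label} {score:.2f}", (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        cv2.imshow("gesture", img)
        if cv2.waitKey(1) & 0xFF == ord("q"):                # press q to quit
            break
    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    main()
```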
Fig. 4  Samples from the self-made gesture dataset
Fig. 5  Samples from the Microsoft Kinect and Leap Motion dataset
Fig. 6  Samples from the Creative Senz3D dataset
Hardware  Model  Quantity
Main board Asus WS X299 SAGE 1
CPU Intel I9-9900X 20
Memory The Corsair 32 GB DDR4 2
CUDA Geforce RTX 2080Ti 4
Solid-state drives Corsair 4.0T 4
Hard disk Western digital 976.5 GB 1
Table 2  Hardware configuration for algorithm training
Network model  Backbone network  mAP  TT/h  Ws/MB  v/(frame·s−1)
YOLOv3 ShuffleNetv2-0.5× 0.952 1.482 9.7 45
YOLOv3 ShuffleNetv2-1.0× 0.966 1.481 14.6 45
YOLOv3 ShuffleNetv2-1.5× 0.972 1.493 20.6 44
YOLOv3 ShuffleNetv2-2.0× 0.978 1.645 36.0 43
Table 3  Test results of ShuffleNetv2 with different numbers of output channels
Network model  Backbone network  mAP  TT/h  Ws/MB  v/(frame·s−1)
YOLOv3+CBAM ShuffleNetv2-0.5× 0.979 1.587 10.2 44
YOLOv3+CBAM ShuffleNetv2-1.0× 0.992 1.594 15.1 44
YOLOv3+CBAM ShuffleNetv2-1.5× 0.990 1.620 21.1 43
YOLOv3+CBAM ShuffleNetv2-2.0× 0.987 1.680 36.5 43
Table 4  Test results of ShuffleNetv2+CBAM with different numbers of output channels
Network model  Backbone network  mAP  TT/h  Ws/MB  v/(frame·s−1)
YOLOv3+6Anchors ShuffleNetv2-1.0× 0.966 1.481 14.6 45
YOLOv3+9Anchors ShuffleNetv2-1.0× 0.968 1.724 33.7 42
YOLOv3+CBAM+6Anchors ShuffleNetv2-1.0× 0.992 1.594 15.1 44
YOLOv3+CBAM+9Anchors ShuffleNetv2-1.0× 0.982 1.745 34.2 41
Table 5  Test results of model accuracy with different anchors
Network model  Backbone network  mAP  TT/h  Ws/MB  v/(frame·s−1)
YOLOv3 ShuffleNetv2-1.0× 0.966 1.481 14.6 45
YOLOv3+CBAM ShuffleNetv2-1.0× 0.982 1.534 15.1 44
Table 6  Test results with the CBAM attention module
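Table 6 isolates the contribution of CBAM. For readers unfamiliar with the module, the sketch below is a generic re-implementation of CBAM's two steps, channel attention followed by spatial attention, using the original CBAM paper's default reduction ratio of 16 and a 7×7 spatial kernel; it is not the authors' exact module.

```python
# Generic CBAM re-implementation for reference (not the authors' exact module):
# channel attention followed by spatial attention, with the original paper's
# defaults of reduction ratio 16 and a 7x7 spatial convolution.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(                       # shared MLP for avg- and max-pooled vectors
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))              # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))               # global max pooling
        return x * torch.sigmoid(avg + mx).view(b, c, 1, 1)


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)               # channel-wise average
        mx = x.amax(dim=1, keepdim=True)                # channel-wise max
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.sa(self.ca(x))                      # channel attention first, then spatial


if __name__ == "__main__":
    y = CBAM(116)(torch.randn(1, 116, 52, 52))          # e.g. a 52x52 ShuffleNetv2-1.0x feature map
    print(y.shape)                                      # torch.Size([1, 116, 52, 52])
```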
Network model  Backbone network  mAP  TT/h  Ws/MB  v/(frame·s−1)
YOLOv3 Darknet-53 0.98 4.138 246.6 41
YOLOv3+K-means Darknet-53 0.99 4.104 246.6 41
YOLOv3+CBAM ShuffleNetv2-1.0× 0.982 1.534 15.1 44
YOLOv3+CBAM+K-means ShuffleNetv2-1.0× 0.992 1.594 15.1 44
Table 7  Test results with anchors clustered by K-means
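Table 7 evaluates anchors regenerated by K-means. The usual procedure, as in the YOLO literature, clusters the labelled boxes' widths and heights using 1 − IoU as the distance; the NumPy sketch below illustrates it on toy data with k = 6 to match the six anchors used here. It is a generic illustration, not the authors' clustering script.

```python
# K-means over ground-truth box sizes with d(box, centroid) = 1 - IoU, as
# popularized by the YOLO detectors. Generic illustration on toy data, not the
# authors' clustering script; k=6 matches the six anchors used in this work.
import numpy as np


def wh_iou(wh, centroids):
    """IoU between boxes and centroids given only (w, h), both anchored at the origin."""
    inter = np.minimum(wh[:, None, 0], centroids[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centroids[None, :, 1])
    union = (wh[:, 0] * wh[:, 1])[:, None] + (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union


def kmeans_anchors(wh, k=6, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(1.0 - wh_iou(wh, centroids), axis=1)   # nearest centroid by 1 - IoU
        new = np.array([wh[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids.prod(axis=1))]          # sort anchors by area


if __name__ == "__main__":
    # Toy data: (w, h) of labelled boxes in 416x416 image coordinates.
    boxes = np.abs(np.random.default_rng(1).normal(loc=120, scale=40, size=(500, 2)))
    print(kmeans_anchors(boxes, k=6).round(1))
```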
Network model  Backbone network  mAP  TT/h  Ws/MB  v/(frame·s−1)
YOLOv2 Darknet-19 0.930 3.105 202.4 47
YOLOv3-Tiny Tiny 0.974 1.256 34.8 46
YOLOv3 ResNet-50 0.984 2.821 161.2 40
YOLOv3 Darknet-53 0.990 4.104 246.6 41
YOLOv3 MobileNetv2 0.955 2.051 28.0 37
SSD MobileNetv2 0.882 4.390 24.1 19
YOLOv3 ShuffleNetv2-1.0× 0.992 1.594 15.1 44
Table 8  Test results of different backbone networks
Dataset  Network model  Backbone network  mAP  TT/h  Ws/MB  v/(frame·s−1)
Self-made dataset YOLOv3 Darknet-53 0.990 4.104 246.6 41
Self-made dataset YOLOv3 ShuffleNetv2-1.0× 0.992 1.594 15.1 44
Kinect dataset YOLOv3 Darknet-53 0.987 2.394 246.6 11
Kinect dataset YOLOv3 ShuffleNetv2-1.0× 0.987 1.289 15.1 13
Senz3D dataset YOLOv3 Darknet-53 0.990 2.136 246.6 28
Senz3D dataset YOLOv3 ShuffleNetv2-1.0× 0.991 0.966 15.1 31
Table 9  Test results on different datasets
Network model  Backbone network  mAP  TT/h  Ws/MB  v/(frame·s−1)  tGPU/ms  tCPU/ms
YOLOv3 Darknet-53 0.990 4.104 246.6 41 22 170
YOLOv3 ShuffleNetv2-1.0× 0.992 1.594 15.1 44 15 58
Table 10  Test results of the improved model on hardware performance
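The per-image latencies in Table 10 correspond to timing repeated forward passes after a warm-up, synchronizing the GPU before reading the clock. The sketch below shows one common way to measure this; it is generic benchmarking code, not the authors' script, and uses a torchvision ShuffleNetv2 as a stand-in model.

```python
# Generic single-image latency measurement (not the authors' benchmark code):
# warm-up iterations and torch.cuda.synchronize() keep CUDA launch overhead and
# lazy initialization out of the timed region.
import time
import torch


@torch.no_grad()
def mean_latency_ms(model, device, runs=100, warmup=10, size=416):
    model = model.to(device).eval()
    x = torch.randn(1, 3, size, size, device=device)
    for _ in range(warmup):                 # warm-up forward passes
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000.0 / runs


if __name__ == "__main__":
    from torchvision.models import shufflenet_v2_x1_0   # stand-in model for the sketch
    net = shufflenet_v2_x1_0()
    print(f"CPU: {mean_latency_ms(net, torch.device('cpu')):.1f} ms")
    if torch.cuda.is_available():
        print(f"GPU: {mean_latency_ms(net, torch.device('cuda')):.1f} ms")
```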
Fig. 7  Recognition results for single-target gestures
Fig. 8  Recognition results for multi-target gestures