Journal of ZheJiang University (Engineering Science)  2021, Vol. 55 Issue (10): 1815-1824    DOI: 10.3785/j.issn.1008-973X.2021.10.003
    
Static gesture real-time recognition method based on ShuffleNetv2-YOLOv3 model
Wen-bin XIN1, Hui-min HAO1, Ming-long BU1,2, Yuan LAN1, Jia-hai HUANG1,*, Xiao-yan XIONG1
1. School of Mechanical and Transportation Engineering, Taiyuan University of Technology, Taiyuan 030024, China
2. Harbin Electric Machinery Limited Company, Harbin 150040, China

Abstract  

An efficient real-time static gesture recognition method integrating ShuffleNetv2 and YOLOv3 was proposed to reduce the model's demand on hardware computing power, addressing the limited computing resources and small storage space of mobile platforms. The computational complexity of the model was reduced by replacing Darknet-53 with the lightweight ShuffleNetv2 as the backbone network. The CBAM attention mechanism module was introduced to strengthen the network's attention to spatial and channel features. The K-means clustering algorithm was used to regenerate the aspect ratios and number of anchors, so that the regenerated anchor sizes locate targets accurately and improve the detection accuracy of the model. Experimental results showed that the average recognition accuracy of the proposed algorithm on gesture recognition was 99.2%, and the recognition speed was 44 frames/s. The inference time for a single 416×416 image was 15 ms on the GPU and 58 ms on the CPU, and the model occupied 15.1 MB of memory. The method offers high recognition accuracy, fast recognition speed, and low memory occupancy, which facilitates model deployment on mobile terminals.
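The anchor-regeneration step described above can be sketched as K-means clustering over ground-truth box shapes with distance defined as 1 − IoU, the convention used in the YOLO literature. This is an illustrative sketch under assumed function names, not the authors' implementation; the deterministic area-based initialization is also an assumption.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs compared at a common top-left origin.

    boxes: (N, 2) array of widths and heights; anchors: (k, 2)."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            anchors[None, :, 0] * anchors[None, :, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k=6, iters=100):
    """Cluster box shapes into k anchors using 1 - IoU as the distance."""
    # Deterministic init: pick boxes evenly spaced by area.
    order = np.argsort(boxes.prod(axis=1))
    anchors = boxes[order[np.linspace(0, len(boxes) - 1, k).astype(int)]].copy()
    assign = np.full(len(boxes), -1)
    for _ in range(iters):
        new_assign = (1.0 - iou_wh(boxes, anchors)).argmin(axis=1)
        if (new_assign == assign).all():   # converged
            break
        assign = new_assign
        for j in range(k):                 # move each anchor to its cluster mean
            if (assign == j).any():
                anchors[j] = boxes[assign == j].mean(axis=0)
    return anchors[np.argsort(anchors.prod(axis=1))]  # sorted by area
```

With six anchors (as in Tab. 5's best configuration), the clustered sizes would then be assigned to the detection scales from small to large.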



Key words: YOLOv3; lightweight ShuffleNetv2 network; CBAM attention mechanism; gesture recognition; mobile terminal
Received: 24 September 2020      Published: 27 October 2021
CLC:  TP 391  
Fund: National Key Research and Development Program of China (2018YFB1308700); 2020 Shanxi Province Special Projects for R&D of Key Core and Common Technologies (2020XXX009, 2020XXX001)
Corresponding Authors: Jia-hai HUANG     E-mail: 2878095493@qq.com;lanyuan@tyut.edu.cn;huangjiahai@tyut.edu.cn
Cite this article:

Wen-bin XIN,Hui-min HAO,Ming-long BU,Yuan LAN,Jia-hai HUANG,Xiao-yan XIONG. Static gesture real-time recognition method based on ShuffleNetv2-YOLOv3 model. Journal of ZheJiang University (Engineering Science), 2021, 55(10): 1815-1824.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2021.10.003     OR     https://www.zjujournals.com/eng/Y2021/V55/I10/1815


Fig.1 Overall structure of YOLOv3 model
Fig.2 Overall structure of ShuffleNetv2-YOLOv3 model
| Layer | Output size (Os) | Kernel size (Ks) | Stride (S) | Repeat (R) | Oc (0.5×) | Oc (1.0×) | Oc (1.5×) | Oc (2.0×) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Image | 416×416 | − | − | − | 3 | 3 | 3 | 3 |
| Conv1 | 208×208 | 3×3 | 2 | 1 | 24 | 24 | 24 | 24 |
| MaxPool | 104×104 | 3×3 | 2 | 1 | 24 | 24 | 24 | 24 |
| Stage2 | 52×52 | − | 2 | 1 | 48 | 116 | 176 | 244 |
| Stage2 | 52×52 | − | 1 | 3 | 48 | 116 | 176 | 244 |
| Stage3 | 26×26 | − | 2 | 1 | 96 | 232 | 352 | 488 |
| Stage3 | 26×26 | − | 1 | 7 | 96 | 232 | 352 | 488 |
| Stage4 | 13×13 | − | 2 | 1 | 192 | 464 | 704 | 976 |
| Stage4 | 13×13 | − | 1 | 3 | 192 | 464 | 704 | 976 |
| Conv5 | 13×13 | 1×1 | 1 | 1 | 1024 | 1024 | 1024 | 2048 |
| GlobalPool | 1×1 | 13×13 | − | − | − | − | − | − |
| FC | − | − | − | − | 1000 | 1000 | 1000 | 1000 |
Tab.1 Network structure of ShuffleNetv2
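The ShuffleNetv2 stages in Tab. 1 are built from grouped convolution units that rely on a channel-shuffle operation to exchange information between groups. A minimal NumPy sketch of that operation (the function name is an assumption, not the authors' code):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups, as in ShuffleNet.

    x: feature map of shape (N, C, H, W); C must be divisible by groups."""
    n, c, h, w = x.shape
    assert c % groups == 0
    # Reshape into (N, groups, C/groups, H, W), swap the two channel axes,
    # then flatten back so channels from different groups alternate.
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(n, c, h, w))
```

For six channels and two groups, the channel order [0, 1, 2, 3, 4, 5] becomes [0, 3, 1, 4, 2, 5], so the next grouped convolution sees channels from both groups.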
Fig.3 Flowchart of gesture recognition
Fig.4 Self-made gesture dataset samples
Fig.5 Samples of Microsoft Kinect and Leap Motion dataset
Fig.6 Samples of Creative Senz3D dataset
| Hardware | Model | Quantity |
| --- | --- | --- |
| Main board | Asus WS X299 SAGE | 1 |
| CPU | Intel i9-9900X | 20 |
| Memory | Corsair 32 GB DDR4 | 2 |
| GPU (CUDA) | GeForce RTX 2080Ti | 4 |
| Solid-state drive | Corsair 4.0 TB | 4 |
| Hard disk | Western Digital 976.5 GB | 1 |
Tab.2 Algorithm training hardware environment configuration
| Network model | Backbone | mAP | TT/h | Ws/MB | v/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- |
| YOLOv3 | ShuffleNetv2-0.5× | 0.952 | 1.482 | 9.7 | 45 |
| YOLOv3 | ShuffleNetv2-1.0× | 0.966 | 1.481 | 14.6 | 45 |
| YOLOv3 | ShuffleNetv2-1.5× | 0.972 | 1.493 | 20.6 | 44 |
| YOLOv3 | ShuffleNetv2-2.0× | 0.978 | 1.645 | 36.0 | 43 |
Tab.3 Test results of ShuffleNetv2 different output channels
| Network model | Backbone | mAP | TT/h | Ws/MB | v/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- |
| YOLOv3+CBAM | ShuffleNetv2-0.5× | 0.979 | 1.587 | 10.2 | 44 |
| YOLOv3+CBAM | ShuffleNetv2-1.0× | 0.992 | 1.594 | 15.1 | 44 |
| YOLOv3+CBAM | ShuffleNetv2-1.5× | 0.990 | 1.620 | 21.1 | 43 |
| YOLOv3+CBAM | ShuffleNetv2-2.0× | 0.987 | 1.680 | 36.5 | 43 |
Tab.4 Test results of ShuffleNetv2+CBAM different output channels
| Network model | Backbone | mAP | TT/h | Ws/MB | v/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- |
| YOLOv3+6 Anchors | ShuffleNetv2-1.0× | 0.966 | 1.481 | 14.6 | 45 |
| YOLOv3+9 Anchors | ShuffleNetv2-1.0× | 0.968 | 1.724 | 33.7 | 42 |
| YOLOv3+CBAM+6 Anchors | ShuffleNetv2-1.0× | 0.992 | 1.594 | 15.1 | 44 |
| YOLOv3+CBAM+9 Anchors | ShuffleNetv2-1.0× | 0.982 | 1.745 | 34.2 | 41 |
Tab.5 Test results of model accuracy by different Anchors
| Network model | Backbone | mAP | TT/h | Ws/MB | v/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- |
| YOLOv3 | ShuffleNetv2-1.0× | 0.966 | 1.481 | 14.6 | 45 |
| YOLOv3+CBAM | ShuffleNetv2-1.0× | 0.982 | 1.534 | 15.1 | 44 |
Tab.6 Test results of using CBAM attention mechanism module
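The channel-attention half of CBAM, whose effect is measured in Tab. 6, rescales each feature channel with a shared two-layer MLP applied to average- and max-pooled channel descriptors. A minimal NumPy sketch; the function names, weight shapes `w1`/`w2`, and reduction ratio are illustrative assumptions rather than the paper's exact configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """CBAM-style channel attention.

    x: feature map (N, C, H, W).
    w1: (C, C/r) and w2: (C/r, C) form the shared MLP with reduction r."""
    avg = x.mean(axis=(2, 3))                   # (N, C) average-pooled descriptor
    mx = x.max(axis=(2, 3))                     # (N, C) max-pooled descriptor
    def mlp(v):
        return np.maximum(v @ w1, 0.0) @ w2     # ReLU hidden layer
    scale = sigmoid(mlp(avg) + mlp(mx))         # per-channel weights in (0, 1)
    return x * scale[:, :, None, None]          # rescale each channel
```

The full CBAM module follows this with an analogous spatial-attention map; both are lightweight enough that Tab. 6 shows only a 0.5 MB and 1 frame/s cost.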
| Network model | Backbone | mAP | TT/h | Ws/MB | v/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- |
| YOLOv3 | Darknet-53 | 0.98 | 4.138 | 246.6 | 41 |
| YOLOv3+K-means | Darknet-53 | 0.99 | 4.104 | 246.6 | 41 |
| YOLOv3+CBAM | ShuffleNetv2-1.0× | 0.982 | 1.534 | 15.1 | 44 |
| YOLOv3+CBAM+K-means | ShuffleNetv2-1.0× | 0.992 | 1.594 | 15.1 | 44 |
Tab.7 Test results of using K-means to cluster Anchors
| Network model | Backbone | mAP | TT/h | Ws/MB | v/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- |
| YOLOv2 | Darknet-19 | 0.930 | 3.105 | 202.4 | 47 |
| YOLOv3-Tiny | Tiny | 0.974 | 1.256 | 34.8 | 46 |
| YOLOv3 | ResNet-50 | 0.984 | 2.821 | 161.2 | 40 |
| YOLOv3 | Darknet-53 | 0.990 | 4.104 | 246.6 | 41 |
| YOLOv3 | MobileNetv2 | 0.955 | 2.051 | 28.0 | 37 |
| SSD | MobileNetv2 | 0.882 | 4.390 | 24.1 | 19 |
| YOLOv3 | ShuffleNetv2-1.0× | 0.992 | 1.594 | 15.1 | 44 |
Tab.8 Test results of different backbone networks
| Dataset | Network model | Backbone | mAP | TT/h | Ws/MB | v/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- |
| Self-made dataset | YOLOv3 | Darknet-53 | 0.990 | 4.104 | 246.6 | 41 |
| Self-made dataset | YOLOv3 | ShuffleNetv2-1.0× | 0.992 | 1.594 | 15.1 | 44 |
| Kinect dataset | YOLOv3 | Darknet-53 | 0.987 | 2.394 | 246.6 | 11 |
| Kinect dataset | YOLOv3 | ShuffleNetv2-1.0× | 0.987 | 1.289 | 15.1 | 13 |
| Senz3D dataset | YOLOv3 | Darknet-53 | 0.990 | 2.136 | 246.6 | 28 |
| Senz3D dataset | YOLOv3 | ShuffleNetv2-1.0× | 0.991 | 0.966 | 15.1 | 31 |
Tab.9 Test results on different datasets
| Network model | Backbone | mAP | TT/h | Ws/MB | v/(frame·s⁻¹) | tGPU/ms | tCPU/ms |
| --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv3 | Darknet-53 | 0.990 | 4.104 | 246.6 | 41 | 22 | 170 |
| YOLOv3 | ShuffleNetv2-1.0× | 0.992 | 1.594 | 15.1 | 44 | 15 | 58 |
Tab.10 Improved model test results on hardware performance
Fig.7 Recognition results of single target gesture
Fig.8 Recognition results of multi-target gesture