Static gesture real-time recognition method based on ShuffleNetv2-YOLOv3 model
Wen-bin XIN1, Hui-min HAO1, Ming-long BU1,2, Yuan LAN1, Jia-hai HUANG1,*, Xiao-yan XIONG1
1. School of Mechanical and Transportation Engineering, Taiyuan University of Technology, Taiyuan 030024, China
2. Harbin Electric Machinery Limited Company, Harbin 150040, China
|
Abstract To address the limited computing resources and small storage space of mobile terminal platforms, an efficient real-time static gesture recognition method integrating ShuffleNetv2 and YOLOv3 was proposed to reduce the model's demand on hardware computing power. The computational complexity of the model was reduced by replacing Darknet-53 with the lightweight network ShuffleNetv2 as the backbone. The convolutional block attention module (CBAM) was introduced to strengthen the network's attention along the spatial and channel dimensions. The K-means clustering algorithm was used to regenerate the number and aspect ratios of the anchors, so that the regenerated anchor sizes locate targets more accurately and improve the detection accuracy of the model. Experimental results showed that the proposed algorithm achieved an average recognition accuracy of 99.2% on gesture recognition at a recognition speed of 44 frames/s. The inference time for a single 416×416 image was 15 ms on the GPU and 58 ms on the CPU, and the model occupied 15.1 MB of memory. The method offers high recognition accuracy, fast recognition speed, and low memory occupancy, which favors deployment on mobile terminals.
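The backbone substitution described above rests on two ShuffleNetV2 building blocks: a channel split that sends half of the feature channels through an identity branch, and a channel shuffle that mixes the two branches after concatenation. The Python sketch below illustrates both on plain lists of channel indices and, as a rough cost comparison, counts multiply-accumulates (MACs) for one stride-1 ShuffleNetV2 unit against one Darknet-53 residual block (a 1×1 convolution to half width followed by a 3×3 convolution back to full width, as described in the YOLOv3 report). The 52×52×256 feature-map size, the split ratio of 1/2 and the shuffle group count of 2 are illustrative assumptions; batch normalization, activations and biases are ignored, and the abstract does not state which ShuffleNetv2 width or unit arrangement the authors used, so this is not their implementation.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def channel_split(channels: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Split the channel axis into two equal halves, as in a ShuffleNetV2 basic unit."""
    if len(channels) % 2:
        raise ValueError("channel count must be even")
    half = len(channels) // 2
    return list(channels[:half]), list(channels[half:])


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Reshape to (groups, C // groups), transpose, and flatten back to C."""
    n = len(channels)
    if n % groups:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # Reading the (groups, per_group) grid column by column interleaves the groups.
    return [channels[g * per_group + j] for j in range(per_group) for g in range(groups)]


def darknet_residual_macs(h: int, w: int, c: int) -> int:
    """1x1 conv C -> C/2 followed by 3x3 conv C/2 -> C (multiply-accumulates only)."""
    return h * w * (c * (c // 2) + (c // 2) * c * 9)


def shufflenetv2_unit_macs(h: int, w: int, c: int) -> int:
    """Stride-1 unit: the right half passes 1x1 conv, 3x3 depthwise conv, 1x1 conv;
    the left half is an identity branch and costs nothing."""
    half = c // 2
    return h * w * (half * half + half * 9 + half * half)


if __name__ == "__main__":
    left, right = channel_split(list(range(8)))
    print(left, right)                              # [0, 1, 2, 3] [4, 5, 6, 7]
    print(channel_shuffle(left + right, groups=2))  # [0, 4, 1, 5, 2, 6, 3, 7]
    heavy = darknet_residual_macs(52, 52, 256)
    light = shufflenetv2_unit_macs(52, 52, 256)
    print(heavy, light, round(heavy / light, 1))    # 886046720 91719680 9.7
```

Under these assumptions the ShuffleNetV2 unit needs roughly one tenth of the MACs of the residual block at the same width, which is the kind of saving the backbone replacement targets; real speedups also depend on memory access cost and the target hardware, a point the ShuffleNetV2 design guidelines stress.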
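CBAM, as proposed by Woo et al., refines a feature map in two sequential steps: channel attention Mc = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) rescales each channel, then spatial attention Ms = σ(f7×7([AvgPool(F); MaxPool(F)])), with the pooling taken across channels, rescales each position. The pure-Python sketch below follows that structure on a nested list of shape [channels][height][width]. The reduction ratio r, the kernel size, the random weights and the omission of biases are assumptions chosen for illustration, and the abstract does not say where in the network the authors inserted CBAM.

```python
import math
import random
from typing import List

Tensor3 = List[List[List[float]]]  # feature map indexed as [channel][row][column]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(v: List[float], w0: Matrix, w1: Matrix) -> List[float]:
    """Shared two-layer MLP C -> C/r -> C with ReLU in between (biases omitted)."""
    hidden = [max(0.0, sum(a * b for a, b in zip(row, v))) for row in w0]
    return [sum(a * b for a, b in zip(row, hidden)) for row in w1]


def channel_attention(f: Tensor3, w0: Matrix, w1: Matrix) -> List[float]:
    """Mc = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))), one weight per channel."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in f]
    mx = [max(map(max, ch)) for ch in f]
    return [sigmoid(a + m) for a, m in zip(shared_mlp(avg, w0, w1), shared_mlp(mx, w0, w1))]


def spatial_attention(f: Tensor3, kernel: List[Matrix]) -> Matrix:
    """Ms = sigmoid(conv([AvgPool_c(F); MaxPool_c(F)])), one weight per position.

    kernel[0] filters the channel-average map and kernel[1] the channel-max map;
    zero padding keeps the output at H x W.
    """
    c, h, w = len(f), len(f[0]), len(f[0][0])
    avg = [[sum(f[k][i][j] for k in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(f[k][i][j] for k in range(c)) for j in range(w)] for i in range(h)]
    pad = len(kernel[0]) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for plane, kern in zip((avg, mx), kernel):
                for di, krow in enumerate(kern):
                    for dj, kval in enumerate(krow):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            acc += kval * plane[y][x]
            out[i][j] = sigmoid(acc)
    return out


def cbam(f: Tensor3, w0: Matrix, w1: Matrix, kernel: List[Matrix]) -> Tensor3:
    """Apply channel attention first, then spatial attention, as in CBAM."""
    mc = channel_attention(f, w0, w1)
    f1 = [[[mc[k] * v for v in row] for row in ch] for k, ch in enumerate(f)]
    ms = spatial_attention(f1, kernel)
    return [[[ms[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(ch)] for ch in f1]


if __name__ == "__main__":
    rng = random.Random(0)
    C, H, W, r, k = 8, 5, 5, 2, 7  # r = reduction ratio, k = spatial kernel size
    feats = [[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
    w0 = [[rng.gauss(0, 0.1) for _ in range(C)] for _ in range(C // r)]
    w1 = [[rng.gauss(0, 0.1) for _ in range(C // r)] for _ in range(C)]
    kern = [[[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(k)] for _ in range(2)]
    out = cbam(feats, w0, w1, kern)
    print(len(out), len(out[0]), len(out[0][0]))  # 8 5 5
```

Because both attention maps lie strictly between 0 and 1, the module reweights activations by attenuating less informative channels and positions rather than amplifying any of them, and it leaves the feature-map shape unchanged.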
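The anchor regeneration step can be sketched as K-means over the (width, height) pairs of the labelled gesture boxes. The abstract does not give the distance measure; the sketch assumes the d = 1 − IoU distance introduced with YOLOv2, in which each box and centroid are compared as if they shared a centre, together with a mean update for each cluster. The toy box list is hypothetical, and sweeping k while tracking the mean best IoU is one common way to choose the number of anchors, not necessarily the authors' criterion.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float]  # (width, height); box centres are ignored


def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes placed on a common centre, so only width and height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes: Sequence[Box], k: int, iters: int = 100, seed: int = 0) -> List[Box]:
    """K-means with distance d = 1 - IoU; returns k anchors sorted by area."""
    rng = random.Random(seed)
    centroids: List[Box] = rng.sample(list(boxes), k)  # initial anchors drawn from the data
    for _ in range(iters):
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for box in boxes:
            # The smallest 1 - IoU distance is the same as the largest IoU.
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        updated = [
            (sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)  # an empty cluster keeps its previous centroid
        ]
        if updated == centroids:  # assignments have stabilised
            break
        centroids = updated
    return sorted(centroids, key=lambda wh: wh[0] * wh[1])


def mean_best_iou(boxes: Sequence[Box], anchors: Sequence[Box]) -> float:
    """Average IoU between each box and its best-matching anchor."""
    return sum(max(iou_wh(b, a) for a in anchors) for b in boxes) / len(boxes)


if __name__ == "__main__":
    # Hypothetical (width, height) pairs in pixels: one small group and one wide group.
    boxes = [(10, 10), (11, 11), (12, 12), (100, 50), (105, 52), (98, 48)]
    print(kmeans_anchors(boxes, k=2))  # [(11.0, 11.0), (101.0, 50.0)]
    for k in (1, 2, 3):
        print(k, round(mean_best_iou(boxes, kmeans_anchors(boxes, k)), 3))
```

In standard YOLOv3 the nine anchors are sorted by area and split three per detection scale, with the largest assigned to the coarsest grid, which is why the sketch returns anchors sorted by area; with a regenerated anchor count, the same ordering still determines which scale receives which anchors.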
Received: 24 September 2020
Published: 27 October 2021
|
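Fund: National Key Research and Development Program of China (2018YFB1308700); 2020 Shanxi Province Special Project for Research and Development of Key Core Technologies and Common Technologies (2020XXX009, 2020XXX001)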
Corresponding Authors: Jia-hai HUANG
E-mail: 2878095493@qq.com;lanyuan@tyut.edu.cn;huangjiahai@tyut.edu.cn
Keywords: YOLOv3, lightweight ShuffleNetv2 network, CBAM attention mechanism, gesture recognition, mobile terminal