Front. Inform. Technol. Electron. Eng., 2016, Vol. 17, Issue 1: 41-54    DOI: 10.1631/FITEE.1500126
Original article     
Extracting hand articulations from monocular depth images using curvature scale space descriptors
Shao-fan WANG1,†, Chun LI1, De-hui KONG1,†, Bao-cai YIN2,1,3
1Beijing Key Laboratory of Multimedia and Intelligent Software Technology, College of Metropolitan Transportation, Beijing University of Technology, Beijing 100124, China
2School of Software Technology, Dalian University of Technology, Dalian 116024, China
3Collaborative Innovation Center of Electric Vehicles in Beijing, Beijing 100081, China

Abstract  

We propose a framework of hand articulation detection from a monocular depth image using curvature scale space (CSS) descriptors. We extract the hand contour from an input depth image, and obtain the fingertips and finger-valleys of the contour using the local extrema of a modified CSS map of the contour. Then we recover the undetected fingertips according to the local change of depths of points in the interior of the contour. Compared with traditional appearance-based approaches using either angle detectors or convex hull detectors, the modified CSS descriptor extracts the fingertips and finger-valleys more precisely since it is more robust to noisy or corrupted data; moreover, the local extrema of depths recover the fingertips of bending fingers well while traditional appearance-based approaches hardly work without matching models of hands. Experimental results show that our method captures the hand articulations more precisely compared with three state-of-the-art appearance-based approaches.
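To make the pipeline concrete, the sketch below shows one way the contour-extraction front end could look, assuming OpenCV and NumPy; the depth interval (near, far) is a hypothetical calibration, not a value from the paper.

```python
import cv2
import numpy as np

def extract_hand_contour(depth, near=400, far=800):
    """Segment the hand by a depth interval and return its outer contour."""
    mask = ((depth > near) & (depth < far)).astype(np.uint8) * 255
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    if not contours:
        return None
    # Keep the largest contour, assumed to be the hand silhouette.
    return max(contours, key=cv2.contourArea).squeeze(1)  # (N, 2) points
```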



Key words: Curvature scale space (CSS); Hand articulation; Convex hull; Hand contour
Received: 20 April 2015      Published: 05 January 2016
CLC: TP391; TP751
Fund: National Natural Science Foundation of China (Nos. 61227004, 61370120, 61390510, 61300065, and 61402024); Beijing Municipal Natural Science Foundation, China (No. 4142010); Beijing Municipal Commission of Education, China (No. km201410005013); and the Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality, China
Corresponding Authors: Shao-fan WANG, De-hui KONG     E-mail: wangshaofan@bjut.edu.cn; kdh@bjut.edu.cn
Cite this article:

Shao-fan WANG, Chun LI, De-hui KONG, Bao-cai YIN. Extracting hand articulations from monocular depth images using curvature scale space descriptors. Front. Inform. Technol. Electron. Eng., 2016, 17(1): 41-54.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/FITEE.1500126     OR     http://www.zjujournals.com/xueshu/fitee/Y2016/V17/I1/41


Fig. 1 A flowchart of extracting the hand contour and palm center: (a) finding a point on the hand; (b) selecting a rectangular neighborhood; (c) segmenting the hand part; (d) extracting the contour of the hand part; (e) computing the maximum inscribed circle of the contour; (f) removing additional contour points
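Step (e) can be realized with a standard distance-transform construction: the palm center is the interior point farthest from the contour, and that distance is the radius of the maximum inscribed circle. A minimal sketch, assuming OpenCV; the paper's exact procedure may differ:

```python
import cv2

def palm_center(mask):
    """mask: uint8 binary image of the segmented hand (255 inside)."""
    dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
    # The palm center is the interior point farthest from the contour;
    # that distance is the radius of the maximum inscribed circle.
    _, radius, _, center = cv2.minMaxLoc(dist)
    return center, radius  # (x, y) palm center, inscribed-circle radius
```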
Fig. 2 Curvature scale space contours for two (a), three (b), four (c), and five (d) straight fingers in three cases: the first row satisfies $\mathrm{css}(t,\sigma)=\{(t,\sigma): k_\sigma(t)=0\}$, the second row satisfies $\mathrm{css}(t,\sigma)=\{(t,\sigma): 0\leq k_\sigma(t)\leq 2.5\}$, and the third row satisfies $\mathrm{css}(t,\sigma)=\{(t,\sigma): 2\leq k_\sigma(t)\leq 2.5\}$
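The quantity behind these plots is the curvature $k_\sigma(t)$ of the contour smoothed at scale $\sigma$. A minimal sketch of how it can be computed, assuming SciPy and NumPy; the authors' modified CSS map may differ in its smoothing and sampling details:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def curvature_at_scale(contour, sigma):
    """contour: (N, 2) closed curve; returns k_sigma(t) at each point t."""
    x = gaussian_filter1d(contour[:, 0].astype(float), sigma, mode="wrap")
    y = gaussian_filter1d(contour[:, 1].astype(float), sigma, mode="wrap")
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    # Planar-curve curvature; the epsilon guards degenerate tangents.
    return (dx * ddy - dy * ddx) / np.maximum((dx**2 + dy**2) ** 1.5, 1e-12)

def css_band(contour, sigmas, lo=2.0, hi=2.5):
    """Collect {(t, sigma): lo <= k_sigma(t) <= hi} over a range of scales."""
    band = []
    for sigma in sigmas:
        k = curvature_at_scale(contour, sigma)
        band += [(t, sigma) for t in np.nonzero((k >= lo) & (k <= hi))[0]]
    return band
```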
Fig. 3 A flowchart for detecting fingertips and finger-valleys and recovering bending non-thumb fingertips
Fig. 4 Recovery of the undetected fingertip of the thumb when the thumb is bending
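A speculative reading of this recovery step: a bending fingertip lies closer to the camera than its surroundings, so candidates are interior local depth minima that are away from tips already found on the contour. The window size and distance gap below are illustrative placeholders, not values from the paper:

```python
import cv2
import numpy as np

def recover_bent_fingertips(depth, mask, known_tips, min_gap=15.0):
    """known_tips: iterable of (x, y) fingertip positions on the contour."""
    d = depth.astype(np.float32)
    d[mask == 0] = 1e9                                 # push background away
    local_min = cv2.erode(d, np.ones((9, 9), np.uint8))  # 9x9 min filter
    ys, xs = np.nonzero((d == local_min) & (mask > 0))   # local depth minima
    candidates = np.stack([xs, ys], axis=1)
    # Keep minima that do not coincide with already-detected fingertips.
    return [p for p in candidates
            if all(np.hypot(*(p - t)) > min_gap for t in known_tips)]
```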
Algorithm 1 Detecting hand articulations from the hand contour
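One plausible shape for the detection step of Algorithm 1, with fingertips as sharp convex curvature extrema and finger-valleys as sharp concave ones; the thresholds are illustrative, not the paper's values:

```python
import numpy as np

def detect_articulations(k, tip_thresh=0.05, valley_thresh=-0.05):
    """k: curvature k_sigma(t) along the closed contour (see CSS sketch)."""
    prev_k, next_k = np.roll(k, 1), np.roll(k, -1)
    # Fingertips: local curvature maxima above the convexity threshold.
    tips = np.nonzero((k > prev_k) & (k > next_k) & (k > tip_thresh))[0]
    # Finger-valleys: local curvature minima below the concavity threshold.
    valleys = np.nonzero((k < prev_k) & (k < next_k) & (k < valley_thresh))[0]
    return tips, valleys  # contour indices of candidate tips and valleys
```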
Fig. 5 The RMS error (a) and the maximum error (b) of all test images for each fingertip
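For reference, these two statistics can be computed as follows, assuming per-image Euclidean distances between detected and ground-truth fingertip positions (the paper's exact error definition may differ):

```python
import numpy as np

def fingertip_errors(detected, ground_truth):
    """detected, ground_truth: (num_images, 2) positions of one fingertip."""
    d = np.linalg.norm(detected - ground_truth, axis=1)  # per-image error
    return np.sqrt(np.mean(d ** 2)), d.max()             # RMS, maximum
```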
Fig. 6 The number of undetected fingertips (a) and the number of incorrectly detected fingertips (b) among all test images
Fig. 7 The hand model consists of fourteen cylinders and a plane, characterized by the three-dimensional coordinates of a palm center, five finger-roots, nine finger joints, and five fingertips
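A container for the parameters listed in this caption might look as follows; the field names are illustrative assumptions, not the authors' notation:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HandModel:
    """Joint positions parameterizing the cylinder-and-plane hand model."""
    palm_center: np.ndarray    # (3,) center of the palm plane
    finger_roots: np.ndarray   # (5, 3) one root per finger
    finger_joints: np.ndarray  # (9, 3) intermediate joints
    fingertips: np.ndarray     # (5, 3) one tip per finger
```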
Fig. 8 Qualitative results of hand articulations of Experiment 1 (red: fingertips; blue: finger-valleys): (a) hand models generated by our method; (b) and (c) the CSS method; (d) and (e) the K-cos method (Lee and Lee, 2011); (f) and (g) the convex-hull method (Nagarajan et al., 2012). All the fingertips and finger-valleys are detected by CSS contours. References to color refer to the online version of this figure
Fig. 9 Qualitative results of hand articulations of Experiment 1 (red: fingertips; blue: finger-valleys): (a) hand models generated by our method; (b) and (c) the CSS method; (d) and (e) the K-cos method (Lee and Lee, 2011); (f) and (g) the convex-hull method (Nagarajan et al., 2012). One or two fingers are bending. References to color refer to the online version of this figure
Fig. 10 Qualitative results of hand articulations of Experiment 1 (red: fingertips; blue: finger-valleys): (a) hand models generated by our method; (b) and (c) the CSS method; (d) and (e) the K-cos method (Lee and Lee, 2011); (f) and (g) the convex-hull method (Nagarajan et al., 2012). Three or four fingers are bending, or the thumb is bending. References to color refer to the online version of this figure
Fig. 11 Qualitative results of hand articulations of Experiment 2 (red: fingertips; blue: finger-valleys; yellow: palm center): (a1) and (a2) are the original depth images, (b1) and (b2) the CSS method, (c1) and (c2) the K-cos method (Lee and Lee, 2011), (d1) and (d2) the convex-hull method (Nagarajan et al., 2012), (e1) and (e2) the K-curvature method (Cerezo, 2012). References to color refer to the online version of this figure
Fig. 12 Failure examples of the CSS method of Experiment 2 (red: fingertips; blue: finger-valleys; yellow: palm center). References to color refer to the online version of this figure
[1] Abbasi S, Mokhtarian F, Kittler J. Curvature scale space image in shape similarity retrieval. Multimedia Syst., 1999, 7(8): 467-476. doi: 10.1007/s005300050147
[2] Athitsos V, Sclaroff S. An appearance-based framework for 3D hand shape classification and camera viewpoint estimation. Proc. 5th IEEE Int. Conf. on Automatic Face and Gesture Recognition, 2002, p.40-45. doi: 10.1109/AFGR.2002.1004129
[3] Athitsos V, Sclaroff S. Estimating 3D hand pose from a cluttered image. Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, 2003, p.432-439. doi: 10.1109/CVPR.2003.1211500
[4] Cerezo T. 3D hand and finger recognition using Kinect. Technical Report, Universidad de Granada, Spain, 2012. Available at http://frantracerkinectft.codeplex.com
[5] Chang WY, Chen CS, Jian YD. Visual tracking in high-dimensional state space by appearance-guided particle filtering. IEEE Trans. Image Process., 2008, 17(7): 1054-1067. doi: 10.1109/TIP.2008.924283
[6] de La Gorce M, Fleet DJ, Paragios N. Model-based 3D hand pose estimation from monocular video. IEEE Trans. Patt. Anal. Mach. Intell., 2011, 33(9): 1793-1805. doi: 10.1109/TPAMI.2011.33
[7] Feng Z, Yang B, Chen Y, et al. Features extraction from hand images based on new detection operators. Patt. Recog., 2011, 44(5): 1089-1105. doi: 10.1016/j.patcog.2010.08.007
[8] Keskin C, Kıraç F, Kara YE, et al. Real time hand pose estimation using depth sensors. In: Fossati A, Gall J, Grabner H, et al. (Eds.), Consumer Depth Cameras for Computer Vision. London: Springer, 2011, p.119-137. doi: 10.1007/978-1-4471-4640-7_7
[9] Kıraç F, Kara YE, Akarun L. Hierarchically constrained 3D hand pose estimation using regression forests from single frame depth data. Patt. Recog. Lett., 2014, 50: 91-100. doi: 10.1016/j.patrec.2013.09.003
[10] Lee D, Lee S. Vision-based finger action recognition by angle detection and contour analysis. ETRI J., 2011, 33(3): 415-422. doi: 10.4218/etrij.11.0110.0313
[11] Ma Z, Wu E. Real-time and robust hand tracking with a single depth camera. Vis. Comput., 2014, 30(10): 1133-1144. doi: 10.1007/s00371-013-0894-1
[12] Maisto M, Panella M, Liparulo L, et al. An accurate algorithm for the identification of fingertips using an RGB-D camera. IEEE J. Emerg. Sel. Topics Circ. Syst., 2013, 3(2): 272-283. doi: 10.1109/JETCAS.2013.2256830
[13] Morshidi M, Tjahjadi T. Gravity optimised particle filter for hand tracking. Patt. Recog., 2014, 47(1): 194-207. doi: 10.1016/j.patcog.2013.06.032
[14] Nagarajan S, Subashini T, Ramalingam V. Vision based real time finger counter for hand gesture recognition. Int. J. Technol., 2012, 2(2): 1-5.
[15] Oikonomidis I, Kyriazis N, Argyros AA. Efficient model-based 3D tracking of hand articulations using Kinect. Proc. British Machine Vision Conf., 2011, p.1-11.
[16] Qian C, Sun X, Wei Y, et al. Realtime and robust hand tracking from depth. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2014, p.1106-1113. doi: 10.1109/CVPR.2014.145
[17] Ren Z, Yuan J, Zhang Z. Robust hand gesture recognition based on finger-earth mover's distance with a commodity depth camera. Proc. 19th ACM Int. Conf. on Multimedia, 2011, p.1093-1096. doi: 10.1145/2072298.2071946
[18] Rosales R, Athitsos V, Sigal L, et al. 3D hand pose reconstruction using specialized mappings. Proc. 8th IEEE Int. Conf. on Computer Vision, 2001, p.378-385. doi: 10.1109/ICCV.2001.937543
[19] Schlattmann M, Kahlesz F, Sarlette R, et al. Markerless 4 gestures 6 DOF real-time visual tracking of the human hand with automatic initialization. Comput. Graph. Forum, 2007, 26(3): 467-476. doi: 10.1111/j.1467-8659.2007.01069.x
[20] Tomasi C, Petrov S, Sastry A. 3D tracking = classification + interpolation. Proc. 9th IEEE Int. Conf. on Computer Vision, 2003, p.1441-1448. doi: 10.1109/ICCV.2003.1238659