Extracting hand articulations from monocular depth images using curvature scale space descriptors

Shao-fan WANG, Chun LI, De-hui KONG, Bao-cai YIN

Front. Inform. Technol. Electron. Eng. 2016, 17 (1): 41-54. DOI: 10.1631/FITEE.1500126

Abstract

We propose a framework for detecting hand articulations in a monocular depth image using curvature scale space (CSS) descriptors. We extract the hand contour from the input depth image and locate the fingertips and finger-valleys at the local extrema of a modified CSS map of the contour. We then recover the undetected fingertips from local changes in the depth of points inside the contour. Compared with traditional appearance-based approaches that use angle detectors or convex hull detectors, the modified CSS descriptor locates fingertips and finger-valleys more precisely because it is more robust to noisy or corrupted data. Moreover, local depth extrema recover the fingertips of bending fingers well, whereas traditional appearance-based approaches rarely succeed without matching hand models. Experimental results show that our method captures hand articulations more precisely than three state-of-the-art appearance-based approaches.
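The contour step can be illustrated with the classic single-scale CSS curvature computation. The paper uses a modified CSS map whose details are not given here, so the sketch below is only the standard technique under our own assumptions: each coordinate of the closed contour is convolved with sampled Gaussian-derivative kernels at scale sigma, the curvature is kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2), and window-based curvature maxima and minima are taken as fingertip and finger-valley candidates. The function names `gaussian_derivative_kernels`, `circular_convolve`, `css_curvature` and `contour_extrema`, the parameters `sigma`, `half_window` and `threshold`, and the five-lobed toy contour standing in for a hand are all hypothetical.

```python
import math


def gaussian_derivative_kernels(sigma):
    """Sampled 1st/2nd derivatives of a normalized Gaussian on [-3*sigma, 3*sigma]."""
    radius = max(1, math.ceil(3 * sigma))
    offsets = list(range(-radius, radius + 1))
    g = [math.exp(-u * u / (2 * sigma * sigma)) for u in offsets]
    total = sum(g)
    g = [w / total for w in g]
    dg = [-u / sigma ** 2 * w for u, w in zip(offsets, g)]
    ddg = [(u * u / sigma ** 4 - 1 / sigma ** 2) * w for u, w in zip(offsets, g)]
    return offsets, dg, ddg


def circular_convolve(signal, offsets, kernel):
    """Convolve one coordinate of a closed contour, wrapping around its ends."""
    n = len(signal)
    return [sum(signal[(i - u) % n] * k for u, k in zip(offsets, kernel))
            for i in range(n)]


def css_curvature(xs, ys, sigma):
    """Curvature of the closed contour (xs, ys) after Gaussian smoothing at scale sigma."""
    offsets, dg, ddg = gaussian_derivative_kernels(sigma)
    x1, y1 = circular_convolve(xs, offsets, dg), circular_convolve(ys, offsets, dg)
    x2, y2 = circular_convolve(xs, offsets, ddg), circular_convolve(ys, offsets, ddg)
    # kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2); positive on convex parts of a CCW contour
    return [(a * d - b * c) / (a * a + b * b) ** 1.5
            for a, b, c, d in zip(x1, y1, x2, y2)]


def contour_extrema(kappa, half_window, threshold):
    """Window-based curvature maxima (tip candidates) and minima (valley candidates)."""
    n = len(kappa)
    tips, valleys = [], []
    for i in range(n):
        window = [kappa[(i + d) % n] for d in range(-half_window, half_window + 1)]
        if kappa[i] == max(window) and kappa[i] > threshold:
            tips.append(i)
        elif kappa[i] == min(window) and kappa[i] < -threshold:
            valleys.append(i)
    return tips, valleys


# Toy "hand": a counter-clockwise five-lobed contour r = 1 + 0.5 cos(5 theta).
n = 200
thetas = [2 * math.pi * i / n for i in range(n)]
xs = [(1 + 0.5 * math.cos(5 * t)) * math.cos(t) for t in thetas]
ys = [(1 + 0.5 * math.cos(5 * t)) * math.sin(t) for t in thetas]
kappa = css_curvature(xs, ys, sigma=2.0)
tips, valleys = contour_extrema(kappa, half_window=10, threshold=1.0)
print(tips)     # [0, 40, 80, 120, 160]
print(valleys)  # [20, 60, 100, 140, 180]
```

The window-plus-threshold test is a simple stand-in for non-maximum suppression: a small ripple in curvature cannot become a candidate unless it dominates its whole neighbourhood and is clearly convex or concave.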

Fig. 8 Qualitative results of hand articulations of Experiment 1 (red: fingertips; blue: finger-valleys): (a) hand models generated by our method; (b) and (c) the CSS phase; (d) and (e) the K-cos method (Lee and Lee, 2011); (f) and (g) the convex-hull method (Nagarajan et al., 2012). All fingertips and finger-valleys are detected in the CSS phase. References to color refer to the online version of this figure.

We show the qualitative results of Experiment 1 in Figs. 8-10. We mark the fingertips and finger-valleys obtained by the three methods on both the color and depth images, and we show the final hand model obtained by our method. In the examples of Fig. 8, the CSS phase alone captures all fingertips and finger-valleys. In the examples of Figs. 9 and 10, however, the CSS phase misses some fingertips or finger-valleys, because these examples contain bending fingers whose fingertips do not lie on the hand contour. The bending-fingertip recovery phase handles these examples: the third and fourth rows show that the locations of the bending fingertips are recovered well. However, when the thumb is occluded or when more than two fingers are bending, as in the fourth to seventh rows of Fig. 10, the detected locations are less accurate. In the fifth row, the thumb tip is actually occluded by the ring finger, yet our method places it in front of the ring finger. In particular, the sixth and seventh rows of Fig. 10 show that our method cannot effectively handle heavily occluded cases.
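The recovery phase is summarized in the abstract as using local extrema of depth inside the contour. The following is a minimal sketch under our own assumptions, not the authors' procedure: we assume smaller depth values are closer to the camera, so a bent fingertip pointing toward the camera shows up as a strict local depth minimum inside the hand mask that also stands out from its neighbourhood mean by at least `min_drop`. The function `recover_bent_fingertips`, its window size, the `min_drop` margin and the exclusion radius around fingertips already found on the contour are all hypothetical. A detector of this kind sees only the front-most surface, which is one plausible reason why an occluded thumb tip would be placed on the occluding finger, as in the fifth-row case.

```python
def recover_bent_fingertips(depth, mask, contour_tips, half_window=3,
                            min_drop=10.0, exclusion_radius=3):
    """Hypothetical recovery step: strict local depth minima inside the hand mask.

    depth: 2-D list of depth values (smaller = closer to the camera, assumed).
    mask: 2-D list of booleans marking hand pixels inside the contour.
    contour_tips: (row, col) fingertips already found on the contour; pixels
    within exclusion_radius of them are skipped.
    """
    rows, cols = len(depth), len(depth[0])
    recovered = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            if any((r - tr) ** 2 + (c - tc) ** 2 <= exclusion_radius ** 2
                   for tr, tc in contour_tips):
                continue
            neighbours = [depth[rr][cc]
                          for rr in range(max(0, r - half_window), min(rows, r + half_window + 1))
                          for cc in range(max(0, c - half_window), min(cols, c + half_window + 1))
                          if (rr, cc) != (r, c) and mask[rr][cc]]
            if not neighbours or depth[r][c] >= min(neighbours):
                continue  # not a strict local minimum
            # require the point to stand out from its neighbourhood, not just noise
            if sum(neighbours) / len(neighbours) - depth[r][c] >= min_drop:
                recovered.append((r, c))
    return recovered


def bump(r, c, centre, height=40, slope=10):
    """Toy bent finger: depth decreases linearly (L1 distance) toward centre."""
    d = abs(r - centre[0]) + abs(c - centre[1])
    return max(0, height - slope * d)


size = 20
depth = [[500.0 - bump(r, c, (10, 10)) - bump(r, c, (4, 4)) for c in range(size)]
         for r in range(size)]
mask = [[True] * size for _ in range(size)]
# (4, 4) stands in for a fingertip already found by the contour (CSS) phase
print(recover_bent_fingertips(depth, mask, contour_tips=[(4, 4)]))  # [(10, 10)]
```

Excluding pixels near contour fingertips keeps the recovery step from re-detecting fingers that the CSS phase has already found.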