Extracting hand articulations from monocular depth images using curvature scale space descriptors

Shao-fan WANG,Chun LI,De-hui KONG,Bao-cai YIN

Front. Inform. Technol. Electron. Eng. 2016, 17 (1): 41-54. DOI: 10.1631/FITEE.1500126

Abstract


We propose a framework for detecting hand articulations from a monocular depth image using curvature scale space (CSS) descriptors. We extract the hand contour from an input depth image, and obtain the fingertips and finger-valleys of the contour from the local extrema of a modified CSS map of the contour. We then recover the undetected fingertips from the local change in depth of points in the interior of the contour. Compared with traditional appearance-based approaches that use either angle detectors or convex hull detectors, the modified CSS descriptor extracts the fingertips and finger-valleys more precisely because it is more robust to noisy or corrupted data. Moreover, the local extrema of depth recover the fingertips of bending fingers well, whereas traditional appearance-based approaches hardly work without matching hand models. Experimental results show that our method captures hand articulations more precisely than three state-of-the-art appearance-based approaches.

Fig. 2 Curvature scale space contours for two (a), three (b), four (c), and five (d) straight fingers in three cases: the first row satisfies $\textrm{css}(t,\sigma)=\{(t,\sigma): k_\sigma(t)=0\}$, the second row satisfies $\textrm{css}(t,\sigma)=\{(t,\sigma): 0\leq k_\sigma(t)\leq 2.5\}$, and the third row satisfies $\textrm{css}(t,\sigma)=\{(t,\sigma): 2\leq k_\sigma(t)\leq 2.5\}$

For each input hand contour $\{\boldsymbol{f}_t\}_{t=1}^{n}$, we compute its CSS contour map $\mbox{css}_f(t, \sigma)$. Instead of using Eq. 1 directly, we modify the traditional CSS contour map in two ways. First, since the extracted hand contour is a discrete sequence of points, both the convolution operator and the curvature need discrete forms: we use the traditional discrete convolution operator for the convolution and a quadratic fitting scheme for the curvature. Second, we define the CSS contour map by small absolute values of the curvature of the evolving curve, rather than by the zero points of the curvature. The reason is two-fold. On the one hand, the peak points of the hand contour do not necessarily reach exactly zero curvature after Gaussian smoothing (first row of Fig. 2). On the other hand, defining the CSS contour as the points whose curvature lies between zero and a small positive number yields an extremely large number of CSS contour points (second row of Fig. 2). Our method therefore defines a modified CSS contour map (Eq. 2) as the set of points whose curvature lies within an interval $[c_1, c_2]$ bounded by two small positive numbers (third row of Fig. 2), and extracts the fingertips as the local maxima of this contour map, subject to a threshold on $\sigma$.
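The band-limited map described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`gaussian_kernel`, `smooth_closed`, `curvature`, `modified_css_map`) are hypothetical, curvature is estimated with central differences rather than the quadratic fitting scheme used in the paper, and the band test is assumed to apply to the absolute curvature $|k_\sigma(t)|$. The contour is assumed to be closed and sampled as an `(n, 2)` array.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized discrete Gaussian truncated at 3 * sigma (an assumed cutoff)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def smooth_closed(values, sigma):
    """Discrete convolution of one contour coordinate with a Gaussian.

    The contour is treated as closed, so the sequence is padded by wrapping.
    """
    g = gaussian_kernel(sigma)
    radius = len(g) // 2
    padded = np.pad(values, radius, mode="wrap")
    return np.convolve(padded, g, mode="valid")  # same length as `values`

def curvature(x, y):
    """Signed curvature of a closed discrete curve via central differences.

    Assumption: this replaces the paper's quadratic fitting scheme.
    """
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2.0
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2.0
    ddx = np.roll(x, -1) - 2.0 * x + np.roll(x, 1)
    ddy = np.roll(y, -1) - 2.0 * y + np.roll(y, 1)
    denom = (dx**2 + dy**2) ** 1.5 + 1e-12  # guard against division by zero
    return (dx * ddy - dy * ddx) / denom

def modified_css_map(contour, sigmas, c1, c2):
    """Boolean map of shape (len(sigmas), n).

    Entry [i, t] is True when the curve smoothed at sigmas[i] has
    c1 <= |curvature| <= c2 at contour point t.
    """
    contour = np.asarray(contour, dtype=float)
    rows = []
    for sigma in sigmas:
        xs = smooth_closed(contour[:, 0], sigma)
        ys = smooth_closed(contour[:, 1], sigma)
        k = np.abs(curvature(xs, ys))
        rows.append((k >= c1) & (k <= c2))
    return np.array(rows)

if __name__ == "__main__":
    # Toy input: a circle of radius 10, whose curvature is about 0.1 everywhere.
    t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
    circle = np.stack([10.0 * np.cos(t), 10.0 * np.sin(t)], axis=1)
    css = modified_css_map(circle, sigmas=[1.0, 2.0], c1=0.09, c2=0.11)
    print(css.shape, css.all())
```

Narrowing the band from $[0, c_2]$ to $[c_1, c_2]$ is what thins the map: in this sketch, raising `c1` excludes the many near-flat contour points that would otherwise be marked at every scale. The values of `c1` and `c2` here are illustrative only and depend on the contour's sampling and units.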