Journal of ZheJiang University (Engineering Science)  2020, Vol. 54 Issue (6): 1138-1146    DOI: 10.3785/j.issn.1008-973X.2020.06.010
Computer Technology     
Adaptive monocular 3D object detection algorithm based on spatial constraint
Jun-ning ZHANG1, Qun-xing SU1,2,*, Peng-yuan LIU1, Zheng-jun WANG3, Hong-qiang GU1
1. Missile Engineering Department, Army Engineering University, Shijiazhuang 050003, China
2. Army Command Academy, Nanjing 210000, China
3. 32181 Troops, Shijiazhuang 050003, China

Abstract  

The 3D-Cube algorithm, which requires no prior template matching, was introduced, and an adaptive monocular 3D object detection algorithm was proposed. First, the geometric relationship among the camera, the object, and the vanishing point (VP) was established from the transformation between the world coordinate system, the camera, and the moving target. By combining this spatial constraint, a spatially constrained M-estimator sample and consensus (MSAC) algorithm was proposed to improve the robustness of VP estimation in complex scenes. To improve the accuracy of 3D bounding-box estimation, an adaptive method for estimating the 3D box corners was proposed based on the VP perspective relationship; by building the spatial constraint between the target's 3D contour lines and its 2D box, the 3D bounding box of the target object can be detected quickly. Experimental results show that, compared with other algorithms, the proposed method achieves high accuracy and real-time performance in indoor scenes, and also shows better accuracy and robustness in outdoor scene experiments.
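The MSAC step named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the spatial constraint model among target, camera yaw, and VP is omitted, and only the generic MSAC scoring of candidate vanishing points from sampled line-segment pairs is shown; all function names and the angular-residual choice are assumptions for illustration.

```python
import numpy as np

def line_params(seg):
    # Homogeneous line through the segment endpoints (x1, y1, x2, y2).
    p1 = np.array([seg[0], seg[1], 1.0])
    p2 = np.array([seg[2], seg[3], 1.0])
    return np.cross(p1, p2)

def vp_residual(seg, vp):
    # Angle between the segment direction and the direction from the
    # segment midpoint toward the candidate vanishing point.
    mid = np.array([(seg[0] + seg[2]) / 2, (seg[1] + seg[3]) / 2])
    d_seg = np.array([seg[2] - seg[0], seg[3] - seg[1]])
    d_vp = vp - mid
    cosang = abs(np.dot(d_seg, d_vp)) / (
        np.linalg.norm(d_seg) * np.linalg.norm(d_vp) + 1e-12)
    return np.arccos(np.clip(cosang, -1.0, 1.0))

def msac_vp(segments, iters=200, thresh=np.deg2rad(2.0), rng=None):
    # MSAC: like RANSAC, but the score is a truncated residual sum
    # rather than a plain inlier count.
    rng = np.random.default_rng(rng)
    best_vp, best_cost = None, np.inf
    for _ in range(iters):
        i, j = rng.choice(len(segments), size=2, replace=False)
        vp_h = np.cross(line_params(segments[i]), line_params(segments[j]))
        if abs(vp_h[2]) < 1e-9:   # (near-)parallel lines: VP at infinity
            continue
        vp = vp_h[:2] / vp_h[2]
        cost = sum(min(vp_residual(s, vp), thresh) for s in segments)
        if cost < best_cost:
            best_vp, best_cost = vp, cost
    return best_vp
```

The truncated cost is what distinguishes MSAC from plain RANSAC: outliers each contribute a bounded penalty instead of being discarded outright, which stabilizes the estimate when many segments are mildly noisy.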



Key words: 3D target detection; perspective principle; vanishing point (VP); space constraint; M-estimator sample and consensus (MSAC) algorithm
Received: 08 May 2019      Published: 06 July 2020
CLC:  TP 242.6  
Corresponding Authors: Qun-xing SU     E-mail: zjn20101796@sina.cn;374027210@qq.com
Cite this article:

Jun-ning ZHANG, Qun-xing SU, Peng-yuan LIU, Zheng-jun WANG, Hong-qiang GU. Adaptive monocular 3D object detection algorithm based on spatial constraint. Journal of ZheJiang University (Engineering Science), 2020, 54(6): 1138-1146.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2020.06.010     OR     http://www.zjujournals.com/eng/Y2020/V54/I6/1138


基于空间约束的自适应单目3D物体检测算法 (Chinese title)

Chinese abstract: A 3D object detection algorithm requiring no prior template matching was introduced, and an adaptive monocular 3D object detection algorithm was proposed by simplifying the vanishing point (VP) computation and improving the corner extraction steps. To address the problem that VP computation is easily disturbed in complex scenes, a constraint model among the target, the camera yaw angle, and the VP was built according to the spatial relationship among the world coordinate system, the camera, and the target object in indoor scenes, and a spatially constrained M-estimator sample and consensus (MSAC) VP computation method was proposed. To improve the estimation accuracy of the 3D box, an adaptive method for estimating the 3D box corners was proposed on the basis of the VP perspective relationship, and fast detection of the target's 3D bounding box was achieved by building the spatial constraint between the target's 3D contour lines and its 2D box. Experiments on the relevant datasets show that, compared with other algorithms, the proposed method not only achieves higher estimation accuracy and better real-time performance in indoor scenes, but also shows better accuracy and robustness in outdoor scenes.

Key words (Chinese): 3D target detection; perspective principle; vanishing point (VP); space constraint; M-estimator sample and consensus (MSAC) algorithm
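The VP perspective relation underlying the 3D box estimation can be made concrete. In the CubeSLAM-style formulation of the 3D-Cube baseline [22], each vanishing point of a cuboid is the image of one of its axis directions, vp_i ≅ K·R·e_i. Below is a minimal illustrative sketch, assuming a level camera with yaw rotation about the camera y-axis; the function name and conventions are assumptions, not the paper's code.

```python
import numpy as np

def box_vanishing_points(K, yaw):
    # For a cuboid upright on the ground, rotated by `yaw` about the
    # (assumed vertical) camera y-axis, each vanishing point is the
    # projection of one box-axis direction: vp_i ~ K @ R @ e_i.
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([
        [c, 0.0, s],
        [0.0, 1.0, 0.0],
        [-s, 0.0, c],
    ])
    vps = []
    for axis in np.eye(3):
        v = K @ R @ axis
        # Third homogeneous coordinate ~ 0 means the VP is at infinity
        # (e.g. the vertical axis for a level camera).
        vps.append(v[:2] / v[2] if abs(v[2]) > 1e-9 else None)
    return vps
```

Note that under this level-camera assumption the two finite VPs both fall on the horizon line y = cy, which is exactly the kind of geometric constraint the paper exploits when filtering VP candidates.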
Fig.1 Transformation relationship between world coordinate system, camera coordinate system, and target coordinate system
Fig.2 Schematic diagram of opening and closing degree and tolerance capability of straight lines
Fig.3 Target 3D bounding box estimation based on vanishing point (VP)
Fig.4 Corner solution order in different cases
Algorithm      600×450   500×375   400×300   300×224
MSAC           0.074     0.085     0.091     0.098
3D-Cube[22]    0.053     0.062     0.070     0.074
Proposed       0.051     0.059     0.067     0.072
Tab.1 Error rate (%) of VP calculated by different algorithms at different image resolutions
Algorithm      600×450   500×375   400×300   300×224
MSAC           72        57        45        39
3D-Cube[22]    163       128       96        77
Proposed       102       82        67        58
Tab.2 Computation time (ms) of VP by different algorithms at different image resolutions
Algorithm        IoU     Nt
Primitive[26]    0.36    125
3dgp[27]         0.42    221
3D-Cube[22]      0.40    1904
3D-Cube 1)       0.48    270
Proposed         0.42    1958
Proposed 1)      0.51    320
Note: 1) results computed on only the top-10 object proposals
Tab.3 Comparison of detection accuracy and proposal quantity of different algorithms on SUN RGB-D dataset
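The IoU column reports intersection-over-union between estimated and ground-truth boxes. For reference, the standard 2D axis-aligned version is sketched below; the paper scores 3D cuboids, which generalizes this computation volume-wise.

```python
def iou_2d(a, b):
    # Axis-aligned boxes as (x1, y1, x2, y2); returns |A∩B| / |A∪B|.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0
```

For example, unit boxes offset by half their width overlap with IoU 1/3, which gives a feel for how strict the 0.4-0.5 averages in the table are.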
Fig.5 Visualization of 3D detection effect via different algorithms
Algorithm       IoU     Nt
Deep[28]        0.33    10 957
SUBCNN[29]      0.21    8 730
3D-Cube[22]     0.20    10 571
3D-Cube 1)      0.36    10 571
Proposed        0.21    10 593
Proposed 1)     0.38    10 593
Note: 1) results computed on only the top-10 object proposals
Tab.4 Comparison of detection accuracy and proposal quantity of different algorithms on KITTI dataset
Fig.6 Visualization of 3D detection effect via different algorithms on KITTI dataset
[1]   袁公萍, 汤一平, 韩旺明, 等 基于深度卷积神经网络的车型识别方法[J]. 浙江大学学报: 工学版, 2018, 52 (4): 694- 702
YUAN Gong-ping, TANG Yi-ping, HAN Wang-ming, et al Vehicle recognition method based on deep convolution neural network[J]. Journal of Zhejiang University: Engineering Science, 2018, 52 (4): 694- 702
[2]   CAI H P. Fast detection of multiple textureless 3D objects [C] // International Conference on Computer Vision Systems. Petersburg: ICCVS, 2013: 103-112.
[3]   养明起. 基于深度神经网络的视觉位姿估计[D]. 合肥: 中国科学技术大学, 2018.
[4]   HODAN T, HALUZA P. T-LESS: an RGB-D dataset for 6D pose estimation of texture-less objects [C] // IEEE Winter Conference on Applications of Computer Vision. Santa Rosa: IEEE WACV, 2017: 880-888.
[5]   OHNO K, TSUBOUCHI T, SHIGEMATSU B, et al Differential GPS and odometry-based outdoor navigation of a mobile robot[J]. Advanced Robotics, 2004, 18 (6): 611- 635
doi: 10.1163/1568553041257431
[6]   FUENTES P J, RUIZE A J, RENDON J M Visual simultaneous localization and mapping: a survey[J]. Artificial Intelligence Review, 2012, 43 (1): 55- 81
[7]   ENDRES F, HESS J, STURM J, et al 3D mapping with an RGB-D camera[J]. IEEE Transactions on Robotics, 2014, 30 (1): 177- 187
doi: 10.1109/TRO.2013.2279412
[8]   HODAN T, ZABULIS X. Detection and fine 3D pose estimation of texture-less objects in RGB-D images [C] // IEEE International Conference on Computer Vision. Sydney: IEEE ICCV, 2014: 4421-4428.
[9]   DAVID G L Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60 (2): 91- 110
doi: 10.1023/B:VISI.0000029664.99615.94
[10]   柳培忠, 阮晓虎, 田震, 等 一种基于多特征融合的视频目标跟踪方法[J]. 智能系统学报, 2015, (57): 319- 324
LIU Pei-zhong, RUAN Xiao-hu, TIAN Zhen, et al A video target tracking method based on multi-feature fusion[J]. Journal of Intelligent Systems, 2015, (57): 319- 324
[11]   贾祝广, 孙效玉, 王斌, 等 无人驾驶技术研究及展望[J]. 矿业装备, 2014, (5): 44- 47
JIA Zhu-guang, SUN Xiao-yu, WANG Bin, et al Research and prospect of unmanned driving technology[J]. Mining Equipment, 2014, (5): 44- 47
[12]   RUSU R B, BRADSKI G, THIBAUX R. Fast 3D recognition and pose using the viewpoint feature histogram [C] // IEEE/RSJ International Conference on Intelligent Robots and Systems. Taiwan: IEEE ICIRS, 2010: 148-154.
[13]   XIANG Y, SCHMIDT T. PoseCNN: a convolutional neural network for 6D object pose estimation in cluttered scenes [C] // IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE CVPR, 2018.
[14]   RAD M, LEPETIT V. BB8: a scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth [C] // IEEE International Conference on Computer Vision. Venice: IEEE ICCV, 2017: 3848-3856.
[15]   KEHL W, MANHARDT F, TOMBARI F, et al. Ssd-6D: making RGB-based 3D detection and 6D pose estimation great again [C] // IEEE International Conference on Computer Vision. Venice: IEEE ICCV, 2017: 1530-1538.
[16]   REDMON J, FARHADI A. YOLO9000: Better, faster, stronger [C] // IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE CVPR, 2017: 6517-6525.
[17]   BANSAL A, RUSSELL B, GUPTA A. Marr revisited: 2D-3D alignment via surface normal prediction [C] // IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE CVPR, 2016: 5965-5974.
[18]   CHABOT F, CHAOUCH M, RABARISOA J, et al. Deep MANTA: a coarse-to-fine many-task network for joint 2D and 3D vehicle analysis from monocular image [C] // IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE CVPR, 2017: 1827-1836.
[19]   谢雨宁. 融合形态学与特征点检测的建筑物自动三维重建关键技术研究[D]. 南京: 东南大学, 2016.
XIE Yu-ning. Research on key technologies of automatic 3D reconstruction of buildings fused with morphology and feature detection [D]. Nanjing: Southeast University, 2016.
[20]   HEDAU V, HOIEM D, FORSYTH D. Thinking inside the box: using appearance models and context based on room geometry [C] // European Conference on Computer Vision. Heraklion: ECCV, 2010: 224-237.
[21]   CHEN X, KUNDU K, ZHANG Z, et al. Monocular 3D object detection for autonomous driving [C] // IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE CVPR, 2016: 2147-2156.
[22]   YANG S, SCHERER S. CubeSLAM: monocular 3D object detection and SLAM without prior models [C] // IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE CVPR, 2018: 1-16.
[23]   宋欣, 王正玑, 程章林, 等 多分辨率线段提取方法及线段语义分析[J]. 集成技术, 2018, 7 (9): 67- 78
SONG Xin, WANG Zheng-ju, CHENG Zhang-lin, et al Multiresolution line segment extraction method and semantics analysis[J]. Integrated Technology, 2018, 7 (9): 67- 78
[24]   SONG S, LICHTENBERG S P, XIAO J. Sun RGB-D: a RGB-D scene understanding benchmark suite [C] // IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE CVPR, 2015: 567-576.
[25]   GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite [C] // IEEE Conference on Computer Vision and Pattern Recognition. Rhode: IEEE CVPR, 2012: 3354–3361.
[26]   XIAO J, RUSSELL B, TORRALBA A. Localizing 3D cuboids in single-view images [C] // 25th International Conference on Neural Information Processing Systems. Lake Tahoe: NIPS, 2012: 746-754.
[27]   CHOI W, CHAO Y W, PANTOFARU C, et al. Understanding indoor scenes using 3D geometric phrases [C] // IEEE Conference on Computer Vision and Pattern Recognition. Oregon: IEEE CVPR, 2013 : 33-40.
[28]   MOUSAVIAN A, ANGUEALOV D, FLYNN J, et al. 3D bounding box estimation using deep learning and geometry [C] // IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE CVPR, 2017: 5632-5640.
[29]   XIANG Y, CHOI W, LIN Y, et al. Subcategory-aware convolutional neural networks for object proposals and detection [C] // Applications of Computer Vision. Santa Rosa: IEEE ACV, 2017: 924-933.
[30]   梁苍, 曹宁, 冯晔 改进的基于gLoG滤波实时消失点检测算法[J]. 国外电子测量技术, 2018, 37 (12): 36- 40
LIANG Cang, CAO Ning, FENG Ye Improved real-time vanishing point detection algorithm based on gLoG filter[J]. Foreign Electronic Measurement Technology, 2018, 37 (12): 36- 40