基于空间约束的自适应单目3D物体检测算法

doi:10.3785/j.issn.1008-973X.2020.06.010

浙江大学学报(工学版)

2020, Vol. 54

Issue (6): 1138-1146 DOI: 10.3785/j.issn.1008-973X.2020.06.010

计算机技术

基于空间约束的自适应单目3D物体检测算法

张峻宁1(

),苏群星1,2,*(

),刘鹏远1,王正军3,谷宏强1

1. 陆军工程大学导弹工程系，河北石家庄 050003
2. 陆军指挥学院，江苏南京 210000
3. 32181部队，河北石家庄 050003

Adaptive monocular 3D object detection algorithm based on spatial constraint

Jun-ning ZHANFG1(

),Qun-xing SU1,2,*(

),Peng-yuan LIU1,Zheng-jun WANG3,Hong-qiang GU1

1. Missile Engineering Department, Army Engineering University, Shijiazhuang 050003, China
2. Army Command Academy, Nanjing 210000, China
3. 32181 Troops, Shijiazhuang 050003, China

全文: PDF(1050 KB) HTML

摘要：

引入无须先验模版匹配的3D目标检测算法，通过简化消失点（VP）计算和改进角点提取等步骤，提出一种自适应的单目3D物体检测算法. 针对复杂场景下VP 计算易受干扰的问题，根据室内场景中世界坐标系、相机以及目标物体之间的空间关系，建立目标、相机偏航角与VP之间的约束模型，提出一种基于空间约束的 M 估计子抽样一致性（MSAC）消失点计算方法；为了提高3D框的估计精度，在VP透视关系的基础上，提出一种自适应估计3D框角点的方法，通过建立目标3D轮廓线与2D框的空间约束关系，实现目标物体的3D框快速检测. 相关数据集的实验结果表明，所提方法相比于其他算法不仅在室内场景下具有估计精度高、实时性好的优势，而且在室外场景实验下也具有更好的精度和鲁棒性.

关键词： 3D目标检测; 透视原理; 消失点（VP）; 空间约束; M 估计子抽样一致性（MSAC）算法

Abstract:

The 3D-Cube algorithm without prior template matching was introduced, and an algorithm for adaptive detection of 3D objects was proposed. Firstly, the relationship among the camera, the object and the VP vanishing point was established, according to the transformation relationship between the world coordinate system, the camera and the moving target. By combining the spatial constraint relationship, a space constrained M-estimator sample and consensus (MSAC) algorithm was proposed to improve the robustness in complex scenes. To improve the accuracy of 3D frame estimation, an adaptive method of 3D frame corner estimation was proposed based on the VP perspective relationship. The 3D bounding box of the target object could be detected quickly by building the spatial constraint relation between 3D contour and 2D frame of the target. The experimental results show that the proposed method has the advantages of high accuracy and real-time performance, compared with other algorithms in indoor scenes, which also has better accuracy and robustness in outdoor scene experiment.

Key words: 3D target detection perspective principle vanishing point (VP) space constraint M-estimator sample and consensus (MSAC) algorithm

收稿日期: 2019-05-08 出版日期: 2020-07-06

CLC:

TP 242.6

基金资助: 国家自然科学基金资助项目（51205405，51305454）

通讯作者: 苏群星 E-mail: zjn20101796@sina.cn;374027210@qq.com

作者简介: 张峻宁（1992—），男，博士生，从事目标检测、SLAM研究. orcid.org/0000-0002-4349-3568. E-mail： zjn20101796@sina.cn

	服务
	把本文推荐给朋友
	加入引用管理器
	E-mail Alert
	作者相关文章
	张峻宁
	苏群星
	刘鹏远
	王正军
	谷宏强

引用本文:

张峻宁,苏群星,刘鹏远,王正军,谷宏强. 基于空间约束的自适应单目3D物体检测算法[J]. 浙江大学学报(工学版), 2020, 54(6): 1138-1146.

Jun-ning ZHANFG,Qun-xing SU,Peng-yuan LIU,Zheng-jun WANG,Hong-qiang GU. Adaptive monocular 3D object detection algorithm based on spatial constraint. Journal of ZheJiang University (Engineering Science), 2020, 54(6): 1138-1146.

链接本文:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2020.06.010 或 http://www.zjujournals.com/eng/CN/Y2020/V54/I6/1138

图 1 世界坐标系、相机坐标系、目标坐标系的变换关系

图 2 直线开合度、囊括能力示意图

图 3 基于消失点（VP）估计目标3D边界框

图 4 不同情形下的角点求解顺序

表 1 不同算法计算VP的误差率

表 2 不同算法计算VP的运算时间

表 3 不同算法在Sun RGB-D数据集下的检测精度和数量对比

图 5 不同算法的3D检测效果可视化

表 4 不同算法在KITTI数据集的目标检测精度和数量对比

图 6 不同算法在KITTI数据集上的3D检测效果可视化

1	袁公萍, 汤一平, 韩旺明, 等基于深度卷积神经网络的车型识别方法[J]. 浙江大学学报: 工学版, 2018, 52 (4): 694- 702 YUAN Gong-ping, TANG Yi-ping, HAN Wang-ming, et al Vehicle recognition method based on deep convolution neural network[J]. Journal of Zhejiang University: Engineering Science, 2018, 52 (4): 694- 702
2	CAI H P. Fast detection of multiple textureless 3D objects [C] // International Conference on Computer Vision Systems. Petersburg: ICCVS, 2013: 103-112.
3	养明起. 基于深度神经网络的视觉位姿估计[M]. 安徽: 中国科学技术大学, 2018.
4	HODAN T, HALUZAS P. T-LESS: an RGB-D dataset for 6D pose estimation of texture-less objects [C] // IEEE Winter Conference on Applications of Computer Vision. Santa Rosa: IEEE WCACS, 2017: 880-888.
5	OHNO K, TSUBOUCHI T, SHIGEMATSU B, et al Differential GPS and odometry-based outdoor navigation of a mobile robot[J]. Advanced Robotics, 2004, 18 (6): 611- 635 doi: 10.1163/1568553041257431
6	FUENTES P J, RUIZE A J, RENDON J M Visual simultaneous localization and mapping: a survey[J]. Artificial Intelligence Review, 2012, 43 (1): 55- 81
7	ENDRES F, HESS J, STURM J, et al 3D mapping with an RGB-D camera[J]. IEEE Transactions on Robotics, 2014, 30 (1): 177- 187 doi: 10.1109/TRO.2013.2279412
8	HODAN T, ZABULIS X. Detection and fine 3D pose estimation of texture-less objects in RGB-D images [C] // IEEE International Conference on Computer Vision. Sydney: IEEE CVPR, 2014: 4421-4428.
9	DAVID G L Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60 (2): 91- 110 doi: 10.1023/B:VISI.0000029664.99615.94
10	柳培忠, 阮晓虎, 田震, 等一种基于多特征融合的视频目标跟踪方法[J]. 智能系统学报, 2015, (57): 319- 324 LIU Pei-zhong, RUAN Xiao-hu, TIAN Zhen, et al A video target tracking method based on multi-feature fusion[J]. Journal of Intelligent Systems, 2015, (57): 319- 324
11	贾祝广, 孙效玉, 王斌, 等无人驾驶技术研究及展望[J]. 矿业装备, 2014, (5): 44- 47 JIA Zhu-guang, SUN Xiao-yu, WANG Bin, et al Research and prospect of unmanned driving technology[J]. Mining Equipment, 2014, (5): 44- 47
12	RUSU R B, BRADSKI G, THIBAUX R. Fast 3D recognition and pose using the viewpoint feature histogram [C] // IEEE/RSJ International Conference on Intelligent Robots and Systems. Taiwan: IEEE ICIRS, 2010: 148-154.
13	YU X, TANNER S. Pose-CNN: a convolutional neural network for 6D object pose estimation in cluttered scenes [C] // IEEE Conference on Computer Vision and Pattern Recognition. Saltlake: IEEE CVPR, 2018.
14	RAD M, LEPETIT V. BB8: a scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth [C] // IEEE International Conference on Computer Vision. Venice: IEEE ICCV, 2017: 3848-3856.
15	KEHL W, MANHARDT F, TOMBARI F, et al. Ssd-6D: making RGB-based 3D detection and 6D pose estimation great again [C] // IEEE International Conference on Computer Vision. Venice: IEEE ICCV, 2017: 1530-1538.
16	REDMON J, FARHADI A. YOLO9000: Better, faster, stronger [C] // IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE CVPR, 2017: 6517-6525.
17	BANSAL A, RUSSELL B, GUPTA A. Marr revisited: 2D-3D alignment via surface normal prediction [C] // IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE CVPR, 2016: 5965-5974.
18	CHABOT F, CHAOUCH M, RABARISOA J, et al. Deep manta: a coarsetone many-task network for joint 2D and 3D vehicle analysis from monocular image [C] // IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE CVPR, 2017: 1827-1836.
19	谢雨宁. 融合形态学与特征点检测的建筑物自动三维重建关键技术研究[D]. 南京: 东南大学, 2016. XIE Yu-ning. Research on key technologies of automatic 3D reconstruction of buildings fused with morphology and feature detection [D]. Nanjing: Southeast University, 2016.
20	HEDAU V, HOIEM D, FORSYTH D. Thinking inside the box: using appearance models and context based on room geometry [C] // European Conference on Computer Vision. Heraklion: ECCV, 2010: 224-237.
21	CHEN X, KUNDU K, ZHANG Z, MA et al. Monocular 3D object detection for autonomous driving [C] // IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE CVPR, 2016: 2147-2156.
22	YANG S, SCHERER S. CubeSLAM: monocular 3D object detection and SLAM without prior models [C] // IEEE Conference on Computer Vision and Pattern Recognition. Saltlake: IEEE CVPR, 2018: 1-16.
23	宋欣, 王正玑, 程章林, 等多分辨率线段提取方法及线段语义分析[J]. 集成技术, 2018, 7 (9): 67- 78 SONG Xin, WANG Zheng-ju, CHENG Zhang-lin, et al Multiresolution line segment extraction method and semantics analysis[J]. Integrated Technology, 2018, 7 (9): 67- 78
24	SONG S, LICHTENBERG S P, XIAO J. Sun RGB-D: a RGB-D scene understanding benchmark suite [C] // IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE CVPR, 2015: 567-576.
25	GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite [C] // IEEE Conference on Computer Vision and Pattern Recognition. Rhode: IEEE CVPR, 2012: 3354–3361.
26	XIAO J, RUSSELL B, TORRALBA A. Localizing 3D cuboids in single-view images [C] // 25th International Conference on Neural Information Processing Systems. Cambodia: NIPSF 2012: 746-754.
27	CHOI W, CHAO Y W, PANTOFARU C, et al. Understanding indoor scenes using 3D geometric phrases [C] // IEEE Conference on Computer Vision and Pattern Recognition. Oregon: IEEE CVPR, 2013 : 33-40.
28	MOUSAVIAN A, ANGUEALOV D, FLYNN J, et al. 3D bounding box estimation using deep learning and geometry [C] // IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE CVPR, 2017: 5632-5640.
29	XIANG Y, CHOI W, LIN Y, et al. Subcategory-aware convolutional neural networks for object proposals and detection [C] // Applications of Computer Vision. Santa Rosa: IEEE ACV, 2017: 924-933.
30	梁苍, 曹宁, 冯晔改进的基于gLoG滤波实时消失点检测算法[J]. 国外电子测量技术, 2018, 37 (12): 36- 40 LIANG Cang, CAO Ning, FENG Ye Improved real-time vanishing point detection algorithm based on gLoG filter[J]. Foreign Electronic Measurement Technology, 2018, 37 (12): 36- 40

[1]	杨灿军,彭桢哲,徐铃辉,杨巍. 柔性膝关节保护外骨骼及其行走助力方法设计[J]. 浙江大学学报(工学版), 2021, 55(2): 213-221.
[2]	贾松敏,卢迎彬,王丽佳,李秀智,徐涛. 分层特征移动机器人行人跟踪[J]. 浙江大学学报(工学版), 2016, 50(9): 1677-1683.
[3]	江文婷, 龚小谨, 刘济林. 基于增量计算的大规模场景致密语义地图构建[J]. 浙江大学学报(工学版), 2016, 50(2): 385-391.
[4]	马子昂,项志宇. 光流测距全向相机的标定与三维重构[J]. 浙江大学学报(工学版), 2015, 49(9): 1651-1657.
[5]	王立军,黄忠朝,赵于前. 基于超像素分割的空间相关主题模型及场景分类方法[J]. 浙江大学学报(工学版), 2015, 49(3): 402-408.
[6]	曹腾,项志宇,刘济林. 基于视差空间V-截距的障碍物检测[J]. 浙江大学学报(工学版), 2015, 49(3): 409-414.
[7]	卢维, 项志宇, 于海滨, 刘济林. 基于自适应多特征表观模型的目标压缩跟踪[J]. 浙江大学学报(工学版), 2014, 48(12): 2132-2138.
[8]	陈明芽, 项志宇, 刘济林. 单目视觉自然路标辅助的移动机器人定位方法[J]. J4, 2014, 48(2): 285-291.
[9]	林颖, 龚小谨, 刘济林. 基于单位视球的鱼眼相机标定方法[J]. J4, 2013, 47(8): 1500-1507.
[10]	王会方, 朱世强, 吴文祥. 谐波驱动伺服系统的改进自适应鲁棒控制[J]. J4, 2012, 46(10): 1757-1763.
[11]	欧阳柳,徐进,龚小谨,刘济林. 基于不确定性分析的视觉里程计优化[J]. J4, 2012, 46(9): 1572-1579.
[12]	马丽莎, 周文晖, 龚小谨, 刘济林. *基于运动约束的泛化Field D路径规划**[J]. J4, 2012, 46(8): 1546-1552.
[13]	路丹晖, 周文晖, 龚小谨, 刘济林. 视觉和IMU融合的移动机器人运动解耦估计[J]. J4, 2012, 46(6): 1021-1026.
[14]	徐进,沈敏一,杨力,王炜强,刘济林. 基于双目光束法平差的机器人定位与地形拼接[J]. J4, 2011, 45(7): 1141-1146.
[15]	陈家乾,柳玉甜,何衍,蒋静坪. 基于栅格模型和样本集合的动态环境地图创建[J]. J4, 2011, 45(5): 794-798.

Viewed

Full text

Abstract

Cited

Shared

Discussed