Journal of Zhejiang University (Engineering Science)
Nonparametric RGB-D scene parsing based on Markov random field model
FEI Tingting, GONG Xiaojin
Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China

Abstract:

An effective nonparametric method was proposed for RGB-D scene parsing. The method is based on a label transferring scheme that consists of three stages: label pool construction, bi-directional superpixel matching, and label transferring. Compared with traditional parametric RGB-D scene parsing methods, the approach requires no tedious training stage, which makes it simple and efficient. In contrast to previous nonparametric techniques, the method not only incorporates the geometric context provided by depth maps at all stages, but also adopts a bi-directional scheme for superpixel matching in order to reduce feature mismatching. A Markov random field (MRF) model was then built upon collaborative representation based classification (CRC), and the parsing result was obtained by minimizing the energy function via Graph Cuts. The effectiveness of the approach was validated on both the indoor NYU Depth V1 dataset and the outdoor KITTI dataset. The approach outperformed both state-of-the-art RGB-D parsing techniques and a classical nonparametric superparsing method, and it is applicable to both indoor and outdoor scenarios.
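The bi-directional superpixel matching stage described above can be sketched as a mutual nearest-neighbor check: a query superpixel and a label-pool superpixel are paired only if each is the other's closest match. The Euclidean feature distance and the toy features below are illustrative assumptions, not the paper's exact descriptors.

```python
import numpy as np

def bidirectional_matches(query_feats, pool_feats):
    """Keep only mutual nearest-neighbor pairs between query superpixels
    and label-pool superpixels, to reduce feature mismatching."""
    # Pairwise Euclidean distances: rows = query, cols = pool.
    d = np.linalg.norm(query_feats[:, None, :] - pool_feats[None, :, :], axis=2)
    fwd = d.argmin(axis=1)   # best pool match for each query superpixel
    bwd = d.argmin(axis=0)   # best query match for each pool superpixel
    # A pair survives only if the match holds in both directions.
    return [(q, int(p)) for q, p in enumerate(fwd) if bwd[p] == q]

# Toy example: 3 query superpixels, 4 pool superpixels, 2-D features.
query = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
pool = np.array([[0.1, 0.1], [0.9, 1.1], [5.2, 4.9], [9.0, 9.0]])
print(bidirectional_matches(query, pool))  # → [(0, 0), (1, 1), (2, 2)]
```

Note that pool superpixel 3 is nobody's mutual match, so it contributes no label, which is exactly the filtering effect the bi-directional scheme is meant to provide.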
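The CRC step can be illustrated with a ridge-regression sketch: a query feature is coded collaboratively over all pool features at once, and the class whose coefficients reconstruct the query with the smallest residual wins. The dictionary, labels, and regularization weight below are toy assumptions for illustration only.

```python
import numpy as np

def crc_classify(D, labels, y, lam=0.01):
    """Collaborative representation based classification (CRC).
    D: dictionary whose columns are pool features; labels: class of each
    column; y: query feature. Returns the minimum-residual label."""
    # Code y over the whole dictionary jointly (ridge regression).
    A = D.T @ D + lam * np.eye(D.shape[1])
    x = np.linalg.solve(A, D.T @ y)
    best, best_r = None, np.inf
    for c in set(labels):
        mask = np.array([l == c for l in labels])
        # Reconstruct y using only this class's coefficients.
        r = np.linalg.norm(y - D[:, mask] @ x[mask])
        if r < best_r:
            best, best_r = c, r
    return best

# Toy dictionary: two columns per class in a 3-D feature space.
D = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.0]])
labels = ["road", "road", "building", "building"]
y = np.array([0.95, 0.05, 0.0])
print(crc_classify(D, labels, y))  # → "road"
```

Unlike sparse-representation classification, the code here has a closed-form solution, which is what makes CRC attractive for a training-free pipeline.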
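Finally, the MRF combines per-superpixel (unary) costs, such as CRC residuals, with a smoothness term that penalizes neighboring superpixels taking different labels. As a minimal stand-in for the paper's Graph Cuts optimization, the sketch below evaluates a Potts-model energy and minimizes it greedily with iterated conditional modes (ICM); the costs, graph, and beta are toy assumptions.

```python
import numpy as np

def potts_energy(labels, unary, edges, beta=1.0):
    """Energy of a labeling: unary costs plus beta for each
    edge whose endpoints disagree (Potts smoothness term)."""
    e = sum(unary[i][l] for i, l in enumerate(labels))
    e += beta * sum(1 for i, j in edges if labels[i] != labels[j])
    return e

def icm(unary, edges, beta=1.0, iters=10):
    """Iterated conditional modes: greedy per-node relabeling."""
    n, k = len(unary), len(unary[0])
    labels = [int(np.argmin(u)) for u in unary]  # unary-only init
    for _ in range(iters):
        for i in range(n):
            costs = [potts_energy(labels[:i] + [l] + labels[i + 1:],
                                  unary, edges, beta) for l in range(k)]
            labels[i] = int(np.argmin(costs))
    return labels

# 3 superpixels in a chain; the middle node's unary cost is ambiguous,
# so the smoothness term pulls it toward its neighbors' label.
unary = [[0.0, 1.0], [0.6, 0.5], [0.0, 1.0]]
edges = [(0, 1), (1, 2)]
print(icm(unary, edges, beta=0.5))  # → [0, 0, 0]
```

ICM only finds a local minimum; Graph Cuts (alpha-expansion) as used in the paper gives strong approximation guarantees for Potts-type energies, which is why it is the standard choice.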

Published: 2016-07-23
CLC number: TP 391
Corresponding author: GONG Xiaojin, female, associate professor. ORCID: 0000-0001-9955-3569. E-mail: gongxj@zju.edu.cn
About the author: FEI Tingting (1990-), female, master's student, engaged in machine vision research. ORCID: 0000-0003-1924-426X. E-mail: 21231083@zju.edu.cn

Cite this article:


FEI Tingting, GONG Xiaojin. Nonparametric RGB-D scene parsing based on Markov random field model. Journal of Zhejiang University (Engineering Science), 2016, 50(7): 1322. DOI: 10.3785/j.issn.1008-973X.2016.07.014.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2016.07.014        http://www.zjujournals.com/eng/CN/Y2016/V50/I7/1322

