Computer Technology, Information Engineering
Nonparametric RGB-D scene parsing based on Markov random field model
FEI Ting ting, GONG Xiao jin
Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
Abstract An effective nonparametric method was proposed for RGB-D scene parsing. The method is based on a label transfer scheme with three stages: label pool construction, bi-directional superpixel matching, and label transfer. Compared with traditional parametric RGB-D scene parsing methods, the approach requires no tedious training stage, which makes it simple and efficient. In contrast to previous nonparametric techniques, the method incorporates geometric context at every stage and uses a bi-directional scheme for superpixel matching to reduce mismatches. A Markov random field (MRF) was then built on collaborative representation based classification (CRC), and the parsing result was obtained by minimizing the energy function via Graph Cuts. The approach was validated on both the indoor NYU Depth V1 dataset and the outdoor KITTI dataset, where it outperformed state-of-the-art RGB-D parsing techniques and the classical nonparametric Superparsing method. The algorithm applies to both indoor and outdoor scenes, which gives it practical value.
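The abstract does not spell out how the bi-directional superpixel matching is computed. One common way to realize such a scheme is a mutual nearest-neighbour check, sketched below under that assumption. The function names, the Euclidean distance, and the toy 2-D feature vectors are all hypothetical; real superpixel descriptors would combine appearance and geometric cues.

```python
import math

def nearest(query, pool):
    """Return the index of the vector in `pool` closest to `query` (Euclidean)."""
    return min(range(len(pool)), key=lambda j: math.dist(query, pool[j]))

def bidirectional_matches(query_feats, pool_feats):
    """Keep only (query, pool) index pairs that are mutual nearest neighbours.

    Assumption: "bi-directional" is read here as a forward/backward
    consistency check; the paper may define it differently.
    """
    matches = []
    for i, q in enumerate(query_feats):
        j = nearest(q, pool_feats)                    # forward: query -> label pool
        if nearest(pool_feats[j], query_feats) == i:  # backward: label pool -> query
            matches.append((i, j))
    return matches

# Toy descriptors (hypothetical): three query superpixels, three pool superpixels.
query = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.1)]
pool = [(0.1, 0.0), (1.1, 0.9), (5.0, 5.0)]
print(bidirectional_matches(query, pool))  # [(0, 0), (1, 1)]
```

The third query superpixel picks pool entry 0 as its nearest neighbour, but that entry prefers query 0, so the one-sided match is discarded. This is the kind of mismatch a bi-directional check is meant to filter out.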
Published: 23 July 2016
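Nonparametric RGB-D scene understanding based on Markov random field
To address scene understanding in RGB-D scenes, an efficient nonparametric scene understanding algorithm based on a label transfer mechanism is proposed. The algorithm consists of three steps: label pool construction, bi-directional superpixel matching, and label transfer. Compared with traditional parametric RGB-D scene understanding methods, it requires no tedious training and is simple and efficient. Unlike traditional nonparametric scene understanding methods, it makes effective use of the 3D information provided by the depth map at every design stage and introduces a bi-directional matching mechanism at the superpixel matching stage to reduce feature mismatches. A Markov random field (MRF) based on collaborative representation classification (CRC) is constructed, and the optimal solution is obtained with Graph Cuts, yielding a semantic label for every pixel of the scene image. Experiments were conducted on the indoor NYU-V1 dataset and the outdoor KITTI dataset. The results show that, compared with existing algorithms, the algorithm achieves a significant performance improvement and is applicable to both indoor and outdoor scenes.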
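As an illustration of the collaborative representation based classification (CRC) step, the sketch below follows the generic CRC recipe: code a query feature over all labelled atoms with ridge regression, then score each class by its reconstruction residual divided by the norm of its coefficients. The atom values, class names, and regularization weight `lam` are hypothetical, and how the paper turns such scores into MRF energy terms is not stated here.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def crc_classify(atoms, labels, y, lam=0.01):
    """Return (best_label, scores) for query feature `y`.

    atoms  : list of labelled feature vectors (the dictionary)
    labels : class label of each atom
    lam    : ridge weight (hypothetical value)
    """
    n = len(atoms)
    # Ridge-regularized normal equations: (X^T X + lam * I) alpha = X^T y
    G = [[dot(atoms[i], atoms[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(G, [dot(a, y) for a in atoms])
    scores = {}
    for c in set(labels):
        idx = [i for i in range(n) if labels[i] == c]
        # Reconstruct y using only the atoms of class c.
        recon = [sum(alpha[i] * atoms[i][k] for i in idx) for k in range(len(y))]
        resid = sum((yk - rk) ** 2 for yk, rk in zip(y, recon)) ** 0.5
        norm = sum(alpha[i] ** 2 for i in idx) ** 0.5
        scores[c] = resid / (norm + 1e-12)  # lower score = better fit
    return min(scores, key=scores.get), scores

# Toy dictionary (hypothetical): two "wall" atoms and two "floor" atoms.
atoms = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 1.0, 0.0), (0.0, 0.9, 0.1)]
labels = ["wall", "wall", "floor", "floor"]
label, scores = crc_classify(atoms, labels, (0.95, 0.05, 0.0))
print(label)
```

One plausible use, stated as an assumption, is to feed the per-class scores into the MRF as data costs, so that the Graph Cuts step trades them off against label smoothness between neighbouring superpixels.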