Journal of Zhejiang University (Engineering Science)  2019, Vol. 53, Issue (1): 115-125    DOI: 10.3785/j.issn.1008-973X.2019.01.013
Computer Technology
Image restoration and fault tolerance of stereo SLAM based on generative adversarial net
WANG Kai, YUE Bo-xuan, FU Jun-wei, LIANG Jun
College of Control Science and Engineering, Zhejiang University, Hangzhou 310058, China
Full text: PDF (2045 KB)   HTML

Abstract:

The classical Pix2Pix image-generation network was modified in order to improve the fault tolerance of a simultaneous localization and mapping (SLAM) system. Three improvements were added in turn: a depth estimation network whose depth output serves as an additional input, an image reconstruction loss based on a spatial transformer network (STN), and an image completion loss based on an image inpainting network. By exploiting the coupling between the two views of a stereo pair, multiple sources of information were mined and fused, which increased information utilization and improved the quality of the generated images. Generative adversarial network (GAN) techniques were then combined with the SLAM fault-tolerance scenario, realizing fault tolerance directly at the sensing level. Experiments on the KITTI and Cityscapes datasets verified the effectiveness of the improved model. Both the generated images and the original images were fed to a stereo SLAM system for reconstruction; the results showed that the fault-tolerance idea is feasible.
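The combined objective sketched in the abstract (an STN-style stereo reconstruction loss plus an inpainting loss on faulty regions) can be illustrated as follows. This is a minimal toy sketch, not the authors' implementation: it uses an integer-disparity nearest-neighbor warp in place of a differentiable bilinear STN sampler, and the function names and weights `lam_rec`, `lam_inp` are hypothetical.

```python
import numpy as np

def warp_with_disparity(right, disparity):
    """Reconstruct the left view from the right view of a rectified
    stereo pair: each left pixel at column x samples the right image
    at column x - d (integer disparity). A simplified stand-in for a
    differentiable STN warp."""
    h, w = right.shape
    left_rec = np.zeros_like(right)
    for y in range(h):
        for x in range(w):
            src = x - int(disparity[y, x])
            if 0 <= src < w:
                left_rec[y, x] = right[y, src]
    return left_rec

def combined_loss(left, right, disparity, mask, inpainted,
                  lam_rec=1.0, lam_inp=0.5):
    """Toy combined objective: L1 reconstruction loss between the
    warped right view and the left view, plus an L1 inpainting loss
    restricted by `mask` to the corrupted regions. Weights are
    hypothetical placeholders."""
    rec = warp_with_disparity(right, disparity)
    loss_rec = np.abs(rec - left).mean()
    loss_inp = np.abs((inpainted - left) * mask).mean()
    return lam_rec * loss_rec + lam_inp * loss_inp
```

With a synthetic pair where the left image really is the right image shifted by the disparity, the reconstruction term vanishes, which is the consistency the stereo-coupling loss rewards.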

Received: 2018-01-10    Published online: 2019-01-07
CLC:  TP183  
Funding: Supported by the National Natural Science Foundation of China (U1664264, U1509203)

Corresponding author: LIANG Jun, male, professor. orcid.org/0000-0003-1115-0824. E-mail: jliang@zju.edu.cn
About the first author: WANG Kai (1993-), male, master's student, engaged in computer vision research. orcid.org/0000-0002-4349-6486. E-mail: kaiwang1@zju.edu.cn

Cite this article:


WANG Kai, YUE Bo-xuan, FU Jun-wei, LIANG Jun. Image restoration and fault tolerance of stereo SLAM based on generative adversarial net. Journal of Zhejiang University (Engineering Science), 2019, 53(1): 115-125.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2019.01.013        http://www.zjujournals.com/eng/CN/Y2019/V53/I1/115

