Journal of Zhejiang University (Science Edition), 2021, Vol. 48, Issue 3: 282-288    DOI: 10.3785/j.issn.1008-9497.2021.03.003
Image Processing Algorithms
Depth of field videos classification based on image depth prediction
QIAN Lihui (1), WANG Bin (1), ZHENG Yunfei (2), ZHANG Jiajie (2), LI Mading (2), YU Bing (2)
1. School of Software, Tsinghua University, Beijing 100084, China
2. Beijing Kuaishou Technology Co., Ltd., Beijing 100085, China
Abstract: Depth-of-field videos are popular for their high definition and visual appeal, yet retrieving them from massive video collections is difficult. Many existing studies build on the imaging principle of depth of field to develop pixel-level depth-of-field segmentation algorithms, but these are hard to apply directly to practical video classification. This paper designs a deep network that predicts the video type directly. According to the depth-of-field imaging principle, the depths of semantic objects relative to the camera follow certain logical relationships, and objects at different depths usually differ in sharpness. We therefore propose to use image depth as guidance: a depth prediction module estimates the depth information of each image, which is then merged and fed into the classification network for training and detection, reducing the false detection rate for depth-of-field videos and improving model performance. In addition, because labelled data in this field are scarce and differences in distribution between datasets degrade performance, we design an iterative method for collecting depth-of-field video datasets that gathers the required videos quickly at low labor cost and has practical value. Experimental results show that our method outperforms previous methods, reaching a recognition accuracy of 85.7% on the Kuaishou online depth-of-field video dataset.
Keywords: video classification; depth prediction; deep learning; depth of field
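The depth-guided pipeline described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's implementation: the abstract does not say how depth is merged with the image (appending it as a fourth channel is assumed), nor how frame-level outputs become a video-level label (averaging per-frame scores is assumed). The names `fuse_rgb_depth`, `classify_video`, `toy_depth` and `toy_score` are hypothetical, and the two toy functions merely stand in for the depth prediction module and the classification network.

```python
from typing import Callable, List, Sequence

Frame = List[List[List[float]]]   # H x W x 3 RGB values in [0, 1]
DepthMap = List[List[float]]      # H x W depth values in [0, 1]
FusedFrame = List[List[List[float]]]  # H x W x 4 (RGB plus depth)


def fuse_rgb_depth(frame: Frame, depth: DepthMap) -> FusedFrame:
    """Append the predicted depth to every pixel as a fourth channel.

    Channel concatenation is an assumed way of "merging" depth with the image.
    """
    same_height = len(frame) == len(depth)
    same_width = all(len(fr) == len(dr) for fr, dr in zip(frame, depth))
    if not (same_height and same_width):
        raise ValueError("frame and depth map must have the same height and width")
    return [
        [list(pixel) + [d] for pixel, d in zip(frame_row, depth_row)]
        for frame_row, depth_row in zip(frame, depth)
    ]


def classify_video(
    frames: Sequence[Frame],
    predict_depth: Callable[[Frame], DepthMap],
    score_frame: Callable[[FusedFrame], float],
    threshold: float = 0.5,
) -> bool:
    """Return True when the mean per-frame score reaches the threshold.

    Mean aggregation over frames is an assumption, not taken from the paper.
    """
    if not frames:
        raise ValueError("video has no frames")
    scores = [score_frame(fuse_rgb_depth(f, predict_depth(f))) for f in frames]
    return sum(scores) / len(scores) >= threshold


def toy_depth(frame: Frame) -> DepthMap:
    """Hypothetical stand-in for a depth predictor: uses mean brightness as depth."""
    return [[sum(pixel) / 3.0 for pixel in row] for row in frame]


def toy_score(rgbd: FusedFrame) -> float:
    """Hypothetical stand-in for a classifier: scores the spread of depth values."""
    depths = [pixel[3] for row in rgbd for pixel in row]
    return max(depths) - min(depths)
```

In practice `predict_depth` and `score_frame` would be trained networks; passing them in as callables keeps the sketch independent of any particular deep learning framework.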
Received: 2020-09-26    Published: 2021-05-20
CLC: TP 391.4
Funding: National Natural Science Foundation of China (62072271).
Corresponding author: ORCID: https://orcid.org/0000-0002-5176-9202, E-mail: wangbins@tsinghua.edu.cn
About the author: QIAN Lihui (born 1995), ORCID: https://orcid.org/0000-0003-4100-1619, male, master's student, working mainly on computer vision research.

Cite this article:

QIAN Lihui, WANG Bin, ZHENG Yunfei, ZHANG Jiajie, LI Mading, YU Bing. Depth of field videos classification based on image depth prediction[J]. Journal of Zhejiang University (Science Edition), 2021, 48(3): 282-288.

Link to this article:

https://www.zjujournals.com/sci/CN/10.3785/j.issn.1008-9497.2021.03.003
https://www.zjujournals.com/sci/CN/Y2021/V48/I3/282
