Journal of ZheJiang University (Engineering Science)  2020, Vol. 54 Issue (12): 2405-2413    DOI: 10.3785/j.issn.1008-973X.2020.12.015
    
Foreground segmentation under dynamic background based on self-updating co-occurrence pixel
Dong LIANG1(),Xin-yu LIU1,Jia-xing PAN1,Han SUN1,Wen-jun ZHOU2,Shun’ichi KANEKO2
1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
2. Graduate School of Information Science and Technology, Hokkaido University, Sapporo 220-0004, Japan

Abstract  

A new foreground segmentation method, the self-updating co-occurrence pixel-block model (SU-CPB), was proposed to overcome the limitations of the co-occurrence pixel-block model (CPB). A supervised spatio-temporal attention model (STAM), pre-trained on large-scale surveillance scenes, was introduced, and its segmentation masks were used as guidance. Three mechanisms were proposed: dynamic selection of pixel-supporting-block pairs, replacement of broken pairs, and calculation of foreground similarity. With these mechanisms the pixel-block pairs are updated online, which removes the performance degradation caused by CPB's lack of an updating capability and gives the model the ability to segment foreground across scenes. Experimental results show that the proposed method outperforms CPB in all test scenes, and on the Wallflower and LIMU datasets, which were not used to train STAM, it significantly outperforms STAM, CPB and the other compared methods.
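The abstract only outlines the three updating mechanisms. As a rough illustration, the Python sketch below shows how an online, STAM-guided supporting-block set of this kind could be organized; the class, field names, history-window length and mask convention are assumptions made here for illustration, not the authors' implementation. Only the general idea of accumulating background evidence under a STAM mask and replacing broken pixel-block pairs comes from the paper.

```python
import numpy as np

class SelfUpdatingPixelModel:
    """Illustrative sketch (not the authors' code) of one target pixel's
    supporting-block set in an SU-CPB-like model."""

    def __init__(self, candidate_blocks, n_support=20, corr_thresh=0.5):
        self.candidates = list(candidate_blocks)      # (row, col) block centres
        self.support = self.candidates[:n_support]    # currently used pairs
        self.n_support = n_support
        self.corr_thresh = corr_thresh
        self.pixel_hist = []                          # target-pixel intensities
        self.block_hist = {b: [] for b in self.candidates}

    def observe(self, gray, pixel, stam_mask):
        """Accumulate intensities, trusting only frames that the STAM mask
        labels as background at the target pixel (assumed mask value 0)."""
        r0, c0 = pixel
        if stam_mask[r0, c0] == 0:
            self.pixel_hist.append(float(gray[r0, c0]))
            for (r, c) in self.candidates:
                self.block_hist[(r, c)].append(float(gray[r, c]))

    def self_update(self):
        """Replace 'broken' pairs: supporting blocks whose intensity
        correlation with the target pixel fell below the threshold."""
        if len(self.pixel_hist) < 10:
            return
        p = np.asarray(self.pixel_hist[-50:])         # recent target history
        scores = {}
        for b in self.candidates:
            q = np.asarray(self.block_hist[b][-50:])
            scores[b] = 1.0 if p.std() < 1e-6 or q.std() < 1e-6 \
                        else float(np.corrcoef(p, q)[0, 1])
        kept = [b for b in self.support if scores[b] >= self.corr_thresh]
        spare = sorted((b for b in self.candidates if b not in kept),
                       key=scores.get, reverse=True)  # best-correlated first
        self.support = kept + spare[:self.n_support - len(kept)]
```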



Key words: foreground segmentation; pixel spatial relation; spatio-temporal attention model (STAM); online self-updating; cross-scene
Received: 02 November 2019      Published: 31 December 2020
CLC:  TP 391  
Cite this article:

Dong LIANG,Xin-yu LIU,Jia-xing PAN,Han SUN,Wen-jun ZHOU,Shun’ichi KANEKO. Foreground segmentation under dynamic background based on self-updating co-occurrence pixel. Journal of ZheJiang University (Engineering Science), 2020, 54(12): 2405-2413.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2020.12.015     OR     http://www.zjujournals.com/eng/Y2020/V54/I12/2405


Foreground segmentation under dynamic background based on self-updating co-occurrence pixels

To address the problems of the co-occurrence pixel-block pair model (CPB), a new self-updating co-occurrence pixel model (SU-CPB) was proposed. A spatio-temporal attention model (STAM) trained on large-scale surveillance scenes was introduced, and its segmentation masks were used as guidance. Through three methods, namely dynamic selection of pixel-supporting-block pairs, replacement of structurally invalid supporting blocks, and calculation of foreground similarity, the supporting blocks were updated online. This solves the performance degradation caused by CPB's lack of an updating capability and gives SU-CPB the ability to segment foreground across scenes. Experimental results show that the method outperforms CPB in all test scenes, and on the Wallflower and LIMU datasets, which were not used to train STAM, it significantly outperforms plain STAM, CPB and the other compared methods.


Key words: foreground segmentation; pixel spatial relation; spatio-temporal attention model (STAM); online self-updating; cross-scene
Fig.1 Framework of SU-CPB
Fig.2 Working mechanism of CPB
Fig.3 STAM network structure
Fig.4 Dynamic selection model of supporting blocks
Fig.5 Demonstration of selection model
Fig.6 Replacement of broken pairs
Parameter                                  Value
Number of supporting blocks                20
Number of candidate supporting blocks      10
Gaussian model threshold                   2.5
Correlation decision threshold             0.5
Similarity decision threshold              0.8
Tab.1 Parameter settings of SU-CPB
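As a reading aid, these settings can be collected into a small configuration object; a minimal sketch, assuming Python and illustrative field names (only the numeric values come from Tab.1):

```python
from dataclasses import dataclass

@dataclass
class SUCPBConfig:
    """Parameter values from Tab.1; the field names are illustrative."""
    n_supporting_blocks: int = 20       # supporting blocks per target pixel
    n_candidate_blocks: int = 10        # candidate blocks kept for replacement
    gaussian_threshold: float = 2.5     # Gaussian model threshold
    correlation_threshold: float = 0.5  # correlation decision threshold
    similarity_threshold: float = 0.8   # foreground-similarity decision threshold
```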
No. Method F-measure
BDW BSL CJT DBG IOM SHD THM TBL LFR NVD PTZ
1 SU-CPB 0.867 0.907 0.853 0.924 0.760 0.910 0.969 0.895 0.449 0.558 0.753
2 CPB[17] 0.475 0.519 0.597 0.477 0.348 0.581 0.372 0.459 0.170 0.277 0.161
3 SuBSENSE[6] 0.862 0.950 0.815 0.818 0.657 0.865 0.817 0.779 0.645 0.560 0.348
4 KDE[3] 0.757 0.909 0.572 0.596 0.409 0.803 0.742 0.448 0.548 0.437 0.037
5 GMM[2] 0.738 0.825 0.597 0.633 0.521 0.732 0.662 0.466 0.537 0.410 0.152
6 BMOG[8] 0.784 0.830 0.749 0.793 0.529 0.840 0.635 0.693 0.610 0.498 0.235
7 SGSM-BS[11] 0.856 0.950 0.820 0.848 0.819 0.890 0.850 0.850 0.750 0.510 –
8 STAM[22] 0.970 0.989 0.899 0.948 0.916 0.966 0.991 0.933 0.668 0.710 0.865
9 DeepBS[9] 0.830 0.958 0.899 0.876 0.610 0.930 0.758 0.846 0.600 0.584 0.313
10 CascadeCNN[12] 0.943 0.979 0.976 0.966 0.851 0.941 0.896 0.911 0.837 0.897 0.917
11 DPDL[13] 0.869 0.969 0.866 0.869 0.876 0.936 0.838 0.764 0.708 0.611 0.609
12 FgSegNet[14] 0.984 0.998 0.995 0.994 0.993 0.995 0.992 0.978 0.956 0.978 0.989
Tab.2 F-measure of different methods on CDNet2014
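F-measure in Tab.2 (and in the later tables) is the usual harmonic mean of precision and recall over the binary segmentation masks, and Tab.5 reports specificity. The sketch below shows one straightforward way such per-frame scores can be computed from a predicted mask and its ground truth; the function name and mask convention are illustrative assumptions, not the benchmark's evaluation code.

```python
import numpy as np

def segmentation_scores(pred, gt):
    """Per-frame scores from binary masks (1 = foreground, 0 = background)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    tn = np.sum(~pred & ~gt)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall    = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    specificity = tn / (tn + fp) if tn + fp else 0.0   # reported in Tab.5
    return f_measure, specificity
```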
Fig.7 Comparison of detection results in different complex scenarios
Scene            F-measure
                 SU-CPB    STAM[22]
PETS2006         0.9570    0.9563
traffic          0.8350    0.8349
fountain02       0.9340    0.9335
abandoned box    0.8206    0.8123
parking          0.7641    0.7633
Tab.3 Comparison of proposed method with STAM on specific training sets
Scene           F-measure
                SU-CPB   STAM[22]   DeepBS[9]   CascadeCNN[12]   FgSegNet[14]   CPB[17]   SuBSENSE[6]   GMM[2]   PBAS[32]
Bootstrap       0.7560   0.7414     0.7479      0.5238           0.3587         0.6518    0.4192        0.5306   0.2857
Camouflage      0.6884   0.7369     0.9857      0.6778           0.1210         0.6112    0.9535        0.8307   0.8922
Fg Aperture     0.9420   0.8292     0.6583      0.7935           0.4119         0.5900    0.6635        0.5778   0.6459
Light Switch    0.9097   0.9090     0.6114      0.5883           0.6815         0.7157    0.3201        0.2296   0.2212
Time of Day     0.7949   0.3429     0.5494      0.3771           0.4222         0.7564    0.7107        0.7203   0.4875
Waving Trees    0.6665   0.5325     0.9546      0.2874           0.3456         0.7033    0.9597        0.9767   0.8421
Overall         0.7929   0.6820     0.7512      0.5413           0.3902         0.6714    0.6711        0.6443   0.5624
Tab.4 F-measure of different methods on Wallflower under different scenes
Scene           Specificity
                SU-CPB    STAM[22]   CascadeCNN[12]   FgSegNet[14]   CPB[17]
Moved Object    0.9977    0.9949     0.7736           0.8470         0.8922
Tab.5 Specificity of different methods on Moved Object of Wallflower
Scene              F-measure
                   SU-CPB    STAM[22]   CascadeCNN[12]   FgSegNet[14]   CPB[17]
Camera Parameter   0.7484    0.6742     0.1025           0.2668         0.6545
Intersection       0.7672    0.6237     0.0453           0.1428         0.6778
Light Switch       0.8211    0.0953     0.0277           0.0414         0.6633
Overall            0.7789    0.4644     0.0585           0.1503         0.6652
Tab.6 F-measure of different methods on LIMU under different scenes
Fig.8 Comparison of detection results of different methods on different scenes of LIMU
Scene              F-measure
                   CPB[17]   CPBDT     SU-CPB
Camera Parameter   0.6545    0.7159    0.7484
Intersection       0.6778    0.6908    0.7672
Light Switch       0.6633    0.6425    0.8211
Overall            0.6652    0.6831    0.7789
Tab.7 F-measure of the SU-CPB method at different stages on different scenes of LIMU
[1]   VACAVANT A, CHATEAU T, WILHELM A, et al. A benchmark dataset for outdoor foreground/background extraction[C]// Asian Conference on Computer Vision. [S. l.]: Springer, 2012: 291-300.
[2]   STAUFFER C, GRIMSON W E L. Adaptive background mixture models for real-time tracking [C]// IEEE Computer Society Conference on Computer Vision and Pattern Recognition. [S. l.]: IEEE, 1999: 246-252.
[3]   ELGAMMAL A, DURAISWAMI R, HARWOOD D, et al Background and foreground modeling using nonparametric kernel density estimation for visual surveillance[J]. Proceedings of the IEEE, 2002, 90 (7): 1151- 1163
doi: 10.1109/JPROC.2002.801448
[4]   JODOIN P M, MIGNOTTE M, KONRAD J Statistical background subtraction using spatial cues[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2007, 17 (12): 1758- 1763
doi: 10.1109/TCSVT.2007.906935
[5]   BARNICH O, DROOGENBROECK M V ViBe: a universal background subtraction algorithm for video sequences[J]. IEEE Transactions on Image Processing, 2011, 20 (6): 1709- 1724
doi: 10.1109/TIP.2010.2101613
[6]   ST-CHARLES P L, BILODEAU G A, BERGEVIN R SuBSENSE: a universal change detection method with local adaptive sensitivity[J]. IEEE Transactions on Image Processing, 2014, 24 (1): 359- 373
[7]   LIANG D, KANEKO S, HASHIMOTO M, et al Co-occurrence probability-based pixel pairs background model for robust object detection in dynamic scenes[J]. Pattern Recognition, 2015, 48 (4): 1374- 1390
doi: 10.1016/j.patcog.2014.10.020
[8]   MARTINS I, CARVALHO P, CORTE-REAL L, et al BMOG: boosted Gaussian mixture model with controlled complexity for background subtraction[J]. Pattern Analysis and Applications, 2018, 21 (3): 641- 654
doi: 10.1007/s10044-018-0699-y
[9]   BRAHAM M, DROOGENBROECK M V. Deep background subtraction with scene-specific convolutional neural networks [C]// 2016 International Conference on Systems, Signals and Image Processing. [S. l.]: IEEE, 2016.
[10]   BABAEE M, DINH D T, RIGOLL G. A deep convolutional neural network for background subtraction [EB/OL]. [2019-09-30]. https://arxiv.org/pdf/1702.01731.pdf.
[11]   SHI G, HUANG T, DONG W, et al Robust foreground estimation via structured gaussian scale mixture modeling[J]. IEEE Transactions on Image Processing, 2018, 27 (10): 4810- 4824
doi: 10.1109/TIP.2018.2845123
[12]   WANG Y, LUO Z, JODOIN P, et al Interactive deep learning method for segmenting moving objects[J]. Pattern Recognition Letters, 2017, 96: 66- 75
[13]   ZHAO C, CHAM T, REN X, et al. Background subtraction based on deep pixel distribution learning [C]// 2018 IEEE International Conference on Multimedia and Expo. [S. l.]: IEEE, 2018: 1-6.
[14]   LIM L A, KELES H Y Foreground segmentation using convolutional neural networks for multiscale feature encoding[J]. Pattern Recognition Letters, 2018, 112: 256- 262
doi: 10.1016/j.patrec.2018.08.002
[15]   LIM L A, KELES H Y Learning multi-scale features for foreground segmentation[J]. Pattern Analysis and Applications, 2019, 23 (3): 1369- 1380
[16]   QIU M, LI X A fully convolutional encoder-decoder spatial-temporal network for real-time background subtraction[J]. IEEE Access, 2019, 7: 85949- 85958
[17]   ZHOU W, KANEKO S, LIANG D, et al Background subtraction based on co-occurrence pixel-block pairs for robust object detection in dynamic scenes[J]. IIEEJ Transactions on Image Electronics and Visual Computing, 2018, 5 (2): 146- 159
[18]   ZHOU W, KANEKO S, HASHIMOTO M, et al. A co-occurrence background model with hypothesis on degradation modification for object detection in strong background changes [C]// 2018 24th International Conference on Pattern Recognition. [S. l.]: IEEE, 2018: 1743-1748.
[19]   ZHOU W, KANEKO S, HASHIMOTO M, et al Foreground detection based on co-occurrence background model with hypothesis on degradation modification in dynamic scenes[J]. Signal Processing, 2019, 160: 66- 79
doi: 10.1016/j.sigpro.2019.02.021
[20]   ZHOU W, KANEKO S, SATOH Y, et al. Co-occurrence based foreground detection with hypothesis on degradation modification in severe imaging conditions [C] // Proceedings of JSPE Semestrial Meeting 2018 JSPE Autumn Conference. [S. l.]: JSPE, 2018: 624-625.
[21]   ZHAO X, SATOH Y, TAKAUJI H, et al Object detection based on a robust and accurate statistical multi-point-pair model[J]. Pattern Recognition, 2011, 44 (6): 1296- 1311
doi: 10.1016/j.patcog.2010.11.022
[22]   LIANG D, PAN J, SUN H, et al Spatio-temporal attention model for foreground detection in cross-scene surveillance videos[J]. Sensors, 2019, 19 (23): 5142
doi: 10.3390/s19235142
[23]   LAROCHELLE H, HINTON G. Learning to combine foveal glimpses with a third-order Boltzmann machine [C]// Advances in Neural Information Processing Systems 23. [S. l.]: Curran Associates Inc, 2010: 1243-1251.
[24]   KIM J, LEE S, KWAK D, et al. Multimodal residual learning for visual QA [C]// Neural Information Processing Systems. [S. l.]: MIT Press, 2016: 361-369.
[25]   MNIH V, HEESS N, GRAVES A. Recurrent models of visual attention [C]// Neural Information Processing Systems. [S. l.]: MIT Press, 2014, 2: 2204-2212.
[26]   XU K, BA J, KIROS R, et al Show, attend and tell: neural image caption generation with visual attention[J]. International Conference on Machine Learning, 2015, 3: 2048- 2057
[27]   LI H, XIONG P, AN J, et al. Pyramid attention network for semantic segmentation [EB/OL]. [2019-09-30]. https://arxiv.org/pdf/1805.10180.pdf.
[28]   LIU C. Beyond pixels: exploring new representations and applications for motion analysis [D]. Cambridge: MIT, 2009.
[29]   GOYETTE N, JODOIN P M, PORIKLI F, et al. Changedetection.net: a new change detection benchmark dataset [C]// 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. [S. l.]: IEEE, 2012: 1-8.
[30]   TOYAMA K, KRUMM J, BRUMITT B, et al. Wallflower: principles and practice of background maintenance [C]// Proceedings of the Seventh IEEE International Conference on Computer Vision. [S. l.]: IEEE, 1999: 255-261.
[31]   Laboratory for image and media understanding [DB/OL]. [2019-09-30]. http://limu.ait.kyushu-u.ac.jp/dataset/en/.