1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China; 2. Graduate School of Information Science and Technology, Hokkaido University, Sapporo 220-0004, Japan
A new foreground segmentation method, the self-updating co-occurrence pixel-block model (SU-CPB), was proposed to overcome the limitations of the co-occurrence pixel-block (CPB) model. A supervised spatio-temporal attention model (STAM), trained on large-scale data, was introduced, and its segmentation result was used as a reference. Three techniques were proposed: dynamic selection of pixel blocks, replacement of broken pairs, and calculation of foreground similarities. With these techniques, the pixel-block pairs were self-updated online, which resolved the performance degradation caused by the CPB model's lack of an updating capability and gave the model the ability to segment foreground across scenes. Experimental results show that the proposed method outperforms the CPB model in all scenes, and significantly outperforms STAM, CPB and the other compared methods on the Wallflower and LIMU datasets, on which STAM was not trained.
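To make the self-updating idea concrete, the following minimal Python sketch (ours, not the authors' implementation) shows how a STAM-predicted binary mask could serve as the reference for detecting and replacing broken pixel-block pairs online. The class and function names (PixelBlockPair, select_candidate_block, self_update), the 4x4 block size, and the simple intensity-gap test are all illustrative assumptions.

```python
import numpy as np

class PixelBlockPair:
    """One co-occurring block pair: a target block and a supporting block."""
    def __init__(self, target_xy, support_xy):
        self.target_xy = target_xy    # (row, col) of the block to classify
        self.support_xy = support_xy  # (row, col) of its co-occurring partner

def block_mean(frame, xy, size=4):
    """Mean intensity of the size x size block with top-left corner xy."""
    y, x = xy
    return float(frame[y:y + size, x:x + size].mean())

def select_candidate_block(frame, target_xy, size=4, search=16):
    """Dynamic selection (illustrative): pick the nearby block whose current
    intensity is closest to the target block's. Returns None if no candidate
    lies inside the frame (not handled further in this sketch)."""
    ty, tx = target_xy
    h, w = frame.shape[:2]
    best, best_gap = None, np.inf
    for dy in range(-search, search + 1, size):
        for dx in range(-search, search + 1, size):
            y, x = ty + dy, tx + dx
            if (dy, dx) == (0, 0):
                continue
            if not (0 <= y <= h - size and 0 <= x <= w - size):
                continue
            gap = abs(block_mean(frame, (y, x), size)
                      - block_mean(frame, target_xy, size))
            if gap < best_gap:
                best, best_gap = (y, x), gap
    return best

def self_update(pairs, frame, stam_mask, size=4, tol=10.0):
    """Online self-update: STAM's binary mask (1 = foreground) is the
    reference. For blocks STAM labels as background, the intensity gap
    between target and support should stay stable; a large gap marks the
    pair as broken, and a freshly selected block replaces the support."""
    for p in pairs:
        ty, tx = p.target_xy
        if stam_mask[ty:ty + size, tx:tx + size].mean() < 0.5:  # background per STAM
            gap = abs(block_mean(frame, p.target_xy, size)
                      - block_mean(frame, p.support_xy, size))
            if gap > tol:  # co-occurrence no longer holds: replace the broken pair
                p.support_xy = select_candidate_block(frame, p.target_xy, size)
    return pairs
```

In SU-CPB itself the update is also steered by foreground-similarity scores between blocks; the sketch conveys only the reference-mask-driven replacement step.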
Fig. 7 Comparison of detection results in different complex scenarios
Scene          F-measure
               SU-CPB   STAM[22]
PETS2006       0.9570   0.9563
traffic        0.8350   0.8349
fountain02     0.9340   0.9335
abandoned box  0.8206   0.8123
parking        0.7641   0.7633
Tab. 3 Comparison of proposed method with STAM on specific training sets
Scene         SU-CPB  STAM[22]  DeepBS[9]  Cascade CNN[12]  FgSegNet[14]  CPB[17]  SuBSENSE[6]  GMM[2]  PBAS[32]
Bootstrap     0.7560  0.7414    0.7479     0.5238           0.3587        0.6518   0.4192       0.5306  0.2857
Camouflage    0.6884  0.7369    0.9857     0.6778           0.1210        0.6112   0.9535       0.8307  0.8922
Fg Aperture   0.9420  0.8292    0.6583     0.7935           0.4119        0.5900   0.6635       0.5778  0.6459
Light Switch  0.9097  0.9090    0.6114     0.5883           0.6815        0.7157   0.3201       0.2296  0.2212
Time of Day   0.7949  0.3429    0.5494     0.3771           0.4222        0.7564   0.7107       0.7203  0.4875
Waving Trees  0.6665  0.5325    0.9546     0.2874           0.3456        0.7033   0.9597       0.9767  0.8421
Overall       0.7929  0.6820    0.7512     0.5413           0.3902        0.6714   0.6711       0.6443  0.5624
Tab. 4 F-measure of different methods on Wallflower under different scenes
Scene         SU-CPB  STAM[22]  CascadeCNN[12]  FgSegNet[14]  CPB[17]
Moved Object  0.9977  0.9949    0.7736          0.8470        0.8922
Tab. 5 Specificity of different methods on Moved Object of Wallflower
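The tables report F-measure for most scenes and Specificity for Moved Object. As a reminder of the definitions (this helper is ours, not the paper's evaluation code), both follow from the confusion-matrix counts of a binary prediction mask against ground truth; Specificity, the true-negative rate, is the natural metric when a ground-truth frame contains little or no foreground, where F-measure degenerates.

```python
import numpy as np

def f_measure_and_specificity(pred, gt):
    """Standard metric definitions from boolean masks (True = foreground)."""
    tp = np.sum(pred & gt)    # foreground correctly detected
    fp = np.sum(pred & ~gt)   # background mislabeled as foreground
    fn = np.sum(~pred & gt)   # foreground missed
    tn = np.sum(~pred & ~gt)  # background correctly rejected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return f_measure, specificity
```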
Scene             SU-CPB  STAM[22]  CascadeCNN[12]  FgSegNet[14]  CPB[17]
Camera Parameter  0.7484  0.6742    0.1025          0.2668        0.6545
Intersection      0.7672  0.6237    0.0453          0.1428        0.6778
Light Switch      0.8211  0.0953    0.0277          0.0414        0.6633
Overall           0.7789  0.4644    0.0585          0.1503        0.6652
Tab. 6 F-measure of different methods on LIMU under different scenes
Fig. 8 Comparison of detection results of different methods on different scenes of LIMU
Scene             CPB[17]  CPBDT   SU-CPB
Camera Parameter  0.6545   0.7159  0.7484
Intersection      0.6778   0.6908  0.7672
Light Switch      0.6633   0.6425  0.8211
Overall           0.6652   0.6831  0.7789
Tab. 7 F-measure of the SU-CPB method at different stages on different scenes of LIMU
[1]
VACAVANT A, CHATEAU T, WILHELM A, et al. A benchmark dataset for outdoor foreground/background extraction [C]// Asian Conference on Computer Vision. [S. l.]: Springer, 2012: 291-300.
[2]
STAUFFER C, GRIMSON W E L. Adaptive background mixture models for real-time tracking [C]// IEEE Computer Society Conference on Computer Vision and Pattern Recognition. [S. l.]: IEEE, 1999: 246-252.
[3]
ELGAMMAL A, DURAISWAMI R, HARWOOD D, et al. Background and foreground modeling using nonparametric kernel density estimation for visual surveillance[J]. Proceedings of the IEEE, 2002, 90(7): 1151-1163.
doi: 10.1109/JPROC.2002.801448
[4]
JODOIN P M, MIGNOTTE M, KONRAD J. Statistical background subtraction using spatial cues[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2007, 17(12): 1758-1763.
doi: 10.1109/TCSVT.2007.906935
[5]
BARNICH O, DROOGENBROECK M V. ViBe: a universal background subtraction algorithm for video sequences[J]. IEEE Transactions on Image Processing, 2011, 20(6): 1709-1724.
doi: 10.1109/TIP.2010.2101613
[6]
ST-CHARLES P L, BILODEAU G A, BERGEVIN R. SuBSENSE: a universal change detection method with local adaptive sensitivity[J]. IEEE Transactions on Image Processing, 2014, 24(1): 359-373.
[7]
LIANG D, KANEKO S, HASHIMOTO M, et al. Co-occurrence probability-based pixel pairs background model for robust object detection in dynamic scenes[J]. Pattern Recognition, 2015, 48(4): 1374-1390.
doi: 10.1016/j.patcog.2014.10.020
[8]
MARTINS I, CARVALHO P, CORTE-REAL L, et al. BMOG: boosted Gaussian mixture model with controlled complexity for background subtraction[J]. Pattern Analysis and Applications, 2018, 21(3): 641-654.
doi: 10.1007/s10044-018-0699-y
[9]
BRAHAM M, DROOGENBROECK M V. Deep background subtraction with scene-specific convolutional neural networks [C]// 2016 International Conference on Systems, Signals and Image Processing. [S. l.]: IEEE, 2016.
[10]
BABAEE M, DINH D T, RIGOLL G. A deep convolutional neural network for background subtraction [EB/OL]. [2019-09-30]. https://arxiv.org/pdf/1702.01731.pdf.
[11]
SHI G, HUANG T, DONG W, et al. Robust foreground estimation via structured Gaussian scale mixture modeling[J]. IEEE Transactions on Image Processing, 2018, 27(10): 4810-4824.
doi: 10.1109/TIP.2018.2845123
[12]
WANG Y, LUO Z, JODOIN P, et al. Interactive deep learning method for segmenting moving objects[J]. Pattern Recognition Letters, 2017, 96: 66-75.
[13]
ZHAO C, CHAM T, REN X, et al. Background subtraction based on deep pixel distribution learning [C]// 2018 IEEE International Conference on Multimedia and Expo. [S. l.]: IEEE, 2018: 1-6.
[14]
LIM L A, KELES H Y. Foreground segmentation using convolutional neural networks for multiscale feature encoding[J]. Pattern Recognition Letters, 2018, 112: 256-262.
doi: 10.1016/j.patrec.2018.08.002
[15]
LIM L A, KELES H Y. Learning multi-scale features for foreground segmentation[J]. Pattern Analysis and Applications, 2019, 23(3): 1369-1380.
[16]
QIU M, LI X. A fully convolutional encoder-decoder spatial-temporal network for real-time background subtraction[J]. IEEE Access, 2019, 7: 85949-85958.
[17]
ZHOU W, KANEKO S, LIANG D, et al. Background subtraction based on co-occurrence pixel-block pairs for robust object detection in dynamic scenes[J]. IIEEJ Transactions on Image Electronics and Visual Computing, 2018, 5(2): 146-159.
[18]
ZHOU W, KANEKO S, HASHIMOTO M, et al. A co-occurrence background model with hypothesis on degradation modification for object detection in strong background changes [C]// 2018 24th International Conference on Pattern Recognition. [S. l.]: IEEE, 2018: 1743-1748.
[19]
ZHOU W, KANEKO S, HASHIMOTO M, et al. Foreground detection based on co-occurrence background model with hypothesis on degradation modification in dynamic scenes[J]. Signal Processing, 2019, 160: 66-79.
doi: 10.1016/j.sigpro.2019.02.021
[20]
ZHOU W, KANEKO S, SATOH Y, et al. Co-occurrence based foreground detection with hypothesis on degradation modification in severe imaging conditions [C] // Proceedings of JSPE Semestrial Meeting 2018 JSPE Autumn Conference. [S. l.]: JSPE, 2018: 624-625.
[21]
ZHAO X, SATOH Y, TAKAUJI H, et al. Object detection based on a robust and accurate statistical multi-point-pair model[J]. Pattern Recognition, 2011, 44(6): 1296-1311.
doi: 10.1016/j.patcog.2010.11.022
[22]
LIANG D, PAN J, SUN H, et al. Spatio-temporal attention model for foreground detection in cross-scene surveillance videos[J]. Sensors, 2019, 19(23): 5142.
doi: 10.3390/s19235142
[23]
LAROCHELLE H, HINTON G. Learning to combine foveal glimpses with a third-order Boltzmann machine [C]// Advances in Neural Information Processing Systems 23. [S. l.]: Curran Associates Inc, 2010: 1243-1251.
[24]
KIM J, LEE S, KWAK D, et al. Multimodal residual learning for visual QA [C]// Neural Information Processing Systems. [S. l.]: MIT Press, 2016: 361-369.
[25]
MNIH V, HEESS N, GRAVES A. Recurrent models of visual attention [C]// Neural Information Processing Systems. [S. l.]: MIT Press, 2014, 2: 2204-2212.
[26]
XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[J]. International Conference on Machine Learning, 2015, 3: 2048-2057.
[27]
LI H, XIONG P, AN J, et al. Pyramid attention network for semantic segmentation [EB/OL]. [2019-09-30]. https://arxiv.org/pdf/1805.10180.pdf.
[28]
LIU C. Beyond pixels: exploring new representations and applications for motion analysis [D]. Cambridge: MIT, 2009.
[29]
GOYETTE N, JODOIN P M, PORIKLI F, et al. Changedetection.net: a new change detection benchmark dataset [C]// 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. [S. l.]: IEEE, 2012: 1-8.
[30]
TOYAMA K, KRUMM J, BRUMITT B, et al. Wallflower: principles and practice of background maintenance [C]// Proceedings of the Seventh IEEE International Conference on Computer Vision. [S. l.]: IEEE, 1999: 255-261.
[31]
Laboratory for image and media understanding [DB/OL]. [2019-09-30]. http://limu.ait.kyushu-u.ac.jp/dataset/en/.