Front. Inform. Technol. Electron. Eng.  2015, Vol. 16 Issue (10): 817-828    DOI: 10.1631/FITEE.1500070
    
Beyond bag of latent topics: spatial pyramid matching for scene category recognition
Fu-xiang Lu, Jun Huang
School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China; Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China

Abstract  We propose a heterogeneous, mid-level feature based method for recognizing natural scene categories. The proposed feature introduces spatial information among the latent topics by means of a spatial pyramid, while the latent topics are obtained by applying probabilistic latent semantic analysis (pLSA) to the bag-of-words representation. The proposed feature consistently outperforms standard pLSA, whose performance is adversely affected in many cases by the loss of spatial information. By combining various interest point detectors and local region descriptors in the bag-of-words model, the proposed feature yields further improvements on diverse scene category recognition tasks. We also propose a two-stage framework for multi-class classification. In the first stage, for each possible detector/descriptor pair, adaptive boosting classifiers are employed to select the most discriminative topics and then compute posterior probabilities of an unknown image from the selected topics. The second stage uses the prod-max rule to combine information coming from multiple sources and assigns the unknown image to the scene category with the highest ‘final’ posterior probability. Experimental results on three benchmark scene datasets show that the proposed method exceeds most state-of-the-art methods.
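
As an illustration of how a spatial pyramid can attach coarse location information to latent topics, the following is a minimal sketch and not the authors' implementation. It assumes each patch on a dense grid has already been given a single topic index (a hypothetical `topic_map` input); the paper itself may instead compute a pLSA topic distribution per image block. The function name `pyramid_topic_histogram`, the `levels` parameter, and the unweighted concatenation of levels are all assumptions made for the example.

```python
import numpy as np

def pyramid_topic_histogram(topic_map, n_topics, levels=2):
    """Concatenate per-cell topic histograms over a spatial pyramid.

    topic_map: 2-D integer array with one topic index per grid patch
    (hypothetical input; soft topic posteriors could be summed instead).
    """
    rows, cols = topic_map.shape
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level  # the grid is cells x cells at this level
        # Integer cell boundaries along each axis.
        r_edges = np.linspace(0, rows, cells + 1).astype(int)
        c_edges = np.linspace(0, cols, cells + 1).astype(int)
        for i in range(cells):
            for j in range(cells):
                cell = topic_map[r_edges[i]:r_edges[i + 1],
                                 c_edges[j]:c_edges[j + 1]]
                # Count how often each topic occurs inside this cell.
                parts.append(np.bincount(cell.ravel(), minlength=n_topics))
    hist = np.concatenate(parts).astype(float)
    # L1-normalize so images with different patch counts are comparable.
    return hist / max(hist.sum(), 1.0)

# Toy 4x4 grid of topic labels with 3 topics and a 2-level pyramid (1x1 and 2x2).
toy_map = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [2, 2, 1, 1],
                    [2, 2, 1, 1]])
toy_hist = pyramid_topic_histogram(toy_map, n_topics=3, levels=1)
```

With `levels=1` and three topics the resulting vector has 3 × (1 + 4) = 15 entries. Level-dependent weights, as used in some spatial pyramid matching kernels, are left out here for brevity.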

Key words: Scene category recognition, Probabilistic latent semantic analysis, Bag-of-words, Adaptive boosting
Received: 07 March 2015      Published: 08 October 2015
CLC:  TP391.4  
Cite this article:

Fu-xiang Lu, Jun Huang. Beyond bag of latent topics: spatial pyramid matching for scene category recognition. Front. Inform. Technol. Electron. Eng., 2015, 16(10): 817-828.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/FITEE.1500070     OR     http://www.zjujournals.com/xueshu/fitee/Y2015/V16/I10/817


Beyond bag of latent topics: spatial pyramid matching for scene category recognition

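Objective: With the spread of smartphones and digital cameras and the rapid growth of the Internet, content-based scene category recognition has become important for annotating and retrieving image databases. For settings with a relatively large number of scene categories, this paper implements a robust scene category recognition algorithm based on probabilistic latent semantic analysis (pLSA) and adaptive boosting (AdaBoost).
Innovation: The positional relationships among the topics learned by pLSA are recorded, yielding a pyramid topic histogram; different interest point detectors and different local region descriptors are used in the bag-of-words model to build heterogeneous pyramid word histograms, which markedly improves scene recognition accuracy; a two-stage multi-class classification algorithm is proposed.
Method: The expectation-maximization (EM) algorithm is used to compute the pLSA topic distribution of each image or image block, and a spatial pyramid (SP) records the approximate positional relationships among the topics. Based on a comparative study of interest point detectors and region descriptors, a dense interest point detector and six region descriptors are chosen for the bag-of-words model, giving six pyramid topic histograms to represent an image. To make full use of the information in the heterogeneous pyramid topic histograms, AdaBoost first selects the most discriminative topics and computes the posterior probabilities of a test image, and the prod-max fusion rule then determines the category of the test image.
Conclusion: For a given interest point detector and a given region descriptor, the pyramid topic histogram achieves a higher scene recognition rate than the standard pLSA topic histogram on all benchmark image datasets; fusing the heterogeneous pyramid topic histograms significantly improves the scene category recognition rate.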
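
The EM estimation of pLSA topic distributions mentioned in the method summary above can be sketched as follows. This is a textbook-style pLSA fit on a small document-by-word count matrix, offered as a rough illustration under stated assumptions and not as the authors' code: the function name `plsa_em`, the random initialization, the fixed iteration count, and the toy data are all hypothetical, and "documents" here stand in for images or image blocks described by visual-word counts.

```python
import numpy as np

def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a (documents x words) count matrix.

    Returns P(z|d) with shape (D, K) and P(w|z) with shape (K, W).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random, row-normalized starting points (an assumption of this sketch).
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]      # shape (D, K, W)
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both factors from expected counts n(d,w) * P(z|d,w).
        expected = counts[:, None, :] * post                # shape (D, K, W)
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z

# Toy corpus: four "images" described by counts over four visual words.
toy_counts = np.array([[5, 4, 0, 0],
                       [4, 6, 0, 0],
                       [0, 0, 7, 3],
                       [0, 0, 2, 8]])
doc_topics, topic_words = plsa_em(toy_counts, n_topics=2)
```

Each row of `doc_topics` is the topic distribution of one image and could serve as a topic histogram. The dense (D, K, W) arrays keep the code short but would be too memory-hungry for realistic vocabularies, where a sparse formulation is the usual choice.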

Key words: Scene category recognition, Probabilistic latent semantic analysis, Bag-of-words, Adaptive boosting