Journal of ZheJiang University (Engineering Science)  2024, Vol. 58 Issue (12): 2489-2499    DOI: 10.3785/j.issn.1008-973X.2024.12.008
    
Multi-scale context-guided feature elimination for ancient tower image classification
Yuebo MENG1,2, Bo WANG1,2, Guanghui LIU1,2
1. College of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710300, China
2. Key Laboratory of Construction Robots for Higher Education in Shaanxi Province

Abstract  

A multi-scale context-guided feature elimination classification method was proposed to address two problems in the classification of ancient tower architecture images: the difficulty of accurately locating discriminative features and the interference of complex scenes. First, a feature extraction network built around MogaNet was constructed, combined with multi-scale feature fusion to fully exploit image information. Second, a context information extractor was designed, which uses the semantic context of the network to align and filter the more discriminative local features and strengthens the network's ability to capture fine details. Third, a feature elimination strategy was proposed to suppress ambiguous-class features and background noise, together with a loss function that constrains both ambiguous-feature elimination and classification prediction. Finally, an image dataset of Chinese ancient tower architecture was established, providing data support for research on complex backgrounds and ambiguous boundaries in fine-grained image classification. The method achieved 96.3% accuracy on the self-built ancient tower dataset, and 92.4%, 95.3%, and 94.6% accuracy on the CUB-200-2011, Stanford Cars, and FGVC-Aircraft fine-grained datasets, respectively, outperforming the compared algorithms and enabling accurate classification of ancient tower architecture images.
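To make the described pipeline concrete, the following is a minimal PyTorch-style sketch of the context-guided selection step, written as one plausible reading of the abstract rather than the authors' released code: the function name, the cosine-similarity scoring, and the pooled context vector are assumptions.

```python
import torch
import torch.nn.functional as F

def context_guided_elimination(tokens, context, keep_k):
    """Illustrative sketch: score the local tokens of one backbone stage
    against a pooled semantic-context vector, keep the keep_k most
    discriminative tokens, and treat the rest as ambiguous-class or
    background features to be eliminated.

    tokens:  (B, N, C) local features from one MogaNet stage
    context: (B, C)    pooled semantic context for the same stage
    keep_k:  number of tokens retained at this stage
    """
    scores = F.cosine_similarity(tokens, context.unsqueeze(1), dim=-1)  # (B, N)
    idx = scores.topk(keep_k, dim=1).indices                            # (B, keep_k)
    kept = tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
    return kept, scores
```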



Key words: image classification; contextual information; feature elimination; deep learning; feature fusion
Received: 30 October 2023      Published: 25 November 2024
CLC:  TP 399  
Fund: National Natural Science Foundation of China (52278125); Key Research and Development Program of Shaanxi Province (2021SF-429)
Cite this article:

Yuebo MENG,Bo WANG,Guanghui LIU. Multi-scale context-guided feature elimination for ancient tower image classification. Journal of ZheJiang University (Engineering Science), 2024, 58(12): 2489-2499.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.12.008     OR     https://www.zjujournals.com/eng/Y2024/V58/I12/2489


Fig.1 Overall framework of multi-scale contextual feature elimination network
Fig.2 Framework of single-layer contextual information extractor
Fig.3 Framework of single-layer feature elimination strategy
Fig.4 Image examples of ancient tower architecture dataset
Fig.5 Ancient tower building dataset distribution
Dataset | $T_{\text{train}}$ | $T_{\text{test}}$ | $T_{\text{label}}$
Ancient tower dataset | 7 642 | 7 511 | 263
CUB-200-2011 | 5 994 | 5 794 | 200
Stanford Cars | 8 144 | 8 041 | 196
FGVC-Aircraft | 6 667 | 3 333 | 100
Tab.1 Parameters of four datasets
$\xi$ | $\tau$ | P/%
[1600, 400, 100, 25] | [128, 32, 8, 2] | 95.61
[1600, 400, 100, 25] | [256, 64, 16, 4] | 95.62
[1600, 400, 100, 25] | [512, 128, 32, 8] | 95.42
[2048, 512, 128, 32] | [128, 32, 8, 2] | 96.04
[2048, 512, 128, 32] | [256, 64, 16, 4] | 96.10
[2048, 512, 128, 32] | [512, 128, 32, 8] | 96.22
[2304, 576, 144, 36] | [128, 32, 8, 2] | 96.28
[2304, 576, 144, 36] | [256, 64, 16, 4] | 96.32
[2304, 576, 144, 36] | [512, 128, 32, 8] | 96.30
[3136, 784, 196, 49] | [128, 32, 8, 2] | 95.85
[3136, 784, 196, 49] | [256, 64, 16, 4] | 95.96
[3136, 784, 196, 49] | [512, 128, 32, 8] | 95.11
Tab.2 Experimental results with different feature-set sizes on ancient tower dataset
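One plausible reading of Tab.2 (an assumption, since $\xi$ and $\tau$ are defined in the full text): $\xi$ gives the per-stage candidate feature counts scored by the context information extractor, and $\tau$ gives how many of those survive elimination at each of the four stages. A hedged sketch of such a two-level top-k selection:

```python
import torch

# Hypothetical per-stage settings, taken from the best row of Tab.2 (96.32%).
XI  = [2304, 576, 144, 36]
TAU = [256, 64, 16, 4]

def select_then_eliminate(scores, xi, tau):
    """scores: (B, N) context scores for one stage. Shortlist the xi
    highest-scoring tokens, then keep the tau best of those; the rest
    are eliminated. Returns indices into the original N tokens."""
    xi = min(xi, scores.size(1))
    cand = scores.topk(xi, dim=1).indices                    # (B, xi) candidate pool
    keep = scores.gather(1, cand).topk(tau, dim=1).indices   # (B, tau) survivors
    return cand.gather(1, keep)
```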
Loss function | P/%
$L_{\text{final}}$ | 95.25
$L_{\text{final}}+L_{\text{back}}$ | 95.69
$L_{\text{final}}+L_{\text{joint}}$ | 95.38
$L_{\text{final}}+L_{\text{drop}}$ | 95.85
$L_{\text{final}}+L_{\text{back}}+L_{\text{joint}}$ | 96.12
$L_{\text{final}}+L_{\text{back}}+L_{\text{joint}}+L_{\text{drop}}$ | 96.32
Tab.3 Experimental results of different loss functions on ancient tower dataset
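Tab.3 shows each auxiliary term adding accuracy on top of $L_{\text{final}}$, with the full combination best (96.32%). A hedged sketch of the combined objective follows; the unit weights are hypothetical, since the abstract states only that the loss constrains ambiguous-feature elimination and classification prediction.

```python
def total_loss(l_final, l_back, l_joint, l_drop,
               w_back=1.0, w_joint=1.0, w_drop=1.0):
    # Weighted sum of the four terms ablated in Tab.3; the unit weights
    # are an assumption, not taken from the paper.
    return l_final + w_back * l_back + w_joint * l_joint + w_drop * l_drop
```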
Method | P/% | Para/M | Cal/G
MogaNet-L | 93.90 | 82.5 | 15.9
MogaNet-L+CIE | 95.35 (+1.45) | 82.5 (+0) | 17.5 (+1.6)
MogaNet-L+FES | 94.93 (+1.03) | 100.8 (+18.3) | 30.3 (+14.4)
MogaNet-L+CIE+FES | 96.32 (+2.42) | 100.8 (+18.3) | 31.9 (+16.0)
Tab.4 Results of ablation experiments of proposed method on self-built ancient tower dataset
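The Para/M column can be checked with a generic parameter count; the Cal/G (GFLOPs) column requires a profiler, and the authors' measurement tooling is not stated. A minimal sketch:

```python
import torch

def params_in_millions(model: torch.nn.Module) -> float:
    """Parameter count as reported in the Para/M column."""
    return sum(p.numel() for p in model.parameters()) / 1e6

# GFLOPs (the Cal/G column) are usually measured on a single 224x224 input
# with a profiler such as fvcore.nn.FlopCountAnalysis or thop.profile;
# which tool the authors used is not stated here.
```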
Method | Backbone | Resolution | P/%
PMG[7] | ResNet-50 | 448×448 | 94.2
FBSD[8] | DenseNet-161 | 448×448 | 94.5
TransFG[13] | ViT-B_16 | 448×448 | 94.5
FFVT[12] | ViT-B_16 | 448×448 | 94.8
SIM-Trans[14] | ViT-B_16 | 448×448 | 94.8
CAP[4] | Xception | 224×224 | 94.9
ViT-SAC[16] | ViT-B_16 | 448×448 | 95.4
SR-GNN[19] | Xception | 224×224 | 95.6
DCAL[20] | R50-ViT-Base | 448×448 | 95.7
Proposed method | MogaNet-L | 224×224 | 96.3
Tab.5 Accuracy comparison of different methods on ancient tower dataset
Method | Resolution | Para/M | Cal/G | P/%
TransFG[13] | 448×448 | 86.2 | 62.0 | 95.4
ViT-SAC[16] | 448×448 | 106.0 | 92.5 | 95.6
DCAL[20] | 384×384 | 88.0 | 47.0 | 95.7
Proposed method | 224×224 | 100.8 | 31.9 | 96.3
Tab.6 Comparison of parameter counts and computational cost of different methods
Fig.6 Visualization comparison
Method | Backbone | Resolution | P/% (CUB-200-2011) | P/% (Stanford Cars) | P/% (Aircraft)
WS-DAN[5] | Inception v3 | 448×448 | 89.4 | 94.5 | 93.0
PMG[7] | ResNet-50 | 550×550 | 89.6 | 95.1 | 93.4
API-Net[3] | DenseNet-161 | 512×512 | 90.0 | 95.3 | 93.9
PART[21] | ResNet-101 | 448×448 | 90.1 | 95.3 | 94.6
CAL[9] | ResNet-101 | 448×448 | 90.6 | 95.5 | 94.2
FFVT[12] | ViT-B_16 | 448×448 | 91.6 | 94.1 | 94.3
TransFG[13] | ViT-B_16 | 448×448 | 91.7 | 94.8 | 94.1
CAP[4] | Xception | 224×224 | 91.8 | 95.7 | 94.5
ViT-SAC[16] | ViT-B_16 | 448×448 | 91.8 | 95.0 | 93.1
DCAL[20] | R50-ViT-Base | 448×448 | 92.0 | 95.3 | 93.3
Proposed method | MogaNet-L | 224×224 | 92.4 | 95.3 | 94.6
Tab.7 Comparison of accuracy of different algorithms on fine-grained datasets
[1]   CAO B, ARAUJO A, SIM J. Unifying deep local and global features for image search [C]// European Conference on Computer Vision . Glasgow: Springer, 2020: 726−743.
[2]   DOU Z, CUI H, ZHANG L, et al. Learning global and local consistent representations for unsupervised image retrieval via deep graph diffusion networks [EB/OL]. (2020-06-11)[2023-08-22]. https://arxiv.org/abs/2001.01284.
[3]   ZHUANG P, WANG Y, QIAO Y. Learning attentive pairwise interaction for fine-grained classification [C]// Proceedings of the AAAI Conference on Artificial Intelligence . New York: AAAI, 2020: 13130−13137.
[4]   BEHERA A, WHARTON Z, HEWAGE P R P G, et al. Context-aware attentional pooling for fine-grained visual classification [C]// Proceedings of the AAAI Conference on Artificial Intelligence . New York: AAAI, 2021: 929−937.
[5]   HU T, QI H, HUANG Q, et al. See better before looking closer: weakly supervised data augmentation network for fine-grained visual classification [EB/OL]. (2019-03-23)[2023-08-22]. https://arxiv.org/abs/1901.09891.
[6]   WANG Bo, HUANG Mian, LIU Lijun, et al. Multi-layer focused Inception-V3 models for fine-grained visual recognition [J]. Acta Electronica Sinica, 2022, 50(1): 72−78.
[7]   DU R, CHANG D, BHUNIA A K, et al. Fine-grained visual classification via progressive multi-granularity training of jigsaw patches [C]// European Conference on Computer Vision . Glasgow: Springer, 2020: 726−743.
[8]   SONG J, YANG R. Feature boosting, suppression, and diversification for fine-grained visual classification [C]// 2021 International Joint Conference on Neural Networks . Shenzhen: IEEE, 2021: 1−8.
[9]   RAO Y, CHEN G, LU J, et al. Counterfactual attention learning for fine-grained visual categorization and reidentification [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal: IEEE, 2021: 1025−1034.
[10]   DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale [C]// Proceedings of the International Conference on Learning Representations . Washington DC: ICLR, 2021: 1−22.
[11]   VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems . New York: Curran Associates, 2017: 6000−6010.
[12]   WANG J, YU X, GAO Y. Feature fusion vision transformer for fine-grained visual categorization [C]// Proceedings of the British Machine Vision Conference . Durham: BMVA, 2021: 685−698.
[13]   HE J, CHEN J N, LIU S, et al. TransFG: a transformer architecture for fine-grained recognition [C]// Proceedings of the AAAI Conference on Artificial Intelligence . Vancouver: AAAI, 2022: 852−860.
[14]   SUN H, HE X, PENG Y. Sim-trans: structure information modeling transformer for fine-grained visual categorization [C]// Proceedings of the 30th ACM International Conference on Multimedia . Ottawa: ACM, 2022: 5853−5861.
[15]   LI S, WANG Z, LIU Z, et al. Efficient multi-order gated aggregation network [EB/OL]. (2023-03-20)[2023-08-22]. https://arxiv.org/abs/2211.03295.
[16]   DO T, TRAN H, TJIPUTRA E, et al. Fine-grained visual classification using self assessment classifier [EB/OL]. (2022-05-21)[2023-08-22]. https://arxiv.org/abs/2205.10529.
[17]   PENNINGTON J, SOCHER R, MANNING C D. Glove: global vectors for word representation [C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing . Doha: ACL, 2014: 1532−1543.
[18]   KIM J H, JUN J, ZHANG B T. Bilinear attention networks [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems . New York: Curran Associates, 2018: 1571−1581.
[19]   BERA A, WHARTON Z, LIU Y, et al. SR-GNN: spatial relation-aware graph neural network for fine-grained image categorization [J]. IEEE Transactions on Image Processing, 2022, 31: 6017−6031. doi: 10.1109/TIP.2022.3205215.
[20]   ZHU H, KE W, LI D, et al. Dual cross-attention learning for fine-grained visual categorization and object reidentification [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans: IEEE, 2022: 4692−4702.
[1] Huan LIU,Yunhong LI,Leitao ZHANG,Yue GUO,Xueping SU,Yaolin ZHU,Lele HOU. Identification of apple leaf diseases based on MA-ConvNext network and stepwise relational knowledge distillation[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(9): 1757-1767.
[2] Fan LI,Jie YANG,Zhicheng FENG,Zhichao CHEN,Yunxiao FU. Pantograph-catenary contact point detection method based on image recognition[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(9): 1801-1810.
[3] Li XIAO,Zhigang CAO,Haoran LU,Zhijian HUANG,Yuanqiang CAI. Elastic metamaterial design based on deep learning and gradient optimization[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(9): 1892-1901.
[4] Shuhan WU,Dan WANG,Yuanfang CHEN,Ziyu JIA,Yueqi ZHANG,Meng XU. Attention-fused filter bank dual-view graph convolution motor imagery EEG classification[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(7): 1326-1335.
[5] Linrui LI,Dongsheng WANG,Hongjie FAN. Fact-based similar case retrieval methods based on statutory knowledge[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(7): 1357-1365.
[6] Xianwei MA,Chaohui FAN,Weizhi NIE,Dong LI,Yiqun ZHU. Robust fault diagnosis method for failure sensors[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(7): 1488-1497.
[7] Juan SONG,Longxi HE,Huiping LONG. Deep learning-based algorithm for multi defect detection in tunnel lining[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(6): 1161-1173.
[8] Yi LIU,Yidan CHEN,Lin GAO,Jiao HONG. Lightweight road extraction model based on multi-scale feature fusion[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(5): 951-959.
[9] Cuiting WEI,Weijian ZHAO,Bochao SUN,Yunyi LIU. Intelligent rebar inspection based on improved Mask R-CNN and stereo vision[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(5): 1009-1019.
[10] Bo ZHONG,Pengfei WANG,Yiqiao WANG,Xiaoling WANG. Survey of deep learning based EEG data analysis technology[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(5): 879-890.
[11] Yin CAO,Junping QIN,Tong GAO,Qianli MA,Jiaqi REN. Generative adversarial network based two-stage generation of high-quality images from text[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(4): 674-683.
[12] Hai HUAN,Yu SHENG,Chenxi GU. Global guidance multi-feature fusion network based on remote sensing image road extraction[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(4): 696-707.
[13] Xianglong LUO,Yafei WANG,Yanbo WANG,Lixin WANG. Structural deformation prediction of monitoring data based on bi-directional gate board learning system[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(4): 729-736.
[14] Huijuan ZHANG,Kunpeng LI,Miaoxin JI,Zhenjiang LIU,Jianjuan LIU,Chi ZHANG. UAV detection algorithm based on spatial correlation enhancement[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(3): 468-479.
[15] Mingjun SONG,Wen YAN,Yizhao DENG,Junran ZHANG,Haiyan TU. Light-weight algorithm for real-time robotic grasp detection[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(3): 599-610.