Journal of ZheJiang University (Engineering Science)  2024, Vol. 58 Issue (12): 2489-2499    DOI: 10.3785/j.issn.1008-973X.2024.12.008
    
Multi-scale context-guided feature elimination for ancient tower image classification
Yuebo MENG1,2, Bo WANG1,2, Guanghui LIU1,2
1. College of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710300, China
2. Key Laboratory of Construction Robots for Higher Education in Shaanxi Province

Abstract  

A multi-scale context-guided feature elimination classification method was proposed to address two problems in the classification of ancient tower architecture images: the difficulty of accurately locating discriminative features and the interference of complex scenes. First, a feature extraction network built around MogaNet was constructed, combined with multi-scale feature fusion to fully exploit image information. Second, a context information extractor was designed, which uses the semantic context of the network to align and filter the more discriminative local features and strengthens the network's ability to capture fine details. Third, a feature elimination strategy was proposed to suppress ambiguous-class features and background noise, together with a loss function that constrains both ambiguous-feature elimination and classification prediction. Finally, an image dataset of Chinese ancient tower architecture was established, providing data support for research on complex backgrounds and ambiguous boundaries in fine-grained image classification. The method achieved 96.3% accuracy on the self-built ancient tower dataset, and 92.4%, 95.3%, and 94.6% accuracy on the CUB-200-2011, Stanford Cars, and FGVC-Aircraft fine-grained datasets, respectively, outperforming the compared algorithms and enabling accurate classification of ancient tower architecture images.
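To make the described pipeline concrete, the following is a minimal PyTorch-style sketch of the context-guided selection step, written as one plausible reading of the abstract rather than the authors' released code: the function name, the cosine-similarity scoring, and the pooled context vector are assumptions.

```python
import torch
import torch.nn.functional as F

def context_guided_elimination(tokens, context, keep_k):
    """Illustrative sketch: score the local tokens of one backbone stage
    against a pooled semantic-context vector, keep the keep_k most
    discriminative tokens, and treat the rest as ambiguous-class or
    background features to be eliminated.

    tokens:  (B, N, C) local features from one MogaNet stage
    context: (B, C)    pooled semantic context for the same stage
    keep_k:  number of tokens retained at this stage
    """
    scores = F.cosine_similarity(tokens, context.unsqueeze(1), dim=-1)  # (B, N)
    idx = scores.topk(keep_k, dim=1).indices                            # (B, keep_k)
    kept = tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
    return kept, scores
```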



Key words: image classification; contextual information; feature elimination; deep learning; feature fusion
Received: 30 October 2023      Published: 25 November 2024
CLC:  TP 399  
Fund: National Natural Science Foundation of China (52278125); Key Research and Development Program of Shaanxi Province (2021SF-429)
Cite this article:

Yuebo MENG,Bo WANG,Guanghui LIU. Multi-scale context-guided feature elimination for ancient tower image classification. Journal of ZheJiang University (Engineering Science), 2024, 58(12): 2489-2499.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.12.008     OR     https://www.zjujournals.com/eng/Y2024/V58/I12/2489


Fig.1 Overall framework of multi-scale contextual feature elimination network
Fig.2 Framework of single-layer contextual information extractor
Fig.3 Framework of single-layer feature elimination strategy
Fig.4 Image examples of ancient tower architecture dataset
Fig.5 Ancient tower building dataset distribution
Dataset | $T_{\text{train}}$ | $T_{\text{test}}$ | $T_{\text{label}}$
Ancient tower dataset | 7 642 | 7 511 | 263
CUB-200-2011 | 5 994 | 5 794 | 200
Stanford Cars | 8 144 | 8 041 | 196
FGVC-Aircraft | 6 667 | 3 333 | 100
Tab.1 Parameters of four datasets
$\xi$ | $\tau$ | P/%
[1600, 400, 100, 25] | [128, 32, 8, 2] | 95.61
[1600, 400, 100, 25] | [256, 64, 16, 4] | 95.62
[1600, 400, 100, 25] | [512, 128, 32, 8] | 95.42
[2048, 512, 128, 32] | [128, 32, 8, 2] | 96.04
[2048, 512, 128, 32] | [256, 64, 16, 4] | 96.10
[2048, 512, 128, 32] | [512, 128, 32, 8] | 96.22
[2304, 576, 144, 36] | [128, 32, 8, 2] | 96.28
[2304, 576, 144, 36] | [256, 64, 16, 4] | 96.32
[2304, 576, 144, 36] | [512, 128, 32, 8] | 96.30
[3136, 784, 196, 49] | [128, 32, 8, 2] | 95.85
[3136, 784, 196, 49] | [256, 64, 16, 4] | 95.96
[3136, 784, 196, 49] | [512, 128, 32, 8] | 95.11
Tab.2 Experimental results with different feature-set sizes on ancient tower dataset
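One plausible reading of Tab.2 (an assumption, since $\xi$ and $\tau$ are defined in the full text): $\xi$ gives the per-stage candidate feature counts scored by the context information extractor, and $\tau$ gives how many of those survive elimination at each of the four stages. A hedged sketch of such a two-level top-k selection:

```python
import torch

# Hypothetical per-stage settings, taken from the best row of Tab.2 (96.32%).
XI  = [2304, 576, 144, 36]
TAU = [256, 64, 16, 4]

def select_then_eliminate(scores, xi, tau):
    """scores: (B, N) context scores for one stage. Shortlist the xi
    highest-scoring tokens, then keep the tau best of those; the rest
    are eliminated. Returns indices into the original N tokens."""
    xi = min(xi, scores.size(1))
    cand = scores.topk(xi, dim=1).indices                    # (B, xi) candidate pool
    keep = scores.gather(1, cand).topk(tau, dim=1).indices   # (B, tau) survivors
    return cand.gather(1, keep)
```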
Loss function | P/%
$L_{\text{final}}$ | 95.25
$L_{\text{final}}+L_{\text{back}}$ | 95.69
$L_{\text{final}}+L_{\text{joint}}$ | 95.38
$L_{\text{final}}+L_{\text{drop}}$ | 95.85
$L_{\text{final}}+L_{\text{back}}+L_{\text{joint}}$ | 96.12
$L_{\text{final}}+L_{\text{back}}+L_{\text{joint}}+L_{\text{drop}}$ | 96.32
Tab.3 Experimental results of different loss functions on ancient tower dataset
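Tab.3 shows each auxiliary term adding accuracy on top of $L_{\text{final}}$, with the full combination best (96.32%). A hedged sketch of the combined objective follows; the unit weights are hypothetical, since the abstract states only that the loss constrains ambiguous-feature elimination and classification prediction.

```python
def total_loss(l_final, l_back, l_joint, l_drop,
               w_back=1.0, w_joint=1.0, w_drop=1.0):
    # Weighted sum of the four terms ablated in Tab.3; the unit weights
    # are an assumption, not taken from the paper.
    return l_final + w_back * l_back + w_joint * l_joint + w_drop * l_drop
```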
Method | P/% | Para/M | Cal/G
MogaNet-L | 93.90 | 82.5 | 15.9
MogaNet-L+CIE | 95.35 (+1.45) | 82.5 (+0) | 17.5 (+1.6)
MogaNet-L+FES | 94.93 (+1.03) | 100.8 (+18.3) | 30.3 (+14.4)
MogaNet-L+CIE+FES | 96.32 (+2.42) | 100.8 (+18.3) | 31.9 (+16.0)
Tab.4 Results of ablation experiments of proposed method on self-built ancient tower dataset
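The Para/M column can be checked with a generic parameter count; the Cal/G (GFLOPs) column requires a profiler, and the authors' measurement tooling is not stated. A minimal sketch:

```python
import torch

def params_in_millions(model: torch.nn.Module) -> float:
    """Parameter count as reported in the Para/M column."""
    return sum(p.numel() for p in model.parameters()) / 1e6

# GFLOPs (the Cal/G column) are usually measured on a single 224x224 input
# with a profiler such as fvcore.nn.FlopCountAnalysis or thop.profile;
# which tool the authors used is not stated here.
```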
Method | Backbone | Resolution | P/%
PMG[7] | ResNet-50 | 448×448 | 94.2
FBSD[8] | DenseNet-161 | 448×448 | 94.5
TransFG[13] | ViT-B_16 | 448×448 | 94.5
FFVT[12] | ViT-B_16 | 448×448 | 94.8
SIM-Trans[14] | ViT-B_16 | 448×448 | 94.8
CAP[4] | Xception | 224×224 | 94.9
ViT-SAC[16] | ViT-B_16 | 448×448 | 95.4
SR-GNN[19] | Xception | 224×224 | 95.6
DCAL[20] | R50-ViT-Base | 448×448 | 95.7
Proposed method | MogaNet-L | 224×224 | 96.3
Tab.5 Accuracy comparison of different methods on ancient tower dataset
Method | Resolution | Para/M | Cal/G | P/%
TransFG[13] | 448×448 | 86.2 | 62.0 | 95.4
ViT-SAC[16] | 448×448 | 106.0 | 92.5 | 95.6
DCAL[20] | 384×384 | 88.0 | 47.0 | 95.7
Proposed method | 224×224 | 100.8 | 31.9 | 96.3
Tab.6 Comparison of parameter counts and computational cost of different methods
Fig.6 Visualization comparison
Method | Backbone | Resolution | P/% (CUB-200-2011) | P/% (Stanford Cars) | P/% (Aircraft)
WS-DAN[5] | Inception v3 | 448×448 | 89.4 | 94.5 | 93.0
PMG[7] | ResNet-50 | 550×550 | 89.6 | 95.1 | 93.4
API-Net[3] | DenseNet-161 | 512×512 | 90.0 | 95.3 | 93.9
PART[21] | ResNet-101 | 448×448 | 90.1 | 95.3 | 94.6
CAL[9] | ResNet-101 | 448×448 | 90.6 | 95.5 | 94.2
FFVT[12] | ViT-B_16 | 448×448 | 91.6 | 94.1 | 94.3
TransFG[13] | ViT-B_16 | 448×448 | 91.7 | 94.8 | 94.1
CAP[4] | Xception | 224×224 | 91.8 | 95.7 | 94.5
ViT-SAC[16] | ViT-B_16 | 448×448 | 91.8 | 95.0 | 93.1
DCAL[20] | R50-ViT-Base | 448×448 | 92.0 | 95.3 | 93.3
Proposed method | MogaNet-L | 224×224 | 92.4 | 95.3 | 94.6
Tab.7 Comparison of accuracy of different algorithms on fine-grained datasets
[1]   CAO B, ARAUJO A, SIM J. Unifying deep local and global features for image search [C]// European Conference on Computer Vision . Glasgow: Springer, 2020: 726−743.
[2]   DOU Z, CUI H, ZHANG L, et al. Learning global and local consistent representations for unsupervised image retrieval via deep graph diffusion networks [EB/OL]. (2020-06-11)[2023-08-22]. https://arxiv.org/abs/2001.01284.
[3]   ZHUANG P, WANG Y, QIAO Y. Learning attentive pairwise interaction for fine-grained classification [C]// Proceedings of the AAAI Conference on Artificial Intelligence . New York: AAAI, 2020: 13130−13137.
[4]   BEHERA A, WHARTON Z, HEWAGE P R P G, et al. Context-aware attentional pooling for fine-grained visual classification [C]// Proceedings of the AAAI Conference on Artificial Intelligence . New York: AAAI, 2021: 929−937.
[5]   HU T, QI H, HUANG Q, et al. See better before looking closer: weakly supervised data augmentation network for fine-grained visual classification [EB/OL]. (2019-03-23)[2023-08-22]. https://arxiv.org/abs/1901.09891.
[6]   WANG Bo, HUANG Mian, LIU Lijun, et al. Multi-layer focused Inception-V3 models for fine-grained visual recognition [J]. Acta Electronica Sinica, 2022, 50(1): 72−78.
[7]   DU R, CHANG D, BHUNIA A K, et al. Fine-grained visual classification via progressive multi-granularity training of jigsaw patches [C]// European Conference on Computer Vision . Glasgow: Springer, 2020: 726−743.
[8]   SONG J, YANG R. Feature boosting, suppression, and diversification for fine-grained visual classification [C]// 2021 International Joint Conference on Neural Networks . Shenzhen: IEEE, 2021: 1−8.
[9]   RAO Y, CHEN G, LU J, et al. Counterfactual attention learning for fine-grained visual categorization and reidentification [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal: IEEE, 2021: 1025−1034.
[10]   DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale [C]// Proceedings of the International Conference on Learning Representations . Washington DC: ICLR, 2021: 1−22.
[11]   VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems . New York: Curran Associates, 2017: 6000−6010.
[12]   WANG J, YU X, GAO Y. Feature fusion vision transformer for fine-grained visual categorization [C]// Proceedings of the British Machine Vision Conference . Durham: BMVA, 2021: 685−698.
[13]   HE J, CHEN J N, LIU S, et al. TransFG: a transformer architecture for fine-grained recognition [C]// Proceedings of the AAAI Conference on Artificial Intelligence . Vancouver: AAAI, 2022: 852−860.
[14]   SUN H, HE X, PENG Y. Sim-trans: structure information modeling transformer for fine-grained visual categorization [C]// Proceedings of the 30th ACM International Conference on Multimedia . Ottawa: ACM, 2022: 5853−5861.
[15]   LI S, WANG Z, LIU Z, et al. Efficient multi-order gated aggregation network [EB/OL]. (2023-03-20)[2023-08-22]. https://arxiv.org/abs/2211.03295.
[16]   DO T, TRAN H, TJIPUTRA E, et al. Fine-grained visual classification using self assessment classifier [EB/OL]. (2022-05-21)[2023-08-22]. https://arxiv.org/abs/2205.10529.
[17]   PENNINGTON J, SOCHER R, MANNING C D. Glove: global vectors for word representation [C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing . Doha: ACL, 2014: 1532−1543.
[18]   KIM J H, JUN J, ZHANG B T. Bilinear attention networks [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems . New York: Curran Associates, 2018: 1571−1581.
[19]   BERA A, WHARTON Z, LIU Y, et al. SR-GNN: spatial relation-aware graph neural network for fine-grained image categorization [J]. IEEE Transactions on Image Processing, 2022, 31: 6017−6031. doi: 10.1109/TIP.2022.3205215.
[20]   ZHU H, KE W, LI D, et al. Dual cross-attention learning for fine-grained visual categorization and object reidentification [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans: IEEE, 2022: 4692−4702.
[1] Huan LIU,Yunhong LI,Leitao ZHANG,Yue GUO,Xueping SU,Yaolin ZHU,Lele HOU. Identification of apple leaf diseases based on MA-ConvNext network and stepwise relational knowledge distillation[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(9): 1757-1767.
[2] Fan LI,Jie YANG,Zhicheng FENG,Zhichao CHEN,Yunxiao FU. Pantograph-catenary contact point detection method based on image recognition[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(9): 1801-1810.
[3] Li XIAO,Zhigang CAO,Haoran LU,Zhijian HUANG,Yuanqiang CAI. Elastic metamaterial design based on deep learning and gradient optimization[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(9): 1892-1901.
[4] Shuhan WU,Dan WANG,Yuanfang CHEN,Ziyu JIA,Yueqi ZHANG,Meng XU. Attention-fused filter bank dual-view graph convolution motor imagery EEG classification[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(7): 1326-1335.
[5] Linrui LI,Dongsheng WANG,Hongjie FAN. Fact-based similar case retrieval methods based on statutory knowledge[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(7): 1357-1365.
[6] Xianwei MA,Chaohui FAN,Weizhi NIE,Dong LI,Yiqun ZHU. Robust fault diagnosis method for failure sensors[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(7): 1488-1497.
[7] Juan SONG,Longxi HE,Huiping LONG. Deep learning-based algorithm for multi defect detection in tunnel lining[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(6): 1161-1173.
[8] Yi LIU,Yidan CHEN,Lin GAO,Jiao HONG. Lightweight road extraction model based on multi-scale feature fusion[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(5): 951-959.
[9] Cuiting WEI,Weijian ZHAO,Bochao SUN,Yunyi LIU. Intelligent rebar inspection based on improved Mask R-CNN and stereo vision[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(5): 1009-1019.
[10] Bo ZHONG,Pengfei WANG,Yiqiao WANG,Xiaoling WANG. Survey of deep learning based EEG data analysis technology[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(5): 879-890.
[11] Yin CAO,Junping QIN,Tong GAO,Qianli MA,Jiaqi REN. Generative adversarial network based two-stage generation of high-quality images from text[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(4): 674-683.
[12] Hai HUAN,Yu SHENG,Chenxi GU. Global guidance multi-feature fusion network based on remote sensing image road extraction[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(4): 696-707.
[13] Xianglong LUO,Yafei WANG,Yanbo WANG,Lixin WANG. Structural deformation prediction of monitoring data based on bi-directional gate board learning system[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(4): 729-736.
[14] Huijuan ZHANG,Kunpeng LI,Miaoxin JI,Zhenjiang LIU,Jianjuan LIU,Chi ZHANG. UAV detection algorithm based on spatial correlation enhancement[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(3): 468-479.
[15] Mingjun SONG,Wen YAN,Yizhao DENG,Junran ZHANG,Haiyan TU. Light-weight algorithm for real-time robotic grasp detection[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(3): 599-610.