Multi-scale context-guided feature elimination for ancient tower image classification
Yuebo MENG¹,², Bo WANG¹,², Guanghui LIU¹,²
1. College of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710300, China
2. Key Laboratory of Construction Robots for Higher Education in Shaanxi Province
Abstract A multi-scale context-guided feature elimination method is proposed for classifying images of ancient tower buildings, a task hampered by the difficulty of accurately localizing discriminative features and by interference from complex scenes. First, a feature extraction network built around MogaNet is combined with multi-scale feature fusion to fully exploit the image information. Next, a context information extractor uses the semantic context of the network to align and filter the more discriminative local features, strengthening the network's ability to capture fine details. Then, a feature elimination strategy suppresses fuzzy class features and background noise, and a loss function is designed to constrain both the fuzzy class feature elimination and the classification prediction. Finally, an image dataset of Chinese ancient tower architecture is established to support research on complex backgrounds and fuzzy boundaries in fine-grained image classification. Experimental results show that the method achieves 96.3% accuracy on the self-built ancient tower dataset, and 92.4%, 95.3% and 94.6% accuracy on three fine-grained datasets, CUB-200-2011, Stanford Cars and FGVC-Aircraft, respectively. The proposed method outperforms the compared algorithms and enables accurate classification of ancient tower images.
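The abstract does not specify how the feature elimination step is computed. As a rough illustration of the general idea only, the sketch below scores each local feature by cosine similarity to a context vector and zeroes out the lowest-scoring ones. The function names `cosine` and `eliminate_features`, the `keep_ratio` parameter, and the cosine-similarity scoring rule are all assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def eliminate_features(local_feats, context, keep_ratio=0.5):
    """Hypothetical context-guided elimination (an assumption, not the paper's rule).

    Scores every local feature against the context vector, keeps the
    top `keep_ratio` fraction, and replaces the rest with zero vectors.
    Returns the filtered features and the raw scores.
    """
    scores = [cosine(f, context) for f in local_feats]
    k = max(1, math.ceil(keep_ratio * len(local_feats)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    filtered = [
        list(f) if i in keep else [0.0] * len(f)
        for i, f in enumerate(local_feats)
    ]
    return filtered, scores


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    filtered, scores = eliminate_features(feats, context=[1.0, 0.0])
    print(filtered)
```

In a trained network the scores would presumably come from learned context features rather than a fixed vector, and the suppression would be constrained by the loss function the abstract mentions; this toy version only shows the select-and-suppress pattern.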
Received: 30 October 2023
Published: 25 November 2024
Fund: National Natural Science Foundation of China (52278125); Key Research and Development Program of Shaanxi Province (2021SF-429).
Keywords: image classification, context information, feature elimination, deep learning, feature fusion