Journal of Zhejiang University (Engineering Science)  2024, Vol. 58, Issue (12): 2489−2499    DOI: 10.3785/j.issn.1008-973X.2024.12.008
Computer Technology
Multi-scale context-guided feature elimination for ancient tower image classification
Yuebo MENG1,2(),Bo WANG1,2,Guanghui LIU1,2
1. College of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710300, China
2. Key Laboratory of Construction Robots for Higher Education in Shaanxi Province
Abstract:

A multi-scale context-guided feature elimination classification method was proposed to resolve the problems of ambiguous discriminative feature localization and complex scene interference in the classification of ancient tower building images. First, a feature extraction network with MogaNet as its core was constructed, combined with multi-scale feature fusion to fully exploit image information. Next, a context information extractor was designed to use the network's semantic context to align and filter more discriminative local features, strengthening the network's ability to capture detailed features. Then, a feature elimination strategy was proposed to suppress fuzzy-class features and background noise interference, and a loss function was designed to constrain fuzzy-class feature elimination and classification prediction. Finally, a Chinese ancient tower architecture image dataset was established to provide data support for research on complex backgrounds and fuzzy boundaries in fine-grained image classification. The proposed method achieved 96.3% accuracy on the self-built ancient tower architecture dataset, and 92.4%, 95.3% and 94.6% accuracy on the CUB-200-2011, Stanford Cars and FGVC-Aircraft fine-grained datasets, respectively, outperforming the comparison algorithms and enabling accurate classification of ancient tower building images.
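To make the described pipeline concrete, here is a minimal PyTorch sketch of the flow in the abstract: fused multi-scale tokens are scored against a global semantic context, low-scoring tokens are eliminated, and only the retained features feed the classifier. All module and parameter names here are hypothetical; this is an illustration of the idea, not the paper's implementation.

import torch
import torch.nn as nn

class ContextGuidedClassifier(nn.Module):
    # Illustrative sketch: context scoring -> feature elimination -> classification.
    def __init__(self, dim=512, num_classes=263, keep=32):
        super().__init__()
        self.keep = keep                          # local tokens retained per image (assumed)
        self.score = nn.Linear(dim, 1)            # context-alignment score per token
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens):                    # tokens: (B, N, dim) fused multi-scale features
        ctx = tokens.mean(dim=1, keepdim=True)                # global semantic context
        s = self.score(tokens * ctx).squeeze(-1)              # (B, N) alignment scores
        idx = s.topk(self.keep, dim=1).indices                # most discriminative token indices
        kept = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
        return self.head(kept.mean(dim=1))                    # classify on retained features only

model = ContextGuidedClassifier(dim=512, num_classes=263)     # 263 classes as in the tower dataset
logits = model(torch.randn(2, 196, 512))                      # e.g. 14x14 token grid, 224x224 input
print(logits.shape)                                           # torch.Size([2, 263])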

Key words: image classification    contextual information    feature elimination    deep learning    feature fusion
Received: 2023-10-30    Published: 2024-11-25
CLC:  TP 399  
Supported by: National Natural Science Foundation of China (52278125); Key Research and Development Program of Shaanxi Province (2021SF-429).
About the author: MENG Yuebo (1979—), female, professor, Ph.D., engaged in research on computer vision understanding. orcid.org/0000-0002-5231-3071. E-mail: mengyuebo@163.com

Cite this article:

Yuebo MENG, Bo WANG, Guanghui LIU. Multi-scale context-guided feature elimination for ancient tower image classification [J]. Journal of Zhejiang University (Engineering Science), 2024, 58(12): 2489−2499.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2024.12.008        https://www.zjujournals.com/eng/CN/Y2024/V58/I12/2489

Fig. 1  Structure of the multi-scale context feature elimination network
Fig. 2  Single-layer structure of the context information extractor
Fig. 3  Single-layer structure of the feature elimination strategy
Fig. 4  Sample images from the ancient tower architecture dataset
Fig. 5  Distribution of the ancient tower architecture dataset
Dataset               | T_train | T_test | T_label
Ancient tower dataset | 7 642   | 7 511  | 263
CUB-200-2011          | 5 994   | 5 794  | 200
Stanford Cars         | 8 144   | 8 041  | 196
FGVC-Aircraft         | 6 667   | 3 333  | 100
Table 1  Parameters of the four datasets (T_train/T_test: number of training/test images; T_label: number of classes)
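The splits in Table 1 follow the usual one-folder-per-class layout of fine-grained benchmarks. A minimal torchvision loading sketch for the self-built tower dataset, assuming such a layout (directory names are assumptions, not from the paper):

import torchvision.transforms as T
from torchvision.datasets import ImageFolder

# 224x224 inputs, matching the resolution the proposed method uses (Tables 5-7)
tf = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
train = ImageFolder("ancient_towers/train", transform=tf)   # expected: 7 642 images, 263 classes
test = ImageFolder("ancient_towers/test", transform=tf)     # expected: 7 511 images
print(len(train.classes))                                   # 263 if laid out as assumed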
ξ                    | τ                 | P/%
[1600, 400, 100, 25] | [128, 32, 8, 2]   | 95.61
                     | [256, 64, 16, 4]  | 95.62
                     | [512, 128, 32, 8] | 95.42
[2048, 512, 128, 32] | [128, 32, 8, 2]   | 96.04
                     | [256, 64, 16, 4]  | 96.10
                     | [512, 128, 32, 8] | 96.22
[2304, 576, 144, 36] | [128, 32, 8, 2]   | 96.28
                     | [256, 64, 16, 4]  | 96.32
                     | [512, 128, 32, 8] | 96.30
[3136, 784, 196, 49] | [128, 32, 8, 2]   | 95.85
                     | [256, 64, 16, 4]  | 95.96
                     | [512, 128, 32, 8] | 95.11
Table 2  Experimental results on the ancient tower dataset with different sizes of feature index sets
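Table 2 sweeps ξ (per-stage candidate feature counts) and τ (per-stage retained counts) over the four network stages; the best accuracy, 96.32%, comes from ξ = [2304, 576, 144, 36] with τ = [256, 64, 16, 4]. A hedged sketch of what such two-stage top-k selection could look like at one stage (the exact selection rule is not given on this page; names are hypothetical):

import torch

def two_stage_topk(tokens, scores, xi, tau):
    # Pick xi candidates by activation energy, then keep tau of them by context
    # score. A guess at how xi and tau interact, for illustration only.
    energy = tokens.norm(dim=-1)                       # (B, N) activation magnitude
    cand = energy.topk(xi, dim=1).indices              # candidate feature indices
    cand_scores = torch.gather(scores, 1, cand)        # context scores of the candidates
    keep = cand_scores.topk(tau, dim=1).indices        # retained positions within candidates
    return torch.gather(cand, 1, keep)                 # indices of retained features

tokens = torch.randn(2, 2304, 512)                     # stage-1 tokens (48x48 grid; 2304 = best xi)
scores = torch.randn(2, 2304)                          # context scores from the extractor
kept = two_stage_topk(tokens, scores, xi=2304, tau=256)
print(kept.shape)                                      # torch.Size([2, 256])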
Loss function                       | P/%
L_final                             | 95.25
L_final + L_back                    | 95.69
L_final + L_joint                   | 95.38
L_final + L_drop                    | 95.85
L_final + L_back + L_joint          | 96.12
L_final + L_back + L_joint + L_drop | 96.32
Table 3  Experimental results on the ancient tower dataset with different loss functions
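Table 3 indicates the training objective sums up to four terms, and all four together perform best (96.32%). A minimal sketch of such a composition, assuming unit weights and that each term is a cross-entropy on a different prediction branch (neither assumption is confirmed by this page):

import torch
import torch.nn.functional as F

def total_loss(logits_final, logits_back, logits_joint, logits_drop, target):
    # Sum of the four terms ablated in Table 3; the branch semantics below are
    # guesses from the abstract (background suppression, joint prediction,
    # eliminated-feature constraint), not the paper's definitions.
    return (F.cross_entropy(logits_final, target)      # L_final: main classification
            + F.cross_entropy(logits_back, target)     # L_back: background branch
            + F.cross_entropy(logits_joint, target)    # L_joint: joint multi-scale branch
            + F.cross_entropy(logits_drop, target))    # L_drop: eliminated-feature constraint

B, C = 4, 263
target = torch.randint(0, C, (B,))
loss = total_loss(*(torch.randn(B, C) for _ in range(4)), target)
print(float(loss))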
Method            | P/%           | Para/M        | Cal/G
MogaNet-L         | 93.90         | 82.5          | 15.9
MogaNet-L+CIE     | 95.35 (+1.45) | 82.5 (+0)     | 17.5 (+1.6)
MogaNet-L+FES     | 94.93 (+1.03) | 100.8 (+18.3) | 30.3 (+14.4)
MogaNet-L+CIE+FES | 96.32 (+2.42) | 100.8 (+18.3) | 31.9 (+16.0)
Table 4  Ablation results of the proposed method on the ancient tower dataset
Method          | Backbone     | Resolution | P/%
PMG[7]          | ResNet50     | 448×448    | 94.2
FBSD[8]         | DenseNet161  | 448×448    | 94.5
TransFG[13]     | ViT-B_16     | 448×448    | 94.5
FFVT[12]        | ViT-B_16     | 448×448    | 94.8
SIM-Trans[14]   | ViT-B_16     | 448×448    | 94.8
CAP[4]          | Xception     | 224×224    | 94.9
ViT-SAC[16]     | ViT-B_16     | 448×448    | 95.4
SR-GNN[19]      | Xception     | 224×224    | 95.6
DCAL[20]        | R50-ViT-Base | 448×448    | 95.7
Proposed method | MogaNet-L    | 224×224    | 96.3
Table 5  Accuracy comparison of different methods on the ancient tower dataset
Method          | Resolution | Para/M | Cal/G | P/%
TransFG[13]     | 448×448    | 86.2   | 62.0  | 95.4
ViT-SAC[16]     | 448×448    | 106.0  | 92.5  | 95.6
DCAL[20]        | 384×384    | 88.0   | 47.0  | 95.7
Proposed method | 224×224    | 100.8  | 31.9  | 96.3
Table 6  Comparison of different methods in parameter count and computational cost
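The Para/M columns in Tables 4 and 6 report trainable parameters in millions; Cal/G is computation in GFLOPs, which needs a profiler such as fvcore or thop to reproduce. Parameter counts can be checked for any PyTorch model with a few lines:

import torch.nn as nn

def count_params_m(model: nn.Module) -> float:
    # Trainable parameters in millions, as reported in the Para/M columns.
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Toy stand-in; the paper reports 82.5 M for MogaNet-L and 100.8 M with CIE+FES added.
toy = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 263))
print(f"{count_params_m(toy):.2f} M")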
Fig. 6  Visualization comparison
Method          | Backbone     | Resolution | CUB-200-2011 | Stanford Cars | Aircraft
WS-DAN[5]       | Inception v3 | 448×448    | 89.4         | 94.5          | 93.0
PMG[7]          | ResNet-50    | 550×550    | 89.6         | 95.1          | 93.4
API-Net[3]      | DenseNet-161 | 512×512    | 90.0         | 95.3          | 93.9
PART[21]        | ResNet-101   | 448×448    | 90.1         | 95.3          | 94.6
CAL[9]          | ResNet-101   | 448×448    | 90.6         | 95.5          | 94.2
FFVT[12]        | ViT-B_16     | 448×448    | 91.6         | 94.1          | 94.3
TransFG[13]     | ViT-B_16     | 448×448    | 91.7         | 94.8          | 94.1
CAP[4]          | Xception     | 224×224    | 91.8         | 95.7          | 94.5
ViT-SAC[16]     | ViT-B_16     | 448×448    | 91.8         | 95.0          | 93.1
DCAL[20]        | R50-ViT-Base | 448×448    | 92.0         | 95.3          | 93.3
Proposed method | MogaNet-L    | 224×224    | 92.4         | 95.3          | 94.6
Table 7  Accuracy (P/%) comparison of different algorithms on three fine-grained datasets
1 CAO B, ARAUJO A, SIM J. Unifying deep local and global features for image search [C]// European Conference on Computer Vision . Glasgow: Springer, 2020: 726−743.
2 DOU Z, CUI H, ZHANG L, et al. Learning global and local consistent representations for unsupervised image retrieval via deep graph diffusion networks [EB/OL]. (2020-06-11)[2023-08-22]. https://arxiv.org/abs/2001.01284.
3 ZHUANG P, WANG Y, QIAO Y. Learning attentive pairwise interaction for fine-grained classification [C]// Proceedings of the AAAI Conference on Artificial Intelligence . New York: AAAI, 2020: 13130−13137.
4 BEHERA A, WHARTON Z, HEWAGE P R P G, et al. Context-aware attentional pooling for fine-grained visual classification [C]// Proceedings of the AAAI Conference on Artificial Intelligence . New York: AAAI, 2021: 929−937.
5 HU T, QI H, HUANG Q, et al. See better before looking closer: weakly supervised data augmentation network for fine-grained visual classification [EB/OL]. (2019-03-23)[2023-08-22]. https://arxiv.org/abs/1901.09891.
6 WANG Bo, HUANG Mian, LIU Lijun, et al. Multi-layer focused Inception-V3 models for fine-grained visual recognition [J]. Acta Electronica Sinica, 2022, 50(1): 72−78.
7 DU R, CHANG D, BHUNIA A K, et al. Fine-grained visual classification via progressive multi-granularity training of jigsaw patches [C]// European Conference on Computer Vision . Glasgow: Springer, 2020: 726−743.
8 SONG J, YANG R. Feature boosting, suppression, and diversification for fine-grained visual classification [C]// 2021 International Joint Conference on Neural Networks . Shenzhen: IEEE, 2021: 1−8.
9 RAO Y, CHEN G, LU J, et al. Counterfactual attention learning for fine-grained visual categorization and re-identification [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 1025−1034.
10 DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale [C]// Proceedings of the International Conference on Learning Representations . Washington DC: ICLR, 2021: 1−22.
11 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems . New York: Curran Associates, 2017: 6000−6010.
12 WANG J, YU X, GAO Y. Feature fusion vision transformer for fine-grained visual categorization [C]// Proceedings of the British Machine Vision Conference . Durham: BMVA, 2021: 685−698.
13 HE J, CHEN J N, LIU S, et al. TransFG: a transformer architecture for fine-grained recognition [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2022: 852−860.
14 SUN H, HE X, PENG Y. Sim-trans: structure information modeling transformer for fine-grained visual categorization [C]// Proceedings of the 30th ACM International Conference on Multimedia . Ottawa: ACM, 2022: 5853−5861.
15 LI S, WANG Z, LIU Z, et al. Efficient multi-order gated aggregation network [EB/OL]. (2023-03-20)[2023-08-22]. https://arxiv.org/abs/2211.03295.
16 DO T, TRAN H, TJIPUTRA E, et al. Fine-grained visual classification using self assessment classifier [EB/OL]. (2022-05-21)[2023-08-22]. https://arxiv.org/abs/2205.10529.
17 PENNINGTON J, SOCHER R, MANNING C D. GloVe: global vectors for word representation [C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha: ACL, 2014: 1532−1543.
18 KIM J H, JUN J, ZHANG B T. Bilinear attention networks [C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. New York: Curran Associates, 2018: 1571−1581.
19 BERA A, WHARTON Z, LIU Y, et al. SR-GNN: spatial relation-aware graph neural network for fine-grained image categorization [J]. IEEE Transactions on Image Processing, 2022, 31: 6017−6031. doi: 10.1109/TIP.2022.3205215.
20 ZHU H, KE W, LI D, et al. Dual cross-attention learning for fine-grained visual categorization and object re-identification [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 4692−4702.