Journal of Zhejiang University (Engineering Science)  2025, Vol. 59, Issue (6): 1130-1139    DOI: 10.3785/j.issn.1008-973X.2025.06.004
Computer Technology
Image retrieval method based on self-similar embedding and global feature reranking
Jiefeng CHEN, Jinliang YAO*
College of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
Full text: PDF (1809 KB)   HTML
Abstract:

Existing image retrieval methods often suffer from the loss of structural information during local feature extraction and the high computational cost associated with local feature re-ranking. To address these issues, an image retrieval method based on self-similarity embedding and global feature re-ranking was proposed. A self-similarity embedding network was introduced to capture the internal structure of images and compress it into a dense self-similarity feature map. A self-similarity embedded feature map was generated by fusing the self-similarity feature map with the initial image feature map. The generated map could represent both the visual and the structural information of the image, thereby achieving finer-grained retrieval results. A global feature re-ranking method was proposed by drawing inspiration from query expansion and database enhancement techniques. Based on the initial ranking results, the features of the top-ranked similar images for each image were extracted and the initial features were updated using a linear summation approach. This process highlighted the common features of images with the same content and increased the inter-class distance, thereby reducing the number of false positives. In experiments, the proposed self-similarity embedding and re-ranking methods were evaluated using the mean average precision (mAP) as the evaluation metric. The results demonstrated that the proposed method outperformed existing approaches on the ROxford5K and RParis6K datasets.
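To make the embedding step concrete, below is a minimal PyTorch sketch of the idea: cosine self-similarity is computed between every location of a backbone feature map and its U x V neighbourhood, compressed into a dense self-similarity map, and fused with the initial feature map. The module name, the 1x1-convolution choices, and the cosine measure are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfSimilarityEmbedding(nn.Module):
    """Hypothetical sketch: dense self-similarity map fused with visual features."""

    def __init__(self, channels: int, u: int = 7, v: int = 7):
        super().__init__()
        self.u, self.v = u, v
        # Compress the U*V-dimensional self-similarity tensor (assumed 1x1 conv).
        self.encode = nn.Conv2d(u * v, channels, kernel_size=1)
        # Fuse visual and structural maps into one embedded feature map.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        unit = F.normalize(feat, dim=1)  # unit-norm descriptor at each location
        # Gather each location's U x V neighbourhood: (B, C*U*V, H*W).
        neigh = F.unfold(unit, kernel_size=(self.u, self.v),
                         padding=(self.u // 2, self.v // 2))
        neigh = neigh.view(b, c, self.u * self.v, h, w)
        # Cosine similarity between every location and its neighbours: (B, U*V, H, W).
        sim = (unit.unsqueeze(2) * neigh).sum(dim=1)
        structural = self.encode(sim)  # dense self-similarity feature map
        return self.fuse(torch.cat([feat, structural], dim=1))
```

With the window size that matches the main results in Table 8 (U = V = 7), each location is compared with 49 neighbours, so structural information is carried by the global descriptor pipeline rather than by per-image local descriptors.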

Key words: image retrieval    structural information    self-similarity    feature embedding    global feature reranking
Received: 2024-07-03    Published: 2025-05-30
CLC:  TP 391  
Corresponding author: Jinliang YAO     E-mail: 1172560181@qq.com; yaojinl@hdu.edu.cn
About the author: Jiefeng CHEN (b. 2000), male, master's student, working on computer vision. orcid.org/0009-0007-6864-3749. E-mail: 1172560181@qq.com
Cite this article:

Jiefeng CHEN, Jinliang YAO. Image retrieval method based on self-similar embedding and global feature reranking. Journal of Zhejiang University (Engineering Science), 2025, 59(6): 1130-1139.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.06.004        https://www.zjujournals.com/eng/CN/Y2025/V59/I6/1130

Fig. 1  Architecture of self-similarity embedding network
Fig. 2  Structure of CBAM module
Fig. 3  Encoding pipeline of self-similarity tensor
Fig. 4  Structure of feature fusion module
Fig. 5  Pipeline of global feature re-ranking
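As a rough illustration of the re-ranking pipeline in Fig. 5, the sketch below updates each global descriptor by a linear sum with its top-ranked neighbours, first on the database side (n1 images, weight factor α1) and then on the query side (n2, α2), as described in the abstract. The similarity-powered weighting is an assumption borrowed from alpha-weighted query expansion; the authors' exact summation weights may differ.

```python
import numpy as np

def refine(feats: np.ndarray, pool: np.ndarray, n: int, alpha: float) -> np.ndarray:
    """Update each row of `feats` by a linear sum with its top-n most
    similar rows of `pool` (cosine similarity on L2-normalised features)."""
    sims = feats @ pool.T                           # (N, M) similarity matrix
    top = np.argsort(-sims, axis=1)[:, :n]          # top-n neighbour indices
    out = feats.copy()
    for i, idx in enumerate(top):
        w = np.clip(sims[i, idx], 0, None) ** alpha  # assumed weighting scheme
        out[i] = feats[i] + w @ pool[idx]            # linear summation update
    return out / np.linalg.norm(out, axis=1, keepdims=True)

# Toy usage with random unit vectors (shapes and data are hypothetical).
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 512)); db /= np.linalg.norm(db, axis=1, keepdims=True)
q = rng.normal(size=(5, 512));    q /= np.linalg.norm(q, axis=1, keepdims=True)

db_new = refine(db, db, n=5, alpha=3)    # database-side update (n1, alpha1)
q_new = refine(q, db_new, n=4, alpha=4)  # query-side update (n2, alpha2)
reranked = np.argsort(-(q_new @ db_new.T), axis=1)  # final ranking per query
```

The toy call uses n1 = 5, α1 = 3, n2 = 4, α2 = 4, which is in the range that performs best in Tables 4-7 below.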
| Category | Method | Medium (ROxf) | Medium (RPar) | Hard (ROxf) | Hard (RPar) |
| --- | --- | --- | --- | --- | --- |
| Global features | R-MAC [11] | 75.14 | 85.28 | 53.77 | 71.28 |
| | GeM-AP [12] | 67.50 | 80.10 | 42.80 | 60.60 |
| | SOLAR [14] | 79.65 | 88.63 | 59.99 | 75.26 |
| | DELG [33] | 76.40 | 86.74 | 55.92 | 72.60 |
| | DOLG [16] | 80.50 | 89.81 | 58.82 | 77.70 |
| | GLAM [34] | 78.60 | 88.50 | 60.20 | 76.80 |
| | Swin-S-DALG [35] | 79.94 | 90.04 | 57.55 | 79.06 |
| | SpCa [36] | 81.55 | 88.60 | 61.69 | 76.21 |
| | SENet [37] | 81.90 | 90.00 | 63.00 | 78.10 |
| Local feature aggregation + re-ranking | HesAff-rSIFT-ASMK+SP [38] | 60.60 | 61.40 | 36.70 | 35.50 |
| | DELF-ASMK+SP [5] | 67.80 | 76.90 | 43.10 | 55.40 |
| | DELF-R-ASMK+SP [39] | 76.00 | 80.20 | 52.40 | 58.60 |
| | HOW-ASMK [21] | 79.40 | 81.60 | 56.90 | 62.40 |
| | Fire [22] | 81.80 | 85.30 | 61.20 | 70.00 |
| Global features + local feature re-ranking | GeM+DSM [40] | 65.30 | 77.40 | 39.20 | 56.20 |
| | DELG+SP [33] | 81.20 | 87.20 | 64.00 | 72.80 |
| Global features | Proposed method (without re-ranking) | 77.21 | 87.79 | 60.85 | 75.17 |
| Global features + global feature re-ranking | Proposed method | 82.11 | 90.38 | 66.85 | 80.24 |

Table 1  Evaluation results (mAP/%) of each method on RParis6K and ROxford5K datasets
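The mAP values above follow the standard definition: average precision over each query's ranked list, then the mean over all queries. A minimal sketch with binary relevance labels is shown below; it does not reproduce the junk-image handling of the ROxford5K/RParis6K evaluation protocol.

```python
import numpy as np

def average_precision(rel: np.ndarray) -> float:
    """AP for one query; rel[i] = 1 if the i-th ranked image is relevant."""
    hits = np.cumsum(rel)
    prec = hits / (np.arange(len(rel)) + 1)  # precision at each rank
    n_rel = rel.sum()
    return float((prec * rel).sum() / n_rel) if n_rel else 0.0

# mAP = mean of AP over all queries (labels here are made up for illustration).
ranked_labels = [np.array([1, 0, 1, 0]), np.array([0, 1, 1, 0])]
mean_ap = 100 * np.mean([average_precision(r) for r in ranked_labels])  # mAP/%
```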
Fig. 6  Heat maps of attention distribution
Fig. 7  Retrieval results of proposed model
| Method | M/GB (ROxf+R1M) | M/GB (RPar+R1M) |
| --- | --- | --- |
| DELF-R-ASMK | 27.6 | |
| DELG | 485.9 | 486.6 |
| DELG (3 scales) | 437.1 | 437.8 |
| DELF (3 scales) | 434.2 | 434.8 |
| DELF (7 scales) | 477.9 | 478.9 |
| Proposed method | 3.8 | 3.8 |

Table 2  Comparison of memory usage of re-ranking methods
| CBAM | Self-similarity computation | Feature fusion + self-similarity tensor encoding | Medium (ROxf) | Medium (RPar) | Hard (ROxf) | Hard (RPar) | S/10^6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| | | | 77.21 | 87.79 | 60.85 | 75.17 | 165.11 |
| | | | 76.18 | 87.36 | 58.43 | 74.49 | 164.96 |
| | | | 76.86 | 86.59 | 57.20 | 72.50 | 164.96 |
| | | | 74.20 | 84.90 | 51.60 | 70.30 | 119.94 |

Table 3  Ablation analysis of key modules in model (mAP/%)
| $n_1$ | Medium (ROxf) | Medium (RPar) | Hard (ROxf) | Hard (RPar) |
| --- | --- | --- | --- | --- |
| 2 | 80.84 | 89.70 | 65.54 | 78.82 |
| 3 | 81.19 | 89.97 | 65.31 | 79.42 |
| 4 | 81.55 | 90.11 | 66.33 | 79.75 |
| 5 | 81.73 | 90.24 | 66.68 | 80.11 |
| 6 | 81.27 | 90.32 | 66.34 | 80.31 |
| 7 | 80.55 | 90.36 | 64.27 | 80.41 |

Table 4  Effect of number of database-side images on re-ranking (mAP/%)
| ${\alpha}_1$ | Medium (ROxf) | Medium (RPar) | Hard (ROxf) | Hard (RPar) |
| --- | --- | --- | --- | --- |
| 1 | 80.57 | 90.11 | 63.86 | 79.70 |
| 2 | 81.15 | 90.19 | 65.98 | 79.92 |
| 3 | 81.73 | 90.24 | 66.68 | 80.11 |
| 4 | 81.71 | 90.24 | 66.81 | 80.15 |
| 5 | 81.65 | 90.21 | 66.72 | 80.01 |
| 6 | 81.49 | 90.15 | 66.47 | 79.97 |
| 7 | 81.42 | 90.06 | 66.24 | 79.75 |

Table 5  Effect of database-side weight factor on re-ranking (mAP/%)
| $n_2$ | Medium (ROxf) | Medium (RPar) | Hard (ROxf) | Hard (RPar) |
| --- | --- | --- | --- | --- |
| 1 | 81.46 | 89.69 | 65.42 | 79.24 |
| 2 | 80.48 | 89.71 | 65.61 | 79.33 |
| 3 | 81.63 | 89.99 | 66.37 | 79.73 |
| 4 | 82.02 | 90.21 | 66.99 | 80.07 |
| 5 | 81.73 | 90.24 | 66.68 | 80.11 |
| 6 | 81.44 | 90.21 | 66.47 | 80.14 |
| 7 | 81.04 | 90.29 | 66.38 | 80.28 |

Table 6  Effect of number of query-side images on re-ranking (mAP/%)
| ${\alpha}_2$ | Medium (ROxf) | Medium (RPar) | Hard (ROxf) | Hard (RPar) |
| --- | --- | --- | --- | --- |
| 1 | 82.17 | 90.34 | 66.85 | 80.21 |
| 2 | 82.14 | 90.28 | 66.96 | 80.15 |
| 3 | 82.02 | 90.21 | 66.99 | 80.07 |
| 4 | 82.21 | 90.38 | 66.85 | 80.24 |
| 5 | 82.08 | 90.39 | 66.75 | 80.24 |
| 6 | 81.34 | 90.39 | 64.40 | 80.22 |
| 7 | 81.22 | 90.39 | 64.26 | 80.20 |

Table 7  Effect of query-side weight factor on re-ranking (mAP/%)
| U (V) | Medium (ROxf) | Medium (RPar) | Hard (ROxf) | Hard (RPar) | S/10^6 |
| --- | --- | --- | --- | --- | --- |
| 5 | 77.08 | 85.96 | 58.65 | 73.47 | 155.96 |
| 7 | 77.21 | 87.79 | 60.85 | 75.17 | 165.11 |
| 9 | 76.96 | 88.02 | 58.13 | 75.35 | 173.96 |

Table 8  Effect of neighborhood window size on re-ranking (mAP/%)
1 ARANDJELOVIC R, GRONAT P, TORII A, et al. NetVLAD: CNN architecture for weakly supervised place recognition [C]// IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 5297–5307.
2 YANDEX A B, LEMPITSKY V. Aggregating local deep features for image retrieval [C]// IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1269–1277.
3 BABENKO A, SLESAREV A, CHIGORIN A, et al. Neural codes for image retrieval [C]// European Conference on Computer Vision. Zurich: Springer International Publishing, 2014: 584–599.
4 HE K, LU Y, SCLAROFF S. Local descriptors optimized for average precision [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 596–605.
5 NOH H, ARAUJO A, SIM J, et al. Large-scale image retrieval with attentive deep local features [C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 3476–3485.
6 REVAUD J, DE SOUZA C, HUMENBERGER M, et al. R2D2: reliable and repeatable detector and descriptor [J]. Advances in Neural Information Processing Systems, 2019, 32: 12405–12415.
7 SCHROFF F, KALENICHENKO D, PHILBIN J. FaceNet: a unified embedding for face recognition and clustering [C]// IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 815–823.
8 HU J, LU J, TAN Y P. Discriminative deep metric learning for face verification in the wild [C]// IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 1875–1882.
9 DENG J, GUO J, XUE N, et al. ArcFace: additive angular margin loss for deep face recognition [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4690–4699.
10 KALANTIDIS Y, MELLINA C, OSINDERO S. Cross-dimensional weighting for aggregated deep convolutional features [C]// European Conference on Computer Vision. Amsterdam: Springer International Publishing, 2016: 685–701.
11 TOLIAS G, SICRE R, JÉGOU H. Particular object retrieval with integral max-pooling of CNN activations [EB/OL]. (2016−02−24)[2023−10−13]. https://arxiv.org/abs/1511.05879.
12 RADENOVIĆ F, TOLIAS G, CHUM O. Fine-tuning CNN image retrieval with no human annotation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(7): 1655–1668.
doi: 10.1109/TPAMI.2018.2846566
13 SHAO S, CHEN K, KARPUR A, et al. Global features are all you need for image retrieval and reranking [C]// IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 11002–11012.
14 NG T, BALNTAS V, TIAN Y, et al. SOLAR: second-order loss and attention for image retrieval [C]// European Conference on Computer Vision. Glasgow: Springer International Publishing, 2020: 253−270.
15 WU H, WANG M, ZHOU W, et al. Learning token-based representation for image retrieval [C]// AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2022, 36(3): 2703−2711.
16 YANG M, HE D, FAN M, et al. DOLG: single-stage image retrieval with deep orthogonal fusion of local and global features [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 11752–11761.
17 SONG C H, YOON J, CHOI S, et al. Boosting vision transformers for image retrieval [C]// IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2023: 107–117.
18 LOWE D G. Distinctive image features from scale-invariant keypoints [J]. International Journal of Computer Vision, 2004, 60(2): 91–110.
doi: 10.1023/B:VISI.0000029664.99615.94
19 BAY H, ESS A, TUYTELAARS T, et al. Speeded-up robust features (SURF) [J]. Computer Vision and Image Understanding, 2008, 110(3): 346–359.
doi: 10.1016/j.cviu.2007.09.014
20 DUSMANU M, ROCCO I, PAJDLA T, et al. D2-Net: a trainable CNN for joint detection and description of local features [EB/OL]. (2019−05−09)[2023−11−28]. https://arxiv.org/abs/1905.03561.
21 TOLIAS G, JENICEK T, CHUM O. Learning and aggregating deep local descriptors for instance-level recognition [C]// European Conference on Computer Vision. Glasgow: Springer International Publishing, 2020: 460−477.
22 WEINZAEPFEL P, LUCAS T, LARLUS D, et al. Learning super-features for image retrieval [EB/OL]. (2022−01−31)[2023−12−17]. https://arxiv.org/abs/2201.13182.
23 TAN F, YUAN J, ORDONEZ V. Instance-level image retrieval using reranking transformers [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 12085–12095.
24 LEE S, SEONG H, LEE S, et al. Correlation verification for image retrieval [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 5364–5374.
25 KANG D, KWON H, MIN J, et al. Relational embedding for few-shot classification [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 8802–8813.
26 WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module [C]// European Conference on Computer Vision. Munich: Springer International Publishing, 2018: 3−19.
27 GORDO A, RADENOVIC F, BERG T. Attention-based query expansion learning [C]// European Conference on Computer Vision. Glasgow: Springer International Publishing, 2020: 172–188.
28 ARANDJELOVIĆ R, ZISSERMAN A. Three things everyone should know to improve object retrieval [C]// IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE, 2012: 2911–2918.
29 GORDO A, ALMAZÁN J, REVAUD J, et al. End-to-end learning of deep visual representations for image retrieval [J]. International Journal of Computer Vision, 2017, 124(2): 237–254.
doi: 10.1007/s11263-017-1016-8
30 WEYAND T, ARAUJO A, CAO B, et al. Google Landmarks Dataset v2: a large-scale benchmark for instance-level recognition and retrieval [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 2575–2584.
31 RADENOVIC F, ISCEN A, TOLIAS G, et al. Revisiting Oxford and Paris: large-scale image retrieval benchmarking [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 5706–5715.
32 PHILBIN J, CHUM O, ISARD M, et al. Object retrieval with large vocabularies and fast spatial matching [C]// IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis: IEEE, 2007: 1–8.
33 CAO B, ARAUJO A, SIM J. Unifying deep local and global features for image search [C]// European Conference on Computer Vision. Glasgow: Springer International Publishing, 2020: 726−743.
34 SONG C H, HAN H J, AVRITHIS Y. All the attention you need: global-local, spatial-channel attention for image retrieval [C]// IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2022: 439–448.
35 SONG Y, ZHU R, YANG M, et al. DALG: deep attentive local and global modeling for image retrieval [EB/OL]. (2022−07−01)[2024−03−11]. https://arxiv.org/abs/2207.00287.
36 ZHANG Z, WANG L, ZHOU L, et al. Learning spatial-context-aware global visual feature representation for instance image retrieval [C]// IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 11216–11225.
37 LEE S, LEE S, SEONG H, et al. Revisiting self-similarity: structural embedding for image retrieval [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 23412–23421.
38 TOLIAS G, AVRITHIS Y, JÉGOU H. Image search with selective match kernels: aggregation across single and multiple images [J]. International Journal of Computer Vision, 2016, 116(3): 247–261.
doi: 10.1007/s11263-015-0810-4
39 TEICHMANN M, ARAUJO A, ZHU M, et al. Detect-to-retrieve: efficient regional aggregation for image search [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5109–5118.