Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (6): 1130-1139    DOI: 10.3785/j.issn.1008-973X.2025.06.004
    
Image retrieval method based on self-similar embedding and global feature reranking
Jiefeng CHEN, Jinliang YAO*
College of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China

Abstract  

Existing image retrieval methods often suffer from the loss of structural information during local feature extraction and from the high computational cost of local feature re-ranking. To address these issues, an image retrieval method based on self-similarity embedding and global feature re-ranking was proposed. A self-similarity embedding network was introduced to capture the internal structure of an image and compress it into a dense self-similarity feature map. A self-similarity embedded feature map was generated by fusing the self-similarity feature map with the initial image feature map; the generated map represents both the visual and the structural information of the image, enabling finer-grained retrieval. Drawing on query expansion and database augmentation techniques, a global feature re-ranking method was proposed: based on the initial ranking, the features of the top-ranked similar images for each image were extracted, and the initial features were updated by linear summation. This update highlights the features common to images with the same content and increases the inter-class distance, thereby reducing the number of false positives. The proposed self-similarity embedding and re-ranking methods were evaluated with mean average precision (mAP) as the metric, and the results demonstrate that the proposed method outperforms existing approaches on the ROxford5K and RParis6K datasets.
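The re-ranking update described above lends itself to a compact implementation. The following is a minimal NumPy sketch of a query-expansion / database-augmentation style linear-summation update, assuming L2-normalized global features and cosine similarity; the function name, the placement of the weighting factor, and the two-stage (database-side, then query-side) application are illustrative assumptions, with n and alpha playing the roles of n1/α1 and n2/α2 studied in Tabs. 4-7, not the paper's exact formula.

import numpy as np

def rerank_update(feats, bank, n, alpha, exclude_self=False):
    """Update each feature in `feats` by linearly summing it with the
    features of its top-n most similar images in `bank`.

    `n`/`alpha` correspond to n1/alpha1 (database side, bank == feats)
    or n2/alpha2 (query side, bank == updated database) in Tabs. 4-7.
    Keeping the weight alpha on the original feature is an assumption.
    """
    sims = feats @ bank.T                        # pairwise cosine similarities
    if exclude_self:                             # database side: skip each image itself
        np.fill_diagonal(sims, -np.inf)
    topn = np.argsort(-sims, axis=1)[:, :n]      # indices of top-n similar images
    updated = alpha * feats + bank[topn].sum(axis=1)   # linear summation update
    return updated / np.linalg.norm(updated, axis=1, keepdims=True)

# Database-side enhancement first, then query-side expansion, with settings
# near the optima reported in Tabs. 4-7 (n1 = 5, alpha1 = 3; n2 = 4, alpha2 = 4):
# db_new = rerank_update(db, db, n=5, alpha=3, exclude_self=True)
# q_new  = rerank_update(q, db_new, n=4, alpha=4)

Applied this way, the update pulls images with the same content closer together in feature space before the final ranking is computed, which is what increases the inter-class gap and suppresses false positives.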



Key words: image retrieval; structural information; self-similarity; feature embedding; global feature reranking
Received: 03 July 2024      Published: 30 May 2025
CLC:  TP 391  
Corresponding Author: Jinliang YAO. E-mail: 1172560181@qq.com; yaojinl@hdu.edu.cn
Cite this article:

Jiefeng CHEN, Jinliang YAO. Image retrieval method based on self-similar embedding and global feature reranking. Journal of ZheJiang University (Engineering Science), 2025, 59(6): 1130-1139.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.06.004     OR     https://www.zjujournals.com/eng/Y2025/V59/I6/1130


Fig.1 Structure of self-similarity embedding network
Fig.2 Structure of CBAM module
Fig.3 Process of self-similarity tensor coding
Fig.4 Structure of feature fusion module
Fig.5 Process of global feature reranking
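Complementing Figs. 1 and 3, the following is a minimal sketch of how a local self-similarity tensor can be computed from a CNN feature map: every spatial position is compared with the positions in a U x V neighborhood around it, producing an H x W x U x V tensor that encodes internal structure. The window size follows the U (V) column of Tab. 8; the function name, cosine similarity, and zero padding are illustrative assumptions, and the paper's tensor-coding and fusion modules (Figs. 3 and 4) are not reproduced here.

import numpy as np

def self_similarity_tensor(fmap, U=7, V=7):
    """Local self-similarity of a (C, H, W) feature map.

    For each spatial position, the cosine similarity between its feature
    vector and those in a U x V neighborhood is recorded, yielding an
    (H, W, U, V) tensor. Window sizes mirror the U (V) column of Tab. 8;
    zero padding at the borders is an implementation assumption.
    """
    C, H, W = fmap.shape
    f = fmap / (np.linalg.norm(fmap, axis=0, keepdims=True) + 1e-8)  # normalize per position
    pu, pv = U // 2, V // 2
    fp = np.pad(f, ((0, 0), (pu, pu), (pv, pv)))                     # zero-pad spatial borders
    sst = np.empty((H, W, U, V), dtype=f.dtype)
    for du in range(U):
        for dv in range(V):
            # similarity with the neighbor at spatial offset (du - pu, dv - pv)
            sst[:, :, du, dv] = np.einsum('chw,chw->hw', f, fp[:, du:du + H, dv:dv + W])
    return sst

In the pipeline described in the abstract, this tensor would then be compressed into a dense self-similarity feature map and fused with the initial feature map (Fig. 4) to form the self-similarity embedded representation.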
Category | Method | Medium(ROxf) | Medium(RPar) | Hard(ROxf) | Hard(RPar)
(all values are mAP/%)
Global features | R-MAC[11] | 75.14 | 85.28 | 53.77 | 71.28
 | GeM-AP[12] | 67.50 | 80.10 | 42.80 | 60.60
 | SOLAR[14] | 79.65 | 88.63 | 59.99 | 75.26
 | DELG[33] | 76.40 | 86.74 | 55.92 | 72.60
 | DOLG[16] | 80.50 | 89.81 | 58.82 | 77.70
 | GLAM[34] | 78.60 | 88.50 | 60.20 | 76.80
 | Swin-S-DALG[35] | 79.94 | 90.04 | 57.55 | 79.06
 | SpCa[36] | 81.55 | 88.60 | 61.69 | 76.21
 | SENet[37] | 81.90 | 90.00 | 63.00 | 78.10
Local feature aggregation + reranking | HesAff-rSIFT-ASMK+SP[38] | 60.60 | 61.40 | 36.70 | 35.50
 | DELF-ASMK+SP[5] | 67.80 | 76.90 | 43.10 | 55.40
 | DELF-R-ASMK+SP[39] | 76.00 | 80.20 | 52.40 | 58.60
 | HOW-ASMK[21] | 79.40 | 81.60 | 56.90 | 62.40
 | Fire[22] | 81.80 | 85.30 | 61.20 | 70.00
Global features + local feature reranking | GeM+DSM[40] | 65.30 | 77.40 | 39.20 | 56.20
 | DELG+SP[33] | 81.20 | 87.20 | 64.00 | 72.80
Global features | Proposed method (without reranking) | 77.21 | 87.79 | 60.85 | 75.17
Global features + global feature reranking | Proposed method | 82.11 | 90.38 | 66.85 | 80.24
Tab.1 Evaluation results of various methods on ROxford5K and RParis6K datasets
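All results in Tab.1 and the following tables are reported as mean average precision. For orientation, one standard (non-interpolated) definition is

$\mathrm{AP}(q) = \frac{1}{|R_q|}\sum_{k=1}^{N} P_q(k)\,\mathrm{rel}_q(k)$ and $\mathrm{mAP} = \frac{1}{|Q|}\sum_{q \in Q}\mathrm{AP}(q)$,

where $R_q$ is the set of database images relevant to query $q$, $N$ is the number of ranked database images, $P_q(k)$ is the precision of the top-$k$ list, $\mathrm{rel}_q(k)\in\{0,1\}$ marks whether the image at rank $k$ is relevant, and $Q$ is the query set. This is textbook material; the revisited benchmarks additionally exclude junk images from the ranking, a detail omitted here.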
Fig.6 Heat map of attention distribution
Fig.7 Retrieval results display of proposed model
Method | ROxf+R1M | RPar+R1M
(memory footprint M/GB)
DELF-R-ASMK | 27.6 |
DELG | 485.9 | 486.6
DELG (3 scales) | 437.1 | 437.8
DELF (3 scales) | 434.2 | 434.8
DELF (7 scales) | 477.9 | 478.9
Proposed method | 3.8 | 3.8
Tab.2 Comparison of memory footprint of each reranking method
CBAM | Self-similarity computation | Feature fusion + self-similarity tensor encoding | Medium(ROxf) | Medium(RPar) | Hard(ROxf) | Hard(RPar) | S/10^6
(the four Medium/Hard columns are mAP/%)
 | | | 77.21 | 87.79 | 60.85 | 75.17 | 165.11
 | | | 76.18 | 87.36 | 58.43 | 74.49 | 164.96
 | | | 76.86 | 86.59 | 57.20 | 72.50 | 164.96
 | | | 74.20 | 84.90 | 51.60 | 70.30 | 119.94
Tab.3 Ablation study of key modules in proposed model
$n_1$ | Medium(ROxf) | Medium(RPar) | Hard(ROxf) | Hard(RPar)
(all values are mAP/%)
2 | 80.84 | 89.70 | 65.54 | 78.82
3 | 81.19 | 89.97 | 65.31 | 79.42
4 | 81.55 | 90.11 | 66.33 | 79.75
5 | 81.73 | 90.24 | 66.68 | 80.11
6 | 81.27 | 90.32 | 66.34 | 80.31
7 | 80.55 | 90.36 | 64.27 | 80.41
Tab.4 Effect of number of database-side images on reranking
$\alpha_1$ | Medium(ROxf) | Medium(RPar) | Hard(ROxf) | Hard(RPar)
(all values are mAP/%)
1 | 80.57 | 90.11 | 63.86 | 79.70
2 | 81.15 | 90.19 | 65.98 | 79.92
3 | 81.73 | 90.24 | 66.68 | 80.11
4 | 81.71 | 90.24 | 66.81 | 80.15
5 | 81.65 | 90.21 | 66.72 | 80.01
6 | 81.49 | 90.15 | 66.47 | 79.97
7 | 81.42 | 90.06 | 66.24 | 79.75
Tab.5 Impact of database-side weighting factor on reranking
$n_2$ | Medium(ROxf) | Medium(RPar) | Hard(ROxf) | Hard(RPar)
(all values are mAP/%)
1 | 81.46 | 89.69 | 65.42 | 79.24
2 | 80.48 | 89.71 | 65.61 | 79.33
3 | 81.63 | 89.99 | 66.37 | 79.73
4 | 82.02 | 90.21 | 66.99 | 80.07
5 | 81.73 | 90.24 | 66.68 | 80.11
6 | 81.44 | 90.21 | 66.47 | 80.14
7 | 81.04 | 90.29 | 66.38 | 80.28
Tab.6 Effect of number of query-side images on reranking
$\alpha_2$ | Medium(ROxf) | Medium(RPar) | Hard(ROxf) | Hard(RPar)
(all values are mAP/%)
1 | 82.17 | 90.34 | 66.85 | 80.21
2 | 82.14 | 90.28 | 66.96 | 80.15
3 | 82.02 | 90.21 | 66.99 | 80.07
4 | 82.21 | 90.38 | 66.85 | 80.24
5 | 82.08 | 90.39 | 66.75 | 80.24
6 | 81.34 | 90.39 | 64.40 | 80.22
7 | 81.22 | 90.39 | 64.26 | 80.20
Tab.7 Impact of query-side weighting factor on reranking
U (V) | Medium(ROxf) | Medium(RPar) | Hard(ROxf) | Hard(RPar) | S/10^6
(the four Medium/Hard columns are mAP/%)
5 | 77.08 | 85.96 | 58.65 | 73.47 | 155.96
7 | 77.21 | 87.79 | 60.85 | 75.17 | 165.11
9 | 76.96 | 88.02 | 58.13 | 75.35 | 173.96
Tab.8 Effect of neighborhood window size on retrieval performance
[1]   ARANDJELOVIC R, GRONAT P, TORII A, et al. NetVLAD: CNN architecture for weakly supervised place recognition [C]// IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 5297–5307.
[2]   YANDEX A B, LEMPITSKY V. Aggregating local deep features for image retrieval [C]// IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1269–1277.
[3]   BABENKO A, SLESAREV A, CHIGORIN A, et al. Neural codes for image retrieval [C]// European Conference on Computer Vision. Zurich: Springer International Publishing, 2014: 584–599.
[4]   HE K, LU Y, SCLAROFF S. Local descriptors optimized for average precision [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 596–605.
[5]   NOH H, ARAUJO A, SIM J, et al. Large-scale image retrieval with attentive deep local features [C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 3476–3485.
[6]   REVAUD J, DE SOUZA C, HUMENBERGER M, et al. R2D2: reliable and repeatable detector and descriptor[J]. Advances in Neural Information Processing Systems, 2019, 32: 12405–12415.
[7]   SCHROFF F, KALENICHENKO D, PHILBIN J. FaceNet: a unified embedding for face recognition and clustering [C]// IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 815–823.
[8]   HU J, LU J, TAN Y P. Discriminative deep metric learning for face verification in the wild [C]// IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 1875–1882.
[9]   DENG J, GUO J, XUE N, et al. ArcFace: additive angular margin loss for deep face recognition [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4690–4699.
[10]   KALANTIDIS Y, MELLINA C, OSINDERO S. Cross-dimensional weighting for aggregated deep convolutional features [C]// European Conference on Computer Vision. Amsterdam: Springer International Publishing, 2016: 685–701.
[11]   TOLIAS G, SICRE R, JÉGOU H. Particular object retrieval with integral max-pooling of CNN activations [EB/OL]. (2016−02−24)[2023−10−13]. https://arxiv.org/abs/1511.05879.
[12]   RADENOVIĆ F, TOLIAS G, CHUM O. Fine-tuning CNN image retrieval with no human annotation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(7): 1655–1668. doi: 10.1109/TPAMI.2018.2846566
[13]   SHAO S, CHEN K, KARPUR A, et al. Global features are all you need for image retrieval and reranking [C]// IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 11002–11012.
[14]   NG T, BALNTAS V, TIAN Y, et al. SOLAR: second-order loss and attention for image retrieval [C]// European Conference on Computer Vision. Glasgow: Springer International Publishing, 2020: 253−270.
[15]   WU H, WANG M, ZHOU W, et al. Learning token-based representation for image retrieval [C]// AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2022, 36(3): 2703−2711.
[16]   YANG M, HE D, FAN M, et al. DOLG: single-stage image retrieval with deep orthogonal fusion of local and global features [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 11752–11761.
[17]   SONG C H, YOON J, CHOI S, et al. Boosting vision transformers for image retrieval [C]// IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2023: 107–117.
[18]   LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91–110. doi: 10.1023/B:VISI.0000029664.99615.94
[19]   BAY H, ESS A, TUYTELAARS T, et al. Speeded-up robust features (SURF)[J]. Computer Vision and Image Understanding, 2008, 110(3): 346–359. doi: 10.1016/j.cviu.2007.09.014
[20]   DUSMANU M, ROCCO I, PAJDLA T, et al. D2-Net: a trainable CNN for joint detection and description of local features [EB/OL]. (2019−05−09)[2023−11−28]. https://arxiv.org/abs/1905.03561.
[21]   TOLIAS G, JENICEK T, CHUM O. Learning and aggregating deep local descriptors for instance-level recognition [C]// European Conference on Computer Vision. Glasgow: Springer International Publishing, 2020: 460−477.
[22]   WEINZAEPFEL P, LUCAS T, LARLUS D, et al. Learning super-features for image retrieval [EB/OL]. (2022−01−31)[2023−12−17]. https://arxiv.org/abs/2201.13182.
[23]   TAN F, YUAN J, ORDONEZ V. Instance-level image retrieval using reranking transformers [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 12085–12095.
[24]   LEE S, SEONG H, LEE S, et al. Correlation verification for image retrieval [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 5364–5374.
[25]   KANG D, KWON H, MIN J, et al. Relational embedding for few-shot classification [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 8802–8813.
[26]   WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// European Conference on Computer Vision. Munich: Springer International Publishing, 2018: 3–19.
[27]   GORDO A, RADENOVIC F, BERG T. Attention-based query expansion learning [C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2020: 172−188.
[28]   ARANDJELOVIĆ R, ZISSERMAN A. Three things everyone should know to improve object retrieval [C]// IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE, 2012: 2911–2918.
[29]   GORDO A, ALMAZÁN J, REVAUD J, et al. End-to-end learning of deep visual representations for image retrieval[J]. International Journal of Computer Vision, 2017, 124(2): 237–254. doi: 10.1007/s11263-017-1016-8
[30]   WEYAND T, ARAUJO A, CAO B, et al. Google Landmarks Dataset v2: a large-scale benchmark for instance-level recognition and retrieval [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 2575–2584.
[31]   RADENOVIC F, ISCEN A, TOLIAS G, et al. Revisiting Oxford and Paris: large-scale image retrieval benchmarking [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 5706–5715.
[32]   PHILBIN J, CHUM O, ISARD M, et al. Object retrieval with large vocabularies and fast spatial matching [C]// IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis: IEEE, 2007: 1–8.
[33]   CAO B, ARAUJO A, SIM J. Unifying deep local and global features for image search [C]// European Conference on Computer Vision. Glasgow: Springer International Publishing, 2020: 726−743.
[34]   SONG C H, HAN H J, AVRITHIS Y. All the attention you need: global-local, spatial-channel attention for image retrieval [C]// IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2022: 439–448.
[35]   SONG Y, ZHU R, YANG M, et al. DALG: deep attentive local and global modeling for image retrieval [EB/OL]. (2022−07−01)[2024−03−11]. https://arxiv.org/abs/2207.00287.
[36]   ZHANG Z, WANG L, ZHOU L, et al. Learning spatial-context-aware global visual feature representation for instance image retrieval [C]// IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 11216–11225.
[37]   LEE S, LEE S, SEONG H, et al. Revisiting self-similarity: structural embedding for image retrieval [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 23412–23421.
[38]   TOLIAS G, AVRITHIS Y, JÉGOU H. Image search with selective match kernels: aggregation across single and multiple images[J]. International Journal of Computer Vision, 2016, 116(3): 247–261. doi: 10.1007/s11263-015-0810-4
[39]   TEICHMANN M, ARAUJO A, ZHU M, et al. Detect-to-retrieve: efficient regional aggregation for image search [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5109–5118.
Related articles:
[1] Fan YANG, Bo NING, Huai-qing LI, Xin ZHOU, Guan-yu LI. Multimodal image retrieval model based on semantic-enhanced feature fusion[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(2): 252-258.
[2] Li-zhou FENG, Yang YANG, You-wei WANG, Gui-jun YANG. New method for news recommendation based on Transformer and knowledge graph[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(1): 133-143.
[3] Shi-lin ZHANG, Si-ming MA, Zi-qian GU. Large margin metric learning based vehicle re-identification method[J]. Journal of ZheJiang University (Engineering Science), 2021, 55(5): 948-956.
[4] WANG Jin-de, SHOU Li-dan, LI Xiao-yan, CHEN Gang. Bundling features with multiple segmentations for object-based image retrieval[J]. Journal of ZheJiang University (Engineering Science), 2011, 45(2): 259-266.
[5] HUANG Feng, CHEN Chun, WANG Can, et al. Improved Web image retrieval by weighted image annotations[J]. Journal of ZheJiang University (Engineering Science), 2009, 43(12): 2129-2135.