Journal of Zhejiang University (Engineering Science)  2025, Vol. 59, Issue (6): 1130-1139    DOI: 10.3785/j.issn.1008-973X.2025.06.004
Computer Technology
Image retrieval method based on self-similar embedding and global feature reranking
Jiefeng CHEN, Jinliang YAO*
College of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
Full text: PDF (1809 KB)   HTML
Abstract:

Existing image retrieval methods often suffer from the loss of structural information during local feature extraction and the high computational cost associated with local feature re-ranking. To address these issues, an image retrieval method based on self-similarity embedding and global feature re-ranking was proposed. A self-similarity embedding network was introduced to capture the internal structure of images and compress it into a dense self-similarity feature map. A self-similarity embedded feature map was generated by fusing the self-similarity feature map with the initial image feature map. The generated map could represent both the visual and the structural information of the image, thereby achieving finer-grained retrieval results. A global feature re-ranking method was proposed by drawing inspiration from query expansion and database enhancement techniques. Based on the initial ranking results, the features of the top-ranked similar images for each image were extracted and the initial features were updated using a linear summation approach. This process highlighted the common features of images with the same content and increased the inter-class distance, thereby reducing the number of false positives. In experiments, the proposed self-similarity embedding and re-ranking methods were evaluated using the mean average precision (mAP) as the evaluation metric. The results demonstrated that the proposed method outperformed existing approaches on the ROxford5K and RParis6K datasets.
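To make the embedding step concrete, below is a minimal PyTorch sketch of the idea: cosine self-similarity is computed between every location of a backbone feature map and its U x V neighbourhood, compressed into a dense self-similarity map, and fused with the initial feature map. The module name, the 1x1-convolution choices, and the cosine measure are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfSimilarityEmbedding(nn.Module):
    """Hypothetical sketch: dense self-similarity map fused with visual features."""

    def __init__(self, channels: int, u: int = 7, v: int = 7):
        super().__init__()
        self.u, self.v = u, v
        # Compress the U*V-dimensional self-similarity tensor (assumed 1x1 conv).
        self.encode = nn.Conv2d(u * v, channels, kernel_size=1)
        # Fuse visual and structural maps into one embedded feature map.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        unit = F.normalize(feat, dim=1)  # unit-norm descriptor at each location
        # Gather each location's U x V neighbourhood: (B, C*U*V, H*W).
        neigh = F.unfold(unit, kernel_size=(self.u, self.v),
                         padding=(self.u // 2, self.v // 2))
        neigh = neigh.view(b, c, self.u * self.v, h, w)
        # Cosine similarity between every location and its neighbours: (B, U*V, H, W).
        sim = (unit.unsqueeze(2) * neigh).sum(dim=1)
        structural = self.encode(sim)  # dense self-similarity feature map
        return self.fuse(torch.cat([feat, structural], dim=1))
```

With the window size that matches the main results in Table 8 (U = V = 7), each location is compared with 49 neighbours, so structural information is carried by the global descriptor pipeline rather than by per-image local descriptors.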

Key words: image retrieval    structural information    self-similarity    feature embedding    global feature reranking
Received: 2024-07-03    Published: 2025-05-30
CLC:  TP 391  
Corresponding author: Jinliang YAO     E-mail: 1172560181@qq.com; yaojinl@hdu.edu.cn
About the author: Jiefeng CHEN (b. 2000), male, master's student, working on computer vision. orcid.org/0009-0007-6864-3749. E-mail: 1172560181@qq.com
Cite this article:

Jiefeng CHEN, Jinliang YAO. Image retrieval method based on self-similar embedding and global feature reranking. Journal of Zhejiang University (Engineering Science), 2025, 59(6): 1130-1139.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.06.004        https://www.zjujournals.com/eng/CN/Y2025/V59/I6/1130

Fig. 1  Architecture of self-similarity embedding network
Fig. 2  Structure of CBAM module
Fig. 3  Encoding pipeline of self-similarity tensor
Fig. 4  Structure of feature fusion module
Fig. 5  Pipeline of global feature re-ranking
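As a rough illustration of the re-ranking pipeline in Fig. 5, the sketch below updates each global descriptor by a linear sum with its top-ranked neighbours, first on the database side (n1 images, weight factor α1) and then on the query side (n2, α2), as described in the abstract. The similarity-powered weighting is an assumption borrowed from alpha-weighted query expansion; the authors' exact summation weights may differ.

```python
import numpy as np

def refine(feats: np.ndarray, pool: np.ndarray, n: int, alpha: float) -> np.ndarray:
    """Update each row of `feats` by a linear sum with its top-n most
    similar rows of `pool` (cosine similarity on L2-normalised features)."""
    sims = feats @ pool.T                           # (N, M) similarity matrix
    top = np.argsort(-sims, axis=1)[:, :n]          # top-n neighbour indices
    out = feats.copy()
    for i, idx in enumerate(top):
        w = np.clip(sims[i, idx], 0, None) ** alpha  # assumed weighting scheme
        out[i] = feats[i] + w @ pool[idx]            # linear summation update
    return out / np.linalg.norm(out, axis=1, keepdims=True)

# Toy usage with random unit vectors (shapes and data are hypothetical).
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 512)); db /= np.linalg.norm(db, axis=1, keepdims=True)
q = rng.normal(size=(5, 512));    q /= np.linalg.norm(q, axis=1, keepdims=True)

db_new = refine(db, db, n=5, alpha=3)    # database-side update (n1, alpha1)
q_new = refine(q, db_new, n=4, alpha=4)  # query-side update (n2, alpha2)
reranked = np.argsort(-(q_new @ db_new.T), axis=1)  # final ranking per query
```

The toy call uses n1 = 5, α1 = 3, n2 = 4, α2 = 4, which is in the range that performs best in Tables 4-7 below.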
| Category | Method | Medium (ROxf) | Medium (RPar) | Hard (ROxf) | Hard (RPar) |
| --- | --- | --- | --- | --- | --- |
| Global features | R-MAC [11] | 75.14 | 85.28 | 53.77 | 71.28 |
| | GeM-AP [12] | 67.50 | 80.10 | 42.80 | 60.60 |
| | SOLAR [14] | 79.65 | 88.63 | 59.99 | 75.26 |
| | DELG [33] | 76.40 | 86.74 | 55.92 | 72.60 |
| | DOLG [16] | 80.50 | 89.81 | 58.82 | 77.70 |
| | GLAM [34] | 78.60 | 88.50 | 60.20 | 76.80 |
| | Swin-S-DALG [35] | 79.94 | 90.04 | 57.55 | 79.06 |
| | SpCa [36] | 81.55 | 88.60 | 61.69 | 76.21 |
| | SENet [37] | 81.90 | 90.00 | 63.00 | 78.10 |
| Local feature aggregation + re-ranking | HesAff-rSIFT-ASMK+SP [38] | 60.60 | 61.40 | 36.70 | 35.50 |
| | DELF-ASMK+SP [5] | 67.80 | 76.90 | 43.10 | 55.40 |
| | DELF-R-ASMK+SP [39] | 76.00 | 80.20 | 52.40 | 58.60 |
| | HOW-ASMK [21] | 79.40 | 81.60 | 56.90 | 62.40 |
| | Fire [22] | 81.80 | 85.30 | 61.20 | 70.00 |
| Global features + local feature re-ranking | GeM+DSM [40] | 65.30 | 77.40 | 39.20 | 56.20 |
| | DELG+SP [33] | 81.20 | 87.20 | 64.00 | 72.80 |
| Global features | Proposed method (without re-ranking) | 77.21 | 87.79 | 60.85 | 75.17 |
| Global features + global feature re-ranking | Proposed method | 82.11 | 90.38 | 66.85 | 80.24 |

Table 1  Evaluation results (mAP/%) of each method on RParis6K and ROxford5K datasets
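The mAP values above follow the standard definition: average precision over each query's ranked list, then the mean over all queries. A minimal sketch with binary relevance labels is shown below; it does not reproduce the junk-image handling of the ROxford5K/RParis6K evaluation protocol.

```python
import numpy as np

def average_precision(rel: np.ndarray) -> float:
    """AP for one query; rel[i] = 1 if the i-th ranked image is relevant."""
    hits = np.cumsum(rel)
    prec = hits / (np.arange(len(rel)) + 1)  # precision at each rank
    n_rel = rel.sum()
    return float((prec * rel).sum() / n_rel) if n_rel else 0.0

# mAP = mean of AP over all queries (labels here are made up for illustration).
ranked_labels = [np.array([1, 0, 1, 0]), np.array([0, 1, 1, 0])]
mean_ap = 100 * np.mean([average_precision(r) for r in ranked_labels])  # mAP/%
```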
Fig. 6  Heat maps of attention distribution
Fig. 7  Retrieval results of proposed model
| Method | M/GB (ROxf+R1M) | M/GB (RPar+R1M) |
| --- | --- | --- |
| DELF-R-ASMK | 27.6 | |
| DELG | 485.9 | 486.6 |
| DELG (3 scales) | 437.1 | 437.8 |
| DELF (3 scales) | 434.2 | 434.8 |
| DELF (7 scales) | 477.9 | 478.9 |
| Proposed method | 3.8 | 3.8 |

Table 2  Comparison of memory usage of re-ranking methods
| CBAM | Self-similarity computation | Feature fusion + self-similarity tensor encoding | Medium (ROxf) | Medium (RPar) | Hard (ROxf) | Hard (RPar) | S/10^6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| | | | 77.21 | 87.79 | 60.85 | 75.17 | 165.11 |
| | | | 76.18 | 87.36 | 58.43 | 74.49 | 164.96 |
| | | | 76.86 | 86.59 | 57.20 | 72.50 | 164.96 |
| | | | 74.20 | 84.90 | 51.60 | 70.30 | 119.94 |

Table 3  Ablation analysis of key modules in model (mAP/%)
| $n_1$ | Medium (ROxf) | Medium (RPar) | Hard (ROxf) | Hard (RPar) |
| --- | --- | --- | --- | --- |
| 2 | 80.84 | 89.70 | 65.54 | 78.82 |
| 3 | 81.19 | 89.97 | 65.31 | 79.42 |
| 4 | 81.55 | 90.11 | 66.33 | 79.75 |
| 5 | 81.73 | 90.24 | 66.68 | 80.11 |
| 6 | 81.27 | 90.32 | 66.34 | 80.31 |
| 7 | 80.55 | 90.36 | 64.27 | 80.41 |

Table 4  Effect of number of database-side images on re-ranking (mAP/%)
| ${\alpha}_1$ | Medium (ROxf) | Medium (RPar) | Hard (ROxf) | Hard (RPar) |
| --- | --- | --- | --- | --- |
| 1 | 80.57 | 90.11 | 63.86 | 79.70 |
| 2 | 81.15 | 90.19 | 65.98 | 79.92 |
| 3 | 81.73 | 90.24 | 66.68 | 80.11 |
| 4 | 81.71 | 90.24 | 66.81 | 80.15 |
| 5 | 81.65 | 90.21 | 66.72 | 80.01 |
| 6 | 81.49 | 90.15 | 66.47 | 79.97 |
| 7 | 81.42 | 90.06 | 66.24 | 79.75 |

Table 5  Effect of database-side weight factor on re-ranking (mAP/%)
| $n_2$ | Medium (ROxf) | Medium (RPar) | Hard (ROxf) | Hard (RPar) |
| --- | --- | --- | --- | --- |
| 1 | 81.46 | 89.69 | 65.42 | 79.24 |
| 2 | 80.48 | 89.71 | 65.61 | 79.33 |
| 3 | 81.63 | 89.99 | 66.37 | 79.73 |
| 4 | 82.02 | 90.21 | 66.99 | 80.07 |
| 5 | 81.73 | 90.24 | 66.68 | 80.11 |
| 6 | 81.44 | 90.21 | 66.47 | 80.14 |
| 7 | 81.04 | 90.29 | 66.38 | 80.28 |

Table 6  Effect of number of query-side images on re-ranking (mAP/%)
| ${\alpha}_2$ | Medium (ROxf) | Medium (RPar) | Hard (ROxf) | Hard (RPar) |
| --- | --- | --- | --- | --- |
| 1 | 82.17 | 90.34 | 66.85 | 80.21 |
| 2 | 82.14 | 90.28 | 66.96 | 80.15 |
| 3 | 82.02 | 90.21 | 66.99 | 80.07 |
| 4 | 82.21 | 90.38 | 66.85 | 80.24 |
| 5 | 82.08 | 90.39 | 66.75 | 80.24 |
| 6 | 81.34 | 90.39 | 64.40 | 80.22 |
| 7 | 81.22 | 90.39 | 64.26 | 80.20 |

Table 7  Effect of query-side weight factor on re-ranking (mAP/%)
| U (V) | Medium (ROxf) | Medium (RPar) | Hard (ROxf) | Hard (RPar) | S/10^6 |
| --- | --- | --- | --- | --- | --- |
| 5 | 77.08 | 85.96 | 58.65 | 73.47 | 155.96 |
| 7 | 77.21 | 87.79 | 60.85 | 75.17 | 165.11 |
| 9 | 76.96 | 88.02 | 58.13 | 75.35 | 173.96 |

Table 8  Effect of neighborhood window size on re-ranking (mAP/%)
1 ARANDJELOVIC R, GRONAT P, TORII A, et al. NetVLAD: CNN architecture for weakly supervised place recognition [C]// IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 5297–5307.
2 YANDEX A B, LEMPITSKY V. Aggregating local deep features for image retrieval [C]// IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1269–1277.
3 BABENKO A, SLESAREV A, CHIGORIN A, et al. Neural codes for image retrieval [C]// European Conference on Computer Vision. Zurich: Springer International Publishing, 2014: 584–599.
4 HE K, LU Y, SCLAROFF S. Local descriptors optimized for average precision [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 596–605.
5 NOH H, ARAUJO A, SIM J, et al. Large-scale image retrieval with attentive deep local features [C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 3476–3485.
6 REVAUD J, DE SOUZA C, HUMENBERGER M, et al. R2D2: reliable and repeatable detector and descriptor [J]. Advances in Neural Information Processing Systems, 2019, 32: 12405–12415.
7 SCHROFF F, KALENICHENKO D, PHILBIN J. FaceNet: a unified embedding for face recognition and clustering [C]// IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 815–823.
8 HU J, LU J, TAN Y P. Discriminative deep metric learning for face verification in the wild [C]// IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 1875–1882.
9 DENG J, GUO J, XUE N, et al. ArcFace: additive angular margin loss for deep face recognition [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4690–4699.
10 KALANTIDIS Y, MELLINA C, OSINDERO S. Cross-dimensional weighting for aggregated deep convolutional features [C]// European Conference on Computer Vision. Amsterdam: Springer International Publishing, 2016: 685–701.
11 TOLIAS G, SICRE R, JÉGOU H. Particular object retrieval with integral max-pooling of CNN activations [EB/OL]. (2016−02−24)[2023−10−13]. https://arxiv.org/abs/1511.05879.
12 RADENOVIĆ F, TOLIAS G, CHUM O. Fine-tuning CNN image retrieval with no human annotation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(7): 1655–1668.
doi: 10.1109/TPAMI.2018.2846566
13 SHAO S, CHEN K, KARPUR A, et al. Global features are all you need for image retrieval and reranking [C]// IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 11002–11012.
14 NG T, BALNTAS V, TIAN Y, et al. SOLAR: second-order loss and attention for image retrieval [C]// European Conference on Computer Vision. Glasgow: Springer International Publishing, 2020: 253−270.
15 WU H, WANG M, ZHOU W, et al. Learning token-based representation for image retrieval [C]// AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2022, 36(3): 2703−2711.
16 YANG M, HE D, FAN M, et al. DOLG: single-stage image retrieval with deep orthogonal fusion of local and global features [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 11752–11761.
17 SONG C H, YOON J, CHOI S, et al. Boosting vision transformers for image retrieval [C]// IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2023: 107–117.
18 LOWE D G. Distinctive image features from scale-invariant keypoints [J]. International Journal of Computer Vision, 2004, 60(2): 91–110.
doi: 10.1023/B:VISI.0000029664.99615.94
19 BAY H, ESS A, TUYTELAARS T, et al. Speeded-up robust features (SURF) [J]. Computer Vision and Image Understanding, 2008, 110(3): 346–359.
doi: 10.1016/j.cviu.2007.09.014
20 DUSMANU M, ROCCO I, PAJDLA T, et al. D2-Net: a trainable CNN for joint detection and description of local features [EB/OL]. (2019−05−09)[2023−11−28]. https://arxiv.org/abs/1905.03561.
21 TOLIAS G, JENICEK T, CHUM O. Learning and aggregating deep local descriptors for instance-level recognition [C]// European Conference on Computer Vision. Glasgow: Springer International Publishing, 2020: 460−477.
22 WEINZAEPFEL P, LUCAS T, LARLUS D, et al. Learning super-features for image retrieval [EB/OL]. (2022−01−31)[2023−12−17]. https://arxiv.org/abs/2201.13182.
23 TAN F, YUAN J, ORDONEZ V. Instance-level image retrieval using reranking transformers [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 12085–12095.
24 LEE S, SEONG H, LEE S, et al. Correlation verification for image retrieval [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 5364–5374.
25 KANG D, KWON H, MIN J, et al. Relational embedding for few-shot classification [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 8802–8813.
26 WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module [C]// European Conference on Computer Vision. Munich: Springer International Publishing, 2018: 3−19.
27 GORDO A, RADENOVIC F, BERG T. Attention-based query expansion learning [C]// European Conference on Computer Vision. Glasgow: Springer International Publishing, 2020: 172–188.
28 ARANDJELOVIĆ R, ZISSERMAN A. Three things everyone should know to improve object retrieval [C]// IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE, 2012: 2911–2918.
29 GORDO A, ALMAZÁN J, REVAUD J, et al. End-to-end learning of deep visual representations for image retrieval [J]. International Journal of Computer Vision, 2017, 124(2): 237–254.
doi: 10.1007/s11263-017-1016-8
30 WEYAND T, ARAUJO A, CAO B, et al. Google Landmarks Dataset v2: a large-scale benchmark for instance-level recognition and retrieval [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 2575–2584.
31 RADENOVIC F, ISCEN A, TOLIAS G, et al. Revisiting Oxford and Paris: large-scale image retrieval benchmarking [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 5706–5715.
32 PHILBIN J, CHUM O, ISARD M, et al. Object retrieval with large vocabularies and fast spatial matching [C]// IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis: IEEE, 2007: 1–8.
33 CAO B, ARAUJO A, SIM J. Unifying deep local and global features for image search [C]// European Conference on Computer Vision. Glasgow: Springer International Publishing, 2020: 726−743.
34 SONG C H, HAN H J, AVRITHIS Y. All the attention you need: global-local, spatial-channel attention for image retrieval [C]// IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2022: 439–448.
35 SONG Y, ZHU R, YANG M, et al. DALG: deep attentive local and global modeling for image retrieval [EB/OL]. (2022−07−01)[2024−03−11]. https://arxiv.org/abs/2207.00287.
36 ZHANG Z, WANG L, ZHOU L, et al. Learning spatial-context-aware global visual feature representation for instance image retrieval [C]// IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 11216–11225.
37 LEE S, LEE S, SEONG H, et al. Revisiting self-similarity: structural embedding for image retrieval [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 23412–23421.
38 TOLIAS G, AVRITHIS Y, JÉGOU H. Image search with selective match kernels: aggregation across single and multiple images [J]. International Journal of Computer Vision, 2016, 116(3): 247–261.
doi: 10.1007/s11263-015-0810-4
39 TEICHMANN M, ARAUJO A, ZHU M, et al. Detect-to-retrieve: efficient regional aggregation for image search [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5109–5118.