Single object tracking algorithm based on spatio-temporal feature enhancement

doi:10.3785/j.issn.1008-973X.2025.11.021

Journal of Zhejiang University (Engineering Science), 2025, Vol. 59, Issue 11: 2418-2429

Computer Technology


Lei GU, Nan XIA*, Jiahong JIANG, Xiaoyu LIAN

School of Information Science and Engineering, Dalian Polytechnic University, Dalian 116034, China




Abstract:

A single object tracking algorithm based on spatio-temporal feature enhancement, OSTrack-ST, was proposed on the basis of the one-stream tracking network OSTrack. It was designed to address occlusion and scale variation, which are common in complex motion scenes, and to improve how a single object tracker exploits temporal information and represents the spatial features of the target. For spatial feature enhancement, a multi-head spatial association attention mechanism, composed of a spatial attention module and a multi-head context association attention module, was proposed to strengthen the model's representation of both global and local spatial features and thus its ability to capture target features in dynamic environments. For spatio-temporal feature enhancement, a spatio-temporal template update strategy based on temporal drift prediction was proposed: the spatial position prediction results are used to control the update of the temporal template, which improves the robustness and accuracy of the model on long sequences. Experimental results showed that the proposed algorithm achieved tracking success rates of 70.5%, 73.7% and 68.7% on the LaSOT, GOT-10k and SportSOT datasets, respectively, while running at more than 49 frames per second. Its overall performance was better than that of other tracking algorithms such as EVPTrack.
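The abstract does not describe the internals of the spatial attention module (Fig. 2). For orientation only, the sketch below implements the widely used CBAM-style spatial attention (Woo et al., 2018): the channel axis is squeezed into a mean map and a max map, a k×k convolution turns the two maps into one score per position, and a sigmoid produces a mask that re-weights every channel. This is an assumed stand-in, not the authors' module; the function names, the nested-list tensor layout and the kernel weights are hypothetical, and a practical version would use a deep learning framework with learned weights.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # [channel][row][col]
Tensor2 = List[List[float]]        # [row][col]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def spatial_attention(feat: Tensor3, kernel: List[Tensor2], bias: float = 0.0) -> Tensor3:
    """CBAM-style spatial attention (an assumed design, not the paper's module).

    feat:   C x H x W feature map.
    kernel: 2 x k x k weights (k odd) applied to the stacked [mean; max] maps.
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    # 1. Squeeze the channel axis into two H x W descriptors.
    avg = [[sum(feat[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(feat[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
    desc = [avg, mx]
    # 2. k x k convolution (cross-correlation) with zero padding keeps the H x W size.
    k = len(kernel[0])
    pad = k // 2
    mask = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = bias
            for d in range(2):
                for u in range(k):
                    for v in range(k):
                        y, x = i + u - pad, j + v - pad
                        if 0 <= y < h and 0 <= x < w:
                            s += kernel[d][u][v] * desc[d][y][x]
            # 3. The sigmoid turns the score into a weight in (0, 1).
            mask[i][j] = sigmoid(s)
    # 4. The same spatial mask re-weights every channel.
    return [[[feat[ch][i][j] * mask[i][j] for j in range(w)] for i in range(h)] for ch in range(c)]


if __name__ == "__main__":
    feat = [[[1.0, 0.0], [0.0, 0.0]], [[3.0, 0.0], [0.0, 0.0]]]  # 2 channels, 2 x 2 grid
    out = spatial_attention(feat, [[[1.0]], [[1.0]]])  # 1 x 1 kernel: score = mean + max
    print(out[1][0][0])  # 3 * sigmoid(5), roughly 2.98
```

The mask is shared across channels, so the module only decides *where* to look; channel weighting and the multi-head context association attention would be separate components.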

Key words: object tracking; Siamese network; spatio-temporal enhancement; attention mechanism; template update
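The abstract states only that spatial position predictions control the temporal template update. One way such a gate could work is sketched below under explicit assumptions: the expected target centre is extrapolated from the last two trusted positions with a constant-velocity model, drift is the distance between that expectation and the tracker's output normalised by the box diagonal, and the dynamic template is refreshed only when drift is small, confidence is high and a minimum interval has elapsed. The class name, the thresholds (`max_drift`, `min_score`, `interval`) and the constant-velocity model are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h): centre and size


@dataclass
class DriftGatedUpdater:
    """Hypothetical drift-gated template update rule (illustration only)."""

    max_drift: float = 0.5  # tolerated drift, as a fraction of the box diagonal
    min_score: float = 0.6  # minimum tracker confidence for a trusted frame
    interval: int = 2       # minimum number of frames between template updates
    history: List[Tuple[int, float, float]] = field(default_factory=list)  # trusted (frame, cx, cy)
    last_update: int = 0

    def expected_center(self, frame: int) -> Optional[Tuple[float, float]]:
        """Constant-velocity extrapolation from the last two trusted centres."""
        if len(self.history) < 2:
            return None
        (f1, x1, y1), (f2, x2, y2) = self.history[-2], self.history[-1]
        dt = (frame - f2) / (f2 - f1)  # frames ahead, in units of the last trusted gap
        return x2 + (x2 - x1) * dt, y2 + (y2 - y1) * dt

    def step(self, frame: int, box: Box, score: float) -> bool:
        """Return True if the dynamic template should be refreshed from this frame."""
        cx, cy, w, h = box
        expected = self.expected_center(frame)
        if expected is None:
            drift = 0.0
        else:
            drift = math.hypot(cx - expected[0], cy - expected[1]) / math.hypot(w, h)
        if drift > self.max_drift or score < self.min_score:
            return False  # rejected frames neither update the template nor extend the trajectory
        self.history.append((frame, cx, cy))
        if frame - self.last_update >= self.interval:
            self.last_update = frame
            return True
        return False


if __name__ == "__main__":
    gate = DriftGatedUpdater()
    track = [(10, 10, 10, 10), (12, 10, 10, 10), (14, 10, 10, 10), (40, 10, 10, 10), (18, 10, 10, 10)]
    print([gate.step(f, b, 0.9) for f, b in enumerate(track, start=1)])
    # -> [False, True, False, False, True]
```

Keeping rejected boxes out of the trajectory is a deliberate choice in this sketch: a tracker that has jumped to a distractor (frame 4 above) cannot immediately "confirm" its own drift, and extrapolation continues from the last trusted positions.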

Received: 2025-01-11; Published: 2025-10-30

CLC number: TP 391

Fund program: Industry-University Cooperation and Collaborative Education Project of the Ministry of Education (220603231024713).

Corresponding author: Nan XIA. E-mail: 220520854000562@xy.dlpu.edu.cn; xianan@dlpu.edu.cn

About the author: Lei GU (born 2000), male, master's student, engaged in research on video object tracking. ORCID: orcid.org/0009-0008-7238-1689. E-mail: 220520854000562@xy.dlpu.edu.cn


Cite this article:

Lei GU, Nan XIA, Jiahong JIANG, Xiaoyu LIAN. Single object tracking algorithm based on spatio-temporal feature enhancement. Journal of Zhejiang University (Engineering Science), 2025, 59(11): 2418-2429.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.11.021 or https://www.zjujournals.com/eng/CN/Y2025/V59/I11/2418

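Fig. 1 Overall network architecture of the single object tracking algorithm based on spatio-temporal feature enhancement

Fig. 2 Structure of the spatial attention module

Fig. 3 Structure of the multi-head context association attention module

Fig. 4 Structure of the spatio-temporal template update strategy

Table 1 Comparison of tracking results of different algorithms on the LaSOT and GOT-10k test sets

Table 2 Comparison of tracking results of different algorithms on the SportsSOT test set

Table 3 Comparison of real-time efficiency between the proposed algorithm and other tracking algorithms on the LaSOT benchmark

Fig. 5 AUC of different algorithms in scenes with different attributes on the LaSOT dataset

Table 4 Ablation results for the MixViT and OSTrack algorithms

Fig. 6 Comparison of feature maps of MixViT and OSTrack with different improvement strategies

Fig. 7 Comparison of tracking results of different algorithms in complex scenes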
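The success rates in the abstract and the AUC values in Fig. 5 are overlap-based metrics. The sketch below shows the one-pass-evaluation success score typically computed by OTB-style toolkits, including the LaSOT one: compute the per-frame IoU between predicted and ground-truth boxes, measure the fraction of frames whose IoU is strictly greater than each threshold on a grid from 0 to 1, and average those fractions (the area under the success plot). It assumes `(x, y, w, h)` boxes and a 21-point grid (0, 0.05, ..., 1); GOT-10k reports average overlap (AO) and success rates at fixed thresholds through its own toolkit, so its protocol differs in detail. The function names are illustrative.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h): top-left corner and size


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred: Sequence[Box], gt: Sequence[Box], steps: int = 21) -> float:
    """Area under the success plot: the mean, over thresholds t in [0, 1],
    of the fraction of frames whose IoU is strictly greater than t."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    gt = [(0, 0, 2, 2)]
    pred = [(1, 0, 2, 2)]  # shifted by half a width: IoU = 2 / 6 = 1/3
    print(success_auc(pred, gt))  # 7 of the 21 thresholds (0.00 to 0.30) lie below 1/3
```

Because the integral of P(IoU > t) over [0, 1] equals the mean IoU, this AUC approaches the average overlap as the threshold grid is refined; the discrete grid makes the two numbers close but not identical.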

