Single object tracking algorithm based on spatio-temporal feature enhancement
Lei GU, Nan XIA*, Jiahong JIANG, Xiaoyu LIAN
School of Information Science and Engineering, Dalian Polytechnic University, Dalian 116034, China
Abstract A single object tracking algorithm based on spatio-temporal feature enhancement (OSTrack-ST), built on the one-stream tracking network OSTrack, was proposed to address occlusion and scale variation, which are common in complex motion scenes, and to improve how single object trackers exploit temporal feature information and express the spatial features of the object. For spatial feature enhancement, a multi-head spatial association attention mechanism, which combines spatial attention with multi-head context association attention, was proposed to strengthen the model's expression of global and local spatial features and to improve its ability to capture object features in dynamic environments. For spatio-temporal feature enhancement, a spatio-temporal template update strategy based on temporal drift prediction was proposed, which uses spatial position prediction results to control when the temporal template is updated and improves the robustness and accuracy of the model in long-term sequence tasks. Experimental results showed that the proposed algorithm achieved tracking success rates of 70.5%, 73.7% and 68.7% on the LaSOT, GOT-10k and SportSOT datasets, respectively, while running at more than 49 frames per second. Its overall performance was better than that of other tracking algorithms such as EVPTrack.
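The idea of letting position prediction control template updates can be illustrated with a minimal sketch. It is not the OSTrack-ST implementation: the abstract does not specify the drift predictor, so a constant-velocity extrapolation is swapped in here, and the function names, the `max_drift` and `min_confidence` thresholds, and the use of a confidence score are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def predict_center(history: List[Point]) -> Point:
    """Extrapolate the next object center from past centers.

    Hypothetical stand-in for a learned drift predictor: assumes
    constant velocity between the last two observed centers.
    """
    if not history:
        raise ValueError("history must contain at least one center")
    if len(history) < 2:
        return history[-1]
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def should_update_template(
    history: List[Point],
    tracked_center: Point,
    confidence: float,
    max_drift: float = 20.0,       # assumed pixel threshold
    min_confidence: float = 0.7,   # assumed score threshold
) -> bool:
    """Allow a template update only when the tracker output is plausible.

    The update is blocked if the tracked center deviates too far from
    the extrapolated center (possible drift or occlusion) or if the
    tracker's confidence is low.
    """
    px, py = predict_center(history)
    tx, ty = tracked_center
    drift = ((tx - px) ** 2 + (ty - py) ** 2) ** 0.5
    return drift <= max_drift and confidence >= min_confidence


if __name__ == "__main__":
    past = [(0.0, 0.0), (10.0, 0.0)]
    print(should_update_template(past, (22.0, 1.0), confidence=0.9))
    print(should_update_template(past, (80.0, 0.0), confidence=0.9))
```

Gating the update in this way keeps frames with a large positional jump, which often accompany occlusion or distractors, from contaminating the template. In practice the thresholds would need tuning per dataset.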
Received: 11 January 2025
Published: 30 October 2025
Fund: Industry-University Cooperative Education Program of the Ministry of Education (220603231024713).
Corresponding Author: Nan XIA
E-mail: 220520854000562@xy.dlpu.edu.cn; xianan@dlpu.edu.cn
Keywords: object tracking, Siamese network, spatio-temporal enhancement, attention mechanism, template update