Journal of ZheJiang University (Engineering Science)  2026, Vol. 60 Issue (1): 61-70    DOI: 10.3785/j.issn.1008-973X.2026.01.006
    
Zero-shot memory-aware selection visual tracking model for unmanned driving
Jie LI1(),Shimin WANG1,Changcheng WANG2,Yafeng CUI2,Junjie WANG3,Weijia ZHOU1,Zheng HU2,Hai LAN2,Ling DU2,Meng GAO2
1. School of Mechanical-electronic and Automobile Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
2. China North Vehicle Research Institute, Beijing 100072, China
3. Chongqing University of Arts and Sciences, Chongqing 402160, China

Abstract  

A zero-shot visual tracking model was proposed to ensure that unmanned vehicles maintain high tracking accuracy even when the target is deformed or partially or fully occluded. Based on classical Kalman filtering, a motion modeling module was introduced in the mask prediction phase, and the predicted masks were iteratively refined by enforcing spatio-temporal consistency and integrating motion cues. A hybrid scoring system was employed to select the optimal mask from the predicted masks. For the historical optimal masks, a memory-aware selection module was designed to build a library of ideal mask candidates and to dynamically choose the most suitable mask by combining historical features and information cues. The performance of the proposed method and several classical visual tracking models, such as HIPTrack-B384, was evaluated and compared on the LaSOT, GOT-10k, and OTB100 datasets. Compared with the best value of each corresponding metric among the comparison methods, the area under the ROC curve (AUC), precision, average overlap, overlap precision at IoU thresholds 0.50 and 0.75, and success rate of the proposed model improved by 2.87%, 2.73%, 2.84%, 3.18%, 5.46%, and 1.62%, respectively, indicating that the algorithm performs well on multiple metrics.
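The motion modeling module described above builds on classical Kalman filtering to carry motion cues into mask prediction. The following is a minimal sketch of that underlying idea only, not the paper's implementation: the constant-velocity state layout `[cx, cy, w, h, vx, vy]` and the noise settings are assumptions made here for illustration.

```python
import numpy as np

class BoxKalman:
    """Constant-velocity Kalman filter over a bounding-box state (illustrative sketch)."""

    def __init__(self, box):
        cx, cy, w, h = box
        self.x = np.array([cx, cy, w, h, 0.0, 0.0])  # state: center, size, velocity
        self.P = np.eye(6)                           # state covariance
        self.F = np.eye(6)                           # transition: cx += vx, cy += vy
        self.F[0, 4] = self.F[1, 5] = 1.0
        self.H = np.eye(4, 6)                        # we observe [cx, cy, w, h] only
        self.Q = np.eye(6) * 1e-2                    # process noise (assumed value)
        self.R = np.eye(4) * 1e-1                    # measurement noise (assumed value)

    def predict(self):
        """Propagate the state one frame ahead; returns the predicted box."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]

    def update(self, box):
        """Correct the state with the box observed in the current frame."""
        z = np.asarray(box, dtype=float)
        y = z - self.H @ self.x                      # innovation
        S = self.H @ self.P @ self.H.T + self.R      # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
```

In a tracker, the predicted box from `predict()` would serve as a motion cue for refining the next mask, and `update()` would be called with the box of the mask finally selected for the frame.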



Key words: unmanned vehicle; visual tracking; motion modeling; hybrid scoring; memory-aware selection; zero-shot tracking
Received: 12 March 2025      Published: 15 December 2025
CLC:  U 469.79  
Fund: National Natural Science Foundation of China (51675494); Open Fund of the National Engineering Research Center for High-Mobility Anti-Riot Vehicle Technology (2024NELEV001); Postgraduate Innovation Project of Beijing University of Civil Engineering and Architecture (PG2025154).
Cite this article:

Jie LI,Shimin WANG,Changcheng WANG,Yafeng CUI,Junjie WANG,Weijia ZHOU,Zheng HU,Hai LAN,Ling DU,Meng GAO. Zero-shot memory-aware selection visual tracking model for unmanned driving. Journal of ZheJiang University (Engineering Science), 2026, 60(1): 61-70.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.01.006     OR     https://www.zjujournals.com/eng/Y2026/V60/I1/61


Fig.1 Flow framework of zero-shot visual tracking model
| Algorithm | LaSOT P_norm/% | LaSOT AUC/% | LaSOT P/% | GOT-10k AO/% | GOT-10k OP_0.50/% | GOT-10k OP_0.75/% | GOT-10k S_rate/% | OTB100 AUC/% | OTB100 P/% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HIPTrack-B384 [28] | 82.9 | 72.7 | 79.5 | 77.4 | 88.0 | 74.5 | 79.2 | 71.0 | 80.2 |
| AQATrack-B256 [29] | 81.9 | 71.4 | 78.6 | 73.8 | 83.2 | 72.1 | 76.4 | 72.8 | 83.1 |
| ODTrack-B384 [30] | 83.2 | 73.2 | 80.6 | 77.0 | 87.9 | 75.1 | 75.6 | 73.0 | 81.8 |
| LoRAT-B224 [31] | 80.9 | 71.7 | 77.3 | 72.1 | 84.9 | 75.0 | 80.4 | 72.3 | 82.5 |
| OSTrack384 [15] | 81.1 | 71.1 | 77.6 | 73.7 | 83.2 | 70.8 | 77.6 | 55.9 | 75.8 |
| SiamRPN++ [32] | 56.9 | 49.6 | 49.1 | 51.7 | 61.6 | 32.5 | 72.9 | 69.2 | 77.6 |
| DiMP288 [33] | 64.1 | 56.3 | 56.0 | 61.1 | 71.7 | 49.2 | 66.4 | 74.3 | 78.4 |
| Zero-shot | 82.7 | 75.3 | 82.8 | 79.6 | 90.8 | 79.2 | 81.7 | 74.8 | 84.9 |
Tab.1 Comparison of visual object tracking results of various algorithms on different datasets
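The relative gains quoted in the abstract can be reproduced from Tab.1 by comparing each Zero-shot metric with the best value achieved by any comparison method. A short reader-side check (the metric labels are shorthand chosen here):

```python
# Best comparison-method value per metric, read from Tab.1.
best = {
    "AUC(LaSOT)": 73.2, "P(LaSOT)": 80.6, "AO": 77.4,
    "OP0.50": 88.0, "OP0.75": 75.1, "Srate": 80.4,
}
# Corresponding Zero-shot values from the last row of Tab.1.
ours = {
    "AUC(LaSOT)": 75.3, "P(LaSOT)": 82.8, "AO": 79.6,
    "OP0.50": 90.8, "OP0.75": 79.2, "Srate": 81.7,
}
# Relative improvement in percent, rounded to two decimals.
gains = {k: round(100 * (ours[k] - best[k]) / best[k], 2) for k in best}
print(gains)
# → {'AUC(LaSOT)': 2.87, 'P(LaSOT)': 2.73, 'AO': 2.84,
#    'OP0.50': 3.18, 'OP0.75': 5.46, 'Srate': 1.62}
```

These match the 2.87%, 2.73%, 2.84%, 3.18%, 5.46%, and 1.62% figures in the abstract, confirming the improvements are relative (not absolute percentage-point) gains.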
Fig.2 Visual comparison of tracking results of proposed algorithm and HIPTrack-B384 algorithm
| Motion modeling | Memory-aware selection | AUC/% | P_norm/% | P/% | N_p/M | FLOPs/(10¹²·s⁻¹) | FPS/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| × | × | 68.32 | 76.16 | 73.59 | 3.8 | 25.0 | 78 |
| √ | × | 70.81 | 78.87 | 76.47 | 4.0 | 16.5 | 94 |
| × | √ | 72.67 | 80.67 | 78.23 | 4.2 | 14.7 | 80 |
| √ | √ | 74.23 | 82.69 | 80.21 | 4.3 | 12.9 | 116 |
Tab.2 Quantitative ablation experiment results of motion modeling and memory-aware selection modules
Fig.3 Qualitative ablation experiment results of motion modeling and memory-aware selection modules
Fig.4 Ablation experiment response maps of motion modeling and memory-aware selection modules
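The memory-aware selection module maintains a library of historical optimal masks and picks the candidate most consistent with it. The sketch below illustrates one plausible form of that selection step only; the fixed-capacity FIFO library, mean-feature aggregation, and cosine-similarity scoring are assumptions made here, not details taken from the paper.

```python
import numpy as np

class MaskMemory:
    """Illustrative memory-aware selection over mask feature vectors."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.library = []  # feature vectors of historical optimal masks

    def add(self, feat):
        """Store the feature of the mask chosen as optimal for a frame."""
        self.library.append(np.asarray(feat, dtype=float))
        if len(self.library) > self.capacity:
            self.library.pop(0)  # drop the oldest entry (FIFO assumption)

    def select(self, candidates):
        """Return the index of the candidate that best matches memory."""
        mem = np.mean(self.library, axis=0)  # aggregated historical cue

        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

        scores = [cos(np.asarray(c, dtype=float), mem) for c in candidates]
        return int(np.argmax(scores))
```

In a full tracker this similarity score would be one term of the hybrid scoring system, combined with per-mask confidence and the motion-model agreement before the optimal mask is committed back to the library.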
[1] YU Mingxin, WANG Changlong, ZHANG Yuhua, et al. Survey of visual tracking algorithms in the complex scenarios [J]. Aero Weaponry, 2024, 31(3): 40-50. doi: 10.12132/ISSN.1673-5048.2023.0112
[2] HOU Zhiqiang, ZHAO Jiaxin, CHEN Yu, et al. Cascaded object drift determination network for long-term visual tracking [J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(7): 2240-2252.
[3] MA Qinglu, WANG Wei, SUN Xiao, et al. Visual tracking algorithm for tunnel fire [J]. Journal of Southeast University: Natural Science Edition, 2025, 55(1): 255-265.
[4] WEI Chao, WU Xitao, ZHU Gengting, et al. Research on obstacle detection and tracking of autonomous vehicles based on the fusion of vision camera and LiDAR [J]. Journal of Mechanical Engineering, 2025, 61(2): 296-309.
[5] DANELLJAN M, HÄGER G, KHAN F S, et al. Convolutional features for correlation filter based visual tracking [C]// Proceedings of the IEEE International Conference on Computer Vision Workshop. Santiago: IEEE, 2015: 621-629.
[6] DANELLJAN M, HÄGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking [C]// Proceedings of the IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 4310-4318.
[7] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking [C]// Proceedings of the European Conference on Computer Vision. Amsterdam: Springer, 2016: 850-865.
[8] CHEN X, YAN B, ZHU J, et al. Transformer tracking [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 8122-8131.
[9] XU T, ZHANG P, HUANG Q, et al. AttnGAN: fine-grained text to image generation with attentional generative adversarial networks [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 1316-1324.
[10] ZHU M, ZHANG H, ZHANG J, et al. Multi-level prediction Siamese network for real-time UAV visual tracking [J]. Image and Vision Computing, 2020, 103: 104002. doi: 10.1016/j.imavis.2020.104002
[11] YUAN Y, WANG D, WANG Q. Memory-augmented temporal dynamic learning for action recognition [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu: AAAI Press, 2019: 9167-9175.
[12] WANG Q, ZHANG L, BERTINETTO L, et al. Fast online object tracking and segmentation: a unifying approach [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 1328-1338.
[13] LI B, YAN J, WU W, et al. High performance visual tracking with Siamese region proposal network [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 8971-8980.
[14] YE B, CHANG H, MA B, et al. Joint feature learning and relation modeling for tracking: a one-stream framework [C]// Proceedings of the European Conference on Computer Vision. Tel Aviv: Springer, 2022: 341-357.
[15] YU Y, XIONG Y, HUANG W, et al. Deformable Siamese attention networks for visual object tracking [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 6727-6736.
[16] LEI Bangjun, DING Qishuai, MOU Qianxi, et al. Visual tracking algorithm based on template updating and dual feature enhancement [J/OL]. Journal of Beijing University of Aeronautics and Astronautics, 2024: 1-15. (2024-02-27). https://link.cnki.net/doi/10.13700/j.bh.1001-5965.2024.0020.
[17] HUANG Yujie, CHEN Kai, WANG Ziyuan, et al. A dense pedestrian tracking method based on fusion features under multi-vision [J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(7): 2513-2525.
[18] HOU Zhiqiang, WANG Zhuo, MA Sugang, et al. Target drift discriminative network based on dual-template Siamese structure in long-term tracking [J]. Journal of Electronics & Information Technology, 2024, 46(4): 1458-1467. doi: 10.11999/JEIT230496
[19] HOU Zhiqiang, CHEN Maolin, MA Jingyuan, et al. Siamese network visual tracking algorithm based on second-order attention [J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(3): 739-747.
[20] LUO Biao, OUYANG Zhihua, YI Xinning, et al. Adaptive dynamic programming based visual servoing tracking control for mobile robots [J]. Acta Automatica Sinica, 2023, 49(11): 2286-2296.
[21] HOU Zhiqiang, MA Jingyuan, HAN Ruoxue, et al. A fast long-term visual tracking algorithm based on deep learning [J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(8): 2391-2403.
[22] HUA Xia, WANG Xinqing, RUI Ting, et al. Vision-driven end-to-end maneuvering object tracking of UAV [J]. Journal of Zhejiang University: Engineering Science, 2022, 56(7): 1464-1472.
[23] GAN Yaodong, ZHENG Ling, ZHANG Zhida, et al. Multi-target detection and tracking with fusion of millimeter-wave radar and deep vision [J]. Automotive Engineering, 2021, 43(7): 1022-1029.
[24] QIU Zhuling, ZHA Yufei, WU Min, et al. Learning attentional regularized correlation filter for visual tracking [J]. Acta Electronica Sinica, 2020, 48(9): 1762-1768. doi: 10.3969/j.issn.0372-2112.2020.09.014
[25] FAN H, LIN L, YANG F, et al. LaSOT: a high-quality benchmark for large-scale single object tracking [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5369-5378.
[26] HUANG L, ZHAO X, HUANG K. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562-1577. doi: 10.1109/TPAMI.2019.2957464
[27] WU Y, LIM J, YANG M H. Online object tracking: a benchmark [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE, 2013: 2411-2418.
[28] CAI W, LIU Q, WANG Y. HIPTrack: visual tracking with historical prompts [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2024: 19258-19267.
[29] XIE J, ZHONG B, MO Z, et al. Autoregressive queries for adaptive tracking with spatio-temporal transformers [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2024: 19300-19309.
[30] ZHENG Y, ZHONG B, LIANG Q, et al. ODTrack: online dense temporal token learning for visual tracking [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(7): 7588-7596. doi: 10.1609/aaai.v38i7.28591
[31] LIN L, FAN H, ZHANG Z, et al. Tracking meets LoRA: faster training, larger model, stronger performance [C]// Proceedings of the European Conference on Computer Vision. Milan: Springer, 2024: 300-318.
[32] LI B, WU W, WANG Q, et al. SiamRPN++: evolution of Siamese visual tracking with very deep networks [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4277-4286.