Journal of Zhejiang University (Engineering Science)  2023, Vol. 57 Issue (10): 1945-1954    DOI: 10.3785/j.issn.1008-973X.2023.10.004
Computer Technology, Automation Technology
Attention aggregation siamese network with anchor free scheme for UAV object tracking
Hai-jun WANG1, Wen-lai MA2, Sheng-yan ZHANG1
1. Key Laboratory of Aviation Information and Control in University of Shandong, Binzhou University, Binzhou 256603, China
2. College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Abstract:

A real-time UAV object tracker based on a lightweight attentional-aggregation siamese network with an anchor-free scheme was proposed, aiming at the problems of viewpoint change, deformation and interference from similar objects in UAV tracking tasks. Considering the small number of object pixels seen from a high-altitude UAV platform, an efficient channel attention scheme was introduced into the two template branches, so that semantic information and detail information could be effectively extracted. A spatial attention scheme was constructed to aggregate attention features and enlarge the receptive field after fusing the responses of the two layers. An anchor-free mechanism was built to directly classify and regress the object box at each pixel, which simplifies the model and greatly reduces the computational cost. The proposed method was evaluated on three public UAV datasets, UAV123@10fps, UAV20L and DTB70, and compared with other state-of-the-art tracking algorithms. The experimental results show that the proposed method can track the target effectively in many challenging scenes, at an average speed of 155.2 frames per second over the three UAV benchmarks.
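The anchor-free head described in the abstract classifies each response-map pixel and regresses a box at that pixel directly, with no anchor boxes. A minimal numpy sketch of the decoding step is below; the stride value and the toy score/offset maps are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def decode_anchor_free(cls_map, reg_map, stride=8):
    """Pick the pixel with the highest classification score and decode its
    (l, t, r, b) offsets into a box in search-image coordinates."""
    h, w = cls_map.shape
    idx = int(np.argmax(cls_map))
    i, j = divmod(idx, w)
    cx, cy = j * stride, i * stride          # map the pixel back to image space
    l, t, r, b = reg_map[:, i, j]
    return tuple(float(v) for v in (cx - l, cy - t, cx + r, cy + b))

# Toy maps: a 16x16 score map with one confident pixel, constant offsets.
cls = np.zeros((16, 16))
cls[5, 9] = 1.0
reg = np.full((4, 16, 16), 20.0)
box = decode_anchor_free(cls, reg)
print(box)  # (52.0, 20.0, 92.0, 60.0)
```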

Key words: unmanned aerial vehicle    object tracking    anchor free    siamese network    channel attention
Received: 2022-12-09    Published: 2023-10-18
CLC:  TP 391.41  
Funding: Supported by the Natural Science Foundation of Shandong Province (ZR2020MF142), the Doctoral Research Start-up Fund of Binzhou University (2021Y04) and the Major Scientific Research Fund of Binzhou University (2019ZD03)
About the author: WANG Hai-jun (1980—), male, associate professor, engaged in research on object tracking algorithms. orcid.org/0000-0003-2481-9662. E-mail: whjlym@163.com

Cite this article:


Hai-jun WANG, Wen-lai MA, Sheng-yan ZHANG. Attention aggregation siamese network with anchor free scheme for UAV object tracking. Journal of Zhejiang University (Engineering Science), 2023, 57(10): 1945-1954.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.10.004        https://www.zjujournals.com/eng/CN/Y2023/V57/I10/1945

Fig. 1  Flowchart of the SiamFC algorithm
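The SiamFC pipeline in Fig. 1 matches a template feature map against a search-region feature map by sliding inner products, producing a response map whose peak locates the target. A minimal numpy sketch of that cross-correlation; the shapes and random features are illustrative only:

```python
import numpy as np

def cross_correlation(template, search):
    """Slide the template feature map over the search feature map and
    compute an inner-product response at every offset (SiamFC-style)."""
    c, th, tw = template.shape
    _, sh, sw = search.shape
    oh, ow = sh - th + 1, sw - tw + 1
    response = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = search[:, i:i + th, j:j + tw]
            response[i, j] = np.sum(window * template)
    return response

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 6, 6))    # template features (C=8, 6x6)
x = rng.standard_normal((8, 22, 22))  # search-region features
r = cross_correlation(z, x)
print(r.shape)  # (17, 17) response map
```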
Fig. 2  Flowchart of the AASAF algorithm
Fig. 3  Structure of the channel attention module
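The efficient channel attention of Fig. 3 follows the ECA idea: a global average pool yields one descriptor per channel, a small 1-D convolution mixes neighbouring channels, and a sigmoid gate rescales the feature map. A minimal numpy sketch; the uniform kernel weights are placeholders, not learned parameters:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def eca(feat, k=3):
    """Efficient channel attention: global average pooling, a 1-D
    convolution of size k across the channel descriptor, then a
    sigmoid gate that rescales each channel."""
    c = feat.shape[0]
    desc = feat.mean(axis=(1, 2))               # (C,) channel descriptor
    pad = k // 2
    padded = np.pad(desc, pad, mode="edge")
    w = np.full(k, 1.0 / k)                     # toy 1-D conv weights
    mixed = np.array([np.dot(padded[i:i + k], w) for i in range(c)])
    gate = sigmoid(mixed)                       # (C,) attention weights
    return feat * gate[:, None, None]

rng = np.random.default_rng(1)
f = rng.standard_normal((16, 7, 7))
out = eca(f)
print(out.shape)  # (16, 7, 7)
```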
Fig. 4  Structure of the spatial attention module
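The spatial attention of Fig. 4 pools across channels and gates every spatial position. A minimal numpy sketch in the CBAM style; the equal mixing of the mean and max maps stands in for a learned convolution:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def spatial_attention(feat):
    """Pool across channels with mean and max, mix the two maps,
    and gate every spatial position of the feature map."""
    avg_map = feat.mean(axis=0)                # (H, W) mean over channels
    max_map = feat.max(axis=0)                 # (H, W) max over channels
    gate = sigmoid(0.5 * (avg_map + max_map))  # placeholder mixing weights
    return feat * gate[None, :, :]

rng = np.random.default_rng(2)
f = rng.standard_normal((16, 9, 9))
out = spatial_attention(f)
print(out.shape)  # (16, 9, 9)
```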
Algorithm    P: UAV123@10fps / UAV20L / DTB70    S: UAV123@10fps / UAV20L / DTB70
AASAF        0.755 / 0.744 / 0.809               0.570 / 0.571 / 0.608
SiamAPN      0.752 / 0.721 / 0.784               0.566 / 0.539 / 0.586
SESiam_FC    0.717 / 0.648 / 0.730               0.512 / 0.453 / 0.490
ECO          0.711 / 0.589 / 0.722               0.520 / 0.427 / 0.502
MCCT         0.684 / 0.605 / 0.725               0.492 / 0.407 / 0.484
DeepSTRCF    0.682 / 0.588 / 0.734               0.499 / 0.443 / 0.506
AutoTrack    0.671 / 0.512 / 0.716               0.477 / 0.349 / 0.478
ARCF         0.666 / 0.544 / 0.694               0.473 / 0.381 / 0.472
BiCF         0.662 / 0.486 / 0.657               0.475 / 0.356 / 0.462
Ocean        0.657 / 0.630 / 0.634               0.462 / 0.444 / 0.455
STRCF        0.627 / 0.575 / 0.649               0.457 / 0.411 / 0.437
Table 1  Precision (P) and success rate (S) of eleven trackers on three UAV benchmarks
Fig. 5  Precision and success rate of seven trackers under different attributes on three UAV datasets
Fig. 6  Sample qualitative tracking results of different algorithms
Dataset        AASAF  SiamAPN  DeepSTRCF  MCCT  ECO   MCPF  STRCF  ARCF  AutoTrack  Ocean  SESiam_FC
UAV123@10fps   154.0  162.5    5.4        8.0   15.7  0.5   22.2   21.9  44.8       88.3   41.2
UAV20L         147.9  139.0    6.2        8.4   12.7  0.6   26.9   24.1  47.6       89.8   41.1
DTB70          163.7  167.8    6.2        8.6   11.6  0.6   26.3   24.3  48.6       89.7   41.3
Average        155.2  156.4    5.9        8.3   13.3  0.57  25.1   23.4  47.0       89.3   41.2
Table 2  Tracking speed (frames/s) of different algorithms on the UAV123@10fps, UAV20L and DTB70 datasets
Module combination                                P: UAV123@10fps / UAV20L / DTB70    S: UAV123@10fps / UAV20L / DTB70
baseline                                          0.734 / 0.686 / 0.768               0.555 / 0.518 / 0.589
baseline + channel attention                      0.752 / 0.692 / 0.798               0.567 / 0.523 / 0.605
baseline + spatial attention                      0.738 / 0.708 / 0.794               0.555 / 0.537 / 0.589
baseline + channel + spatial attention (AASAF)    0.755 / 0.744 / 0.809               0.570 / 0.571 / 0.608
Table 3  Tracking performance of the baseline combined with different attention modules on three UAV benchmarks
Method        Backbone      P      S      v/(frame·s⁻¹)
TransT[1]     ResNet50      0.819  0.631  39.9
TrDiMP[30]    ResNet50      0.768  0.620  15.7
HCAT[31]      ResNet18      0.676  0.510  104.1
SiamTPN[32]   ShuffleNetV2  0.793  0.607  32.9
AASAF         AlexNet       0.744  0.571  147.9
Table 4  Comparison with Transformer-based trackers on the UAV20L dataset
1 CHEN X, YAN B, ZHU J, et al. Transformer tracking[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 8122-8131.
2 SUN Rui, FANG Lin-feng, LIANG Qi-li, et al. Siamese network combined learning saliency and online learning interference for aerial object tracking algorithm [J]. Journal of Electronics and Information Technology, 2021, 43(5): 1414-1423.
doi: 10.11999/JEIT200140
3 WANG Hai-jun, ZHANG Sheng-yan, DU Yu-jie. UAV object tracking algorithm based on response and filter deviation-aware regularization [J]. Journal of Zhejiang University (Engineering Science), 2022, 56(9): 1824-1832.
4 LIU Fang, WANG Hong-juan, HUANG Guang-wei, et al. UAV target tracking algorithm based on adaptive depth network [J]. Acta Aeronautica et Astronautica Sinica, 2019, 40(3): 322332.
5 BOLME D S, BEVERIDGE J R, DRAPER B A, et al. Visual object tracking using adaptive correlation filters[C]// IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco: IEEE, 2010: 2544-2550.
6 HENRIQUES J F, CASEIRO R, MARTINS P, et al High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37 (3): 583- 596
doi: 10.1109/TPAMI.2014.2345390
7 HUANG Z, FU C, LI Y, et al. Learning aberrance repressed correlation filters for real-time UAV tracking [C]// IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 2891-2900.
8 LI Y, FU C, DING F, et al. AutoTrack: towards high-performance visual tracking for UAV with automatic spatio-temporal regularization [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11920-11929.
9 LIN F, FU C, HE Y, et al. BiCF: learning bidirectional incongruity-aware correlation filter for efficient UAV object tracking [C]// IEEE International Conference on Robotics and Automation. Paris: IEEE, 2020: 2365-2371.
10 BERTINETTO L, VALMADRE J, HENRIQUES J, et al. Fully-convolutional siamese networks for object tracking [C]// Computer Vision–ECCV Workshops. Amsterdam: Springer, 2016: 850-865.
11 LI B, YAN J, WU W, et al. High performance visual tracking with siamese region proposal network [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 8971-8980.
12 LI B, WU W, WANG Q, et al. SiamRPN++: evolution of siamese visual tracking with very deep networks [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4277-4286.
13 XU Y, WANG Z, LI Z, et al. SiamFC++: towards robust and accurate visual tracking with target estimation guidelines [C]// Proceedings of the AAAI Conference on Artificial Intelligence. New York: AAAI, 2020: 12549-12556.
14 WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11531-11539.
15 RAHMAN M M, FIAZ M, JUNG S K Efficient visual tracking with stacked channel-spatial attention learning[J]. IEEE Access, 2020, 8: 100857- 100869
doi: 10.1109/ACCESS.2020.2997917
16 WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Computer Vision–ECCV. Munich: Springer, 2018: 3-19.
17 FU J, LIU J, LI Y, et al. Dual attention network for scene segmentation [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 3141-3149.
18 LIN T, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context [C]// Computer Vision-ECCV. Zurich: Springer, 2014: 740-755.
19 RUSSAKOVSKY O, DENG J, SU H, et al ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115: 211- 252
doi: 10.1007/s11263-015-0816-y
20 HUANG L, ZHAO X, HUANG K. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562-1577.
21 REAL E, SHLENS J, MAZZOCCHI S, et al. YouTube-BoundingBoxes: a large high-precision human-annotated data set for object detection in video [C]// IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 7464-7473.
22 MUELLER M, SMITH N, GHANEM B, et al. A benchmark and simulator for UAV tracking [C]// Computer Vision–ECCV. Amsterdam: Springer, 2016: 445-461.
23 LI S, YEUNG D Y. Visual object tracking for unmanned aerial vehicles: a benchmark and new motion models [C]// Proceedings of the AAAI Conference on Artificial Intelligence. San Francisco: AAAI, 2017: 4140-4146.
24 FU C, CAO Z, LI Y, et al. Siamese anchor proposal network for high-speed aerial tracking [C]// IEEE International Conference on Robotics and Automation. Xi’an: IEEE, 2021: 510-516.
25 DANELLJAN M, BHAT G, KHAN F S, et al. ECO: efficient convolution operators for tracking [C]// IEEE Conference on Computer Vision and Pattern Recognition. Hawaii: IEEE, 2017: 6931-6939.
26 WANG N, ZHOU W G, TIAN Q, et al. Multi-cue correlation filters for robust visual tracking [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4844-4853.
27 LI F, TIAN C, ZUO W M, et al. Learning spatial-temporal regularized correlation filters for visual tracking [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4904-4913.
28 ZHANG Z, PENG H, FU J, et al. Ocean: object-aware anchor-free tracking [C]// Computer Vision-ECCV. Glasgow: Springer, 2020: 771-787.
29 SOSNOVIK I, MOSKALEV A, SMEULDERS A. Scale equivariance improves siamese tracking [C]// IEEE Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2021: 2764-2773.
30 WANG N, ZHOU W, WANG J, et al. Transformer meets tracker: exploiting temporal context for robust visual tracking [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN: IEEE, 2021: 1571-1580.
31 CHEN X, KANG B, WANG D, et al. Efficient visual tracking via hierarchical cross-attention transformer [EB/OL]. [2022-10-29]. https://arxiv.org/abs/2203.13537.