Journal of Zhejiang University (Engineering Science)  2022, Vol. 56 Issue (7): 1464-1472    DOI: 10.3785/j.issn.1008-973X.2022.07.022
Aerospace Technology
Vision-driven end-to-end maneuvering object tracking of UAV
Xia HUA, Xin-qing WANG*, Ting RUI, Fa-ming SHAO, Dong WANG
College of Field Engineering, Army Engineering University of PLA, Nanjing 210007, Jiangsu, China
Abstract:

To address autonomous motion control for UAV tracking of maneuvering objects, an end-to-end active object tracking control method with continuous action output was proposed. An end-to-end decision-making and control model based on visual perception and a deep reinforcement learning strategy was designed. Consecutive image frames observed by the UAV were taken as the input state, and continuous control quantities for the UAV flight actions were output. To improve the generalization ability of the control model, an efficient transfer learning strategy based on task decomposition and pre-training was developed. Simulation results show that the method achieves adaptive adjustment of UAV attitude across a variety of maneuvering object tracking tasks, so that the UAV stably tracks a moving object in the air. The generalization ability and training efficiency of the UAV tracking controller in unknown environments were significantly improved.

Key words: deep reinforcement learning    machine vision    autonomous UAV    transfer learning    object tracking
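As a rough illustration of the input/output contract described in the abstract (a stack of consecutive frames in, bounded continuous control quantities out), the sketch below uses a single random linear layer followed by tanh. The frame count, frame size, number of control channels, and the names `stack_frames` and `actor_forward` are all assumptions made for illustration; the paper's actual Actor network (Fig. 4) is not reproduced here.

```python
import math
import random

N_FRAMES = 4             # assumed number of consecutive frames per state
FRAME_H, FRAME_W = 8, 8  # assumed (tiny) grayscale frame size
N_CONTROLS = 3           # assumed number of continuous control channels


def stack_frames(frames):
    """Flatten the most recent N_FRAMES frames into one state vector."""
    if len(frames) < N_FRAMES:
        raise ValueError("need at least N_FRAMES frames")
    recent = frames[-N_FRAMES:]
    return [px for frame in recent for row in frame for px in row]


def actor_forward(state, weights, bias):
    """Map a state vector to controls in [-1, 1] with a linear layer + tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, state)) + b)
        for row, b in zip(weights, bias)
    ]


random.seed(0)
# Six synthetic frames with pixel values in [0, 1); only the last four are used.
frames = [
    [[random.random() for _ in range(FRAME_W)] for _ in range(FRAME_H)]
    for _ in range(6)
]
state = stack_frames(frames)
weights = [[random.gauss(0.0, 0.01) for _ in state] for _ in range(N_CONTROLS)]
bias = [0.0] * N_CONTROLS
action = actor_forward(state, weights, bias)
```

The tanh squashing is one common way to keep continuous actions within actuator limits; whether the paper uses it is not stated in this excerpt.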
Received: 2021-06-22    Published: 2022-07-26
CLC:  TP 242  
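Supported by: National Key Research and Development Program of China (2016YFC0802904); National Natural Science Foundation of China (61671470); Natural Science Foundation of Jiangsu Province (BK20161470); General Program of the China Postdoctoral Science Foundation, 62nd batch (2017M623423)
Corresponding author: Xin-qing WANG     E-mail: huaxia120888@163.com;17626039818@163.com
About the author: Xia HUA (born 1995), male, doctoral candidate, working on computer graphics, machine vision, digital image processing, and artificial intelligence. orcid.org/0000-0002-0953-3044. E-mail: huaxia120888@163.com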
Cite this article:

Xia HUA, Xin-qing WANG, Ting RUI, Fa-ming SHAO, Dong WANG. Vision-driven end-to-end maneuvering object tracking of UAV. Journal of Zhejiang University (Engineering Science), 2022, 56(7): 1464-1472.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2022.07.022        https://www.zjujournals.com/eng/CN/Y2022/V56/I7/1464

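Fig. 1  Block diagram of the decision system model
Fig. 2  Block diagram of the decision system model
Fig. 3  Structure of the U-Net segmentation network
Fig. 4  Structure of the Actor and Critic networks
Fig. 5  Two-dimensional local coordinate system for object tracking
Fig. 6  Transfer learning for tracking tasks based on task decomposition and dual training
Fig. 7  AirSim UAV virtual test scenes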
Stage         $C_{\rm{a}}$  $N_{\rm{b}}$  $l_1$   $l_2$   $e_{\max}$  $s_{\rm{i}}$  $R_{\rm{u}}$  $T_{\rm{b}}$
Pre-training  10000         64            0.001   0.001   1000        500           0.03          20
Training      2000          32            0.0001  0.0002  1000        500           0.05          20
Tab. 1  Experimental parameter settings for deep reinforcement learning
Fig. 8  Tracking results for vehicles and persons in the virtual test scenes
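Environment                                 $t_{\rm{p}}$/ms
Low-speed straight-line vehicle tracking    66.3
Low-speed straight-line person tracking     63.1
Complex-curve high-speed vehicle tracking   68.5
Complex-curve person tracking               63.6
Tab. 2  Processing efficiency of the model for a single frame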
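If the table values are read as per-frame processing times in milliseconds, they convert to an approximate frame rate as 1000 divided by the time. The short sketch below does that conversion; the English scene labels and the names `T_P_MS`, `frames_per_second`, `FPS`, and `MEAN_T_MS` are illustrative assumptions, while the numbers are copied from the table.

```python
# Per-frame processing times in milliseconds, copied from Table 2.
T_P_MS = {
    "low-speed straight-line vehicle": 66.3,
    "low-speed straight-line person": 63.1,
    "complex-curve high-speed vehicle": 68.5,
    "complex-curve person": 63.6,
}


def frames_per_second(t_ms):
    """Convert a per-frame time in milliseconds to frames per second."""
    if t_ms <= 0:
        raise ValueError("processing time must be positive")
    return 1000.0 / t_ms


# Approximate throughput per scenario and the mean per-frame time.
FPS = {scene: frames_per_second(t) for scene, t in T_P_MS.items()}
MEAN_T_MS = sum(T_P_MS.values()) / len(T_P_MS)

for scene, fps in FPS.items():
    print(f"{scene}: {fps:.1f} frames/s")
```

Under this reading, all four scenarios fall between roughly 14.6 and 15.8 frames per second.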
Fig. 9  Reward ablation experiment for the designed strategy
Fig. 10  Comparison of reward values with other state-of-the-art models
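Environment                       AR
                                  MIL      Meanshift  KCF      TLD      Proposed method
Low-speed straight-line vehicle   -432.3   -455.6     -407.3   -489.2   458.2
Low-speed straight-line person    -358.2   -367.8     -330.2   -355.7   646.3
Curved high-speed vehicle         -595.6   -653.6     -638.6   -599.7   285.9
Curved high-speed person          -495.7   -503.7     -525.3   -498.7   373.4
Tab. 3  Comparison of AR for different models in different tracking tasks and scenes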
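Environment                       EL
                                  MIL     Meanshift  KCF     TLD     Proposed method
Low-speed straight-line vehicle   64.7    55.8       40.7    69.2    168.3
Low-speed straight-line person    91.2    67.2       53.2    69.2    186.5
Curved high-speed vehicle         33.6    31.2       38.6    69.2    165.9
Curved high-speed person          35.7    33.8       37.3    69.2    172.4
Tab. 4  Comparison of EL for different models in different tracking tasks and scenes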
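To make the gap in the EL table easier to read, the sketch below computes, for each scene, the ratio of the proposed method's EL to the largest baseline EL. It assumes that a larger EL is better, which this excerpt does not state explicitly. The English scene labels and the names `EL`, `BASELINES`, and `ratio_to_best_baseline` are illustrative; the numbers are copied from the table.

```python
# EL values copied from Table 4; "proposed" is the paper's own algorithm.
EL = {
    "low-speed straight-line vehicle": {
        "MIL": 64.7, "Meanshift": 55.8, "KCF": 40.7, "TLD": 69.2, "proposed": 168.3,
    },
    "low-speed straight-line person": {
        "MIL": 91.2, "Meanshift": 67.2, "KCF": 53.2, "TLD": 69.2, "proposed": 186.5,
    },
    "curved high-speed vehicle": {
        "MIL": 33.6, "Meanshift": 31.2, "KCF": 38.6, "TLD": 69.2, "proposed": 165.9,
    },
    "curved high-speed person": {
        "MIL": 35.7, "Meanshift": 33.8, "KCF": 37.3, "TLD": 69.2, "proposed": 172.4,
    },
}
BASELINES = ("MIL", "Meanshift", "KCF", "TLD")


def ratio_to_best_baseline(row):
    """Return proposed EL divided by the largest baseline EL in one scene."""
    best = max(row[name] for name in BASELINES)
    return row["proposed"] / best


RATIOS = {scene: ratio_to_best_baseline(row) for scene, row in EL.items()}

for scene, ratio in RATIOS.items():
    print(f"{scene}: {ratio:.2f}x the best baseline")
```

By this measure the proposed method's EL is a little over twice the strongest baseline in every scene, with the smallest margin in the low-speed straight-line person scene.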