Journal of Zhejiang University (Engineering Science), 2025, Vol. 59, Issue (7): 1492-1503    DOI: 10.3785/j.issn.1008-973X.2025.07.017
Mechanical and Energy Engineering
Path planning of agricultural robots based on improved deep reinforcement learning algorithm
Wei ZHAO^1,2; Wanzhi ZHANG^1,2,*; Jialin HOU^1,2; Rui HOU^3; Yuhua LI^1,4; Lejun ZHAO^1,4; Jin CHENG^1,2
1. College of Mechanical and Electronic Engineering, Shandong Agricultural University, Taian 271018, China
2. Shandong Engineering Research Center of Agricultural Equipment Intelligentization, Taian 271018, China
3. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
4. Shandong Key Laboratory of Intelligent Production Technology and Equipment for Facility Horticulture, Taian 271018, China
Abstract:

To address the difficulty of finding target points, the sparse rewards, and the slow convergence that arise when agricultural robots use deep reinforcement learning for path planning, a path planning method based on MPN-DQN, which integrates multi-target point navigation with an improved deep Q-network, was proposed. Laser simultaneous localization and mapping (SLAM) was used to scan the global environment, build a prior map, and divide it into walking-row and crop-row areas; the map boundaries were then dilated and fitted to form a forward bow-shaped operation corridor. Intermediate target points were used to segment the global environment, dividing the complex environment into multi-stage short-range navigation environments and thereby simplifying the target point search. The deep Q-network algorithm was improved in three aspects, namely the action space, the exploration strategy, and the reward function, to alleviate reward sparsity, accelerate convergence, and raise the navigation success rate. Experimental results showed that the agricultural robot running MPN-DQN had 1 collision in total, an average navigation time of 104.27 s, an average navigation distance of 16.58 m, and an average navigation success rate of 95%.
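The central idea above, splitting one long navigation task into a chain of short-range tasks bounded by intermediate target points, can be illustrated with a small stage manager. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `MultiStageNavigator`, the 0.3 m reach radius and the example waypoints are all hypothetical. In such a scheme the DQN agent is only ever given the active target, which keeps each stage short.

```python
import math
from dataclasses import dataclass


@dataclass
class MultiStageNavigator:
    """Hypothetical stage manager: exposes one intermediate target point at a time."""

    targets: list               # ordered intermediate target points as (x, y) tuples, in metres
    reach_radius: float = 0.3   # assumed "target reached" tolerance, in metres
    index: int = 0              # index of the stage currently being navigated

    def current_target(self):
        """Return the active target point, or None once every stage is complete."""
        if self.index < len(self.targets):
            return self.targets[self.index]
        return None

    def update(self, x: float, y: float):
        """Advance to the next stage once the robot is within reach_radius of the active target."""
        target = self.current_target()
        if target is not None and math.hypot(target[0] - x, target[1] - y) <= self.reach_radius:
            self.index += 1
        return self.current_target()

    @property
    def done(self) -> bool:
        return self.index >= len(self.targets)


# Hypothetical bow-shaped route: up one walking row, across the headland, down the next row.
nav = MultiStageNavigator(targets=[(0.0, 5.0), (1.0, 5.0), (1.0, 0.0)])
print(nav.update(0.0, 0.0))  # still heading for the first target
print(nav.update(0.0, 4.8))  # first target reached, so the second becomes active
```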

Key words: deep reinforcement learning; agricultural robot; intermediate target point; multi-target point navigation integrated improved deep Q-network algorithm (MPN-DQN); path planning
Received: 2024-09-04    Published: 2025-07-25
CLC:  TP 242  
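Funding: Key Research and Development Program of Shandong Province (Major Scientific and Technological Innovation Project) (2022CXGC020703); Agricultural Machinery Post Expert Project of the Shandong Tuber Crop Industry Technology System (SDAIT-16-10); Key Research and Development Program of Shandong Province (Rural Revitalization Science and Technology Innovation Boosting Action Plan) (2022TZXD006).
Corresponding author: Wanzhi ZHANG     E-mail: zhao868250709@163.com; zhangwanzhi@163.com
About the author: Wei ZHAO (born 1988), male, master's student, engaged in research on navigation control technology for agricultural machinery. orcid.org/0009-0005-2286-8569. E-mail: zhao868250709@163.com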
Cite this article:

Wei ZHAO, Wanzhi ZHANG, Jialin HOU, Rui HOU, Yuhua LI, Lejun ZHAO, Jin CHENG. Path planning of agricultural robots based on improved deep reinforcement learning algorithm[J]. Journal of Zhejiang University (Engineering Science), 2025, 59(7): 1492-1503.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.07.017        https://www.zjujournals.com/eng/CN/Y2025/V59/I7/1492

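Fig. 1  Overall flowchart of the proposed path planning method
Fig. 2  Schematic of segmenting the global environment with intermediate target points
Fig. 3  Action space rule diagram (3-neighborhood)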
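| a_i | Velocity direction | ω/(rad·s⁻¹) |
|---|---|---|
| 0 | 45° to the left of straight ahead | 0.785 |
| 1 | Straight ahead | 0 |
| 2 | 45° to the right of straight ahead | -0.785 |

Table 1  Correspondence between angular velocity and action direction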
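Table 1 maps each discrete action of the 3-neighborhood action space to an angular velocity; 0.785 rad/s is approximately π/4. A minimal sketch of turning a Q-value vector into a velocity command follows. The constant forward speed `LINEAR_SPEED = 0.2` m/s is an assumption (it is not given in this excerpt), and the helper names are hypothetical.

```python
# Table 1: action index a_i -> angular velocity omega in rad/s (positive = turn left).
ACTION_TO_OMEGA = {0: 0.785, 1: 0.0, 2: -0.785}

LINEAR_SPEED = 0.2  # assumed constant forward speed in m/s; not stated in this excerpt


def greedy_action(q_values):
    """Index of the largest Q-value; max() keeps the first index on ties."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def action_to_command(action: int):
    """Translate a discrete action into a (v, omega) velocity command."""
    return LINEAR_SPEED, ACTION_TO_OMEGA[action]


print(action_to_command(greedy_action([0.1, 0.5, 0.2])))  # straight ahead
```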
Fig. 4  Division of target-point regions in the operation scene
Fig. 5  Kinematic model of the two-wheel differential-drive robot
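Fig. 5 refers to the kinematic model of a two-wheel differential-drive robot. The textbook form of that model is sketched below: wheel speeds give v = (v_r + v_l)/2 and ω = (v_r - v_l)/B, and the pose evolves as dx/dt = v cos θ, dy/dt = v sin θ, dθ/dt = ω. The track-width symbol B, the forward-Euler integration and the function names are assumptions; the paper's own notation is not reproduced in this excerpt.

```python
import math


def wheel_to_body(v_left: float, v_right: float, track_width: float):
    """Convert wheel rim speeds (m/s) into body linear speed v and yaw rate omega."""
    v = (v_right + v_left) / 2.0
    omega = (v_right - v_left) / track_width
    return v, omega


def step_pose(x: float, y: float, theta: float, v: float, omega: float, dt: float):
    """One forward-Euler step of the differential-drive (unicycle) kinematic model."""
    x_new = x + v * math.cos(theta) * dt
    y_new = y + v * math.sin(theta) * dt
    theta_new = theta + omega * dt
    # Wrap the heading into [-pi, pi] so angles stay comparable between steps.
    theta_new = math.atan2(math.sin(theta_new), math.cos(theta_new))
    return x_new, y_new, theta_new


v, omega = wheel_to_body(0.2, 0.2, track_width=0.5)  # equal wheel speeds: straight line
print(step_pose(0.0, 0.0, 0.0, v, omega, dt=0.1))
```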
Fig. 6  Network structure of the proposed path planning method
Fig. 7  Training flowchart of the proposed path planning method
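A DQN training flow such as the one in Fig. 7 typically relies on two pieces of bookkeeping: an experience pool and a periodically synchronized target network, sized here with the Table 2 values R_n = 128 000, bs = 128 and u_i = 10. The sketch below samples uniformly for brevity, whereas Table 2 also lists a prioritized-experience-replay weight β, so it is a simplification rather than the method described. Whether u_i counts gradient updates or episodes is not stated in this excerpt; the sketch assumes gradient updates, and all names are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Uniform experience replay (simplified: no prioritization)."""

    def __init__(self, capacity: int = 128_000):
        # A deque with maxlen discards the oldest transition once it is full.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int = 128):
        # Draw a mini-batch without replacement.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def should_sync_target(update_count: int, period: int = 10) -> bool:
    """True when the target network should copy the online network's weights."""
    return update_count > 0 and update_count % period == 0
```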
Fig. 8  Simulation environment used in the simulation experiments
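| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Centerline reward factor a | 1.25 | Experience pool capacity R_n | 128 000 |
| Region reward factor b | ±1.2 | Learning rate lr | 0.001 |
| Greedy weight α | 0.6 | Training episodes Ep | 2 000 |
| Prioritized experience replay weight β | 0.4 | Time steps per episode Es | 1 000 |
| Discount rate γ | 0.9 | Network update frequency u_i | 10 |
| Sampling guarantee factor τ | 0.3 | Samples per batch bs | 128 |
| Decay steps ε_d | 1 000 | | |

Table 2  Main hyperparameters in the simulation experiments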
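Two entries in Table 2 drive the core DQN arithmetic: the discount rate γ = 0.9 in the temporal-difference target y = r + γ·max_a' Q_target(s', a'), and the decay steps ε_d = 1 000 for the exploration rate. The sketch below uses a plain linear ε decay with assumed endpoints 1.0 and 0.05; the paper's improved exploration strategy (including the greedy weight α = 0.6) is not described in this excerpt, so this is a baseline illustration only, with hypothetical names.

```python
import random

GAMMA = 0.9                     # discount rate gamma (Table 2)
DECAY_STEPS = 1000              # epsilon decay steps epsilon_d (Table 2)
EPS_START, EPS_END = 1.0, 0.05  # assumed endpoints; not listed in Table 2


def epsilon_at(step: int) -> float:
    """Linearly anneal epsilon from EPS_START to EPS_END over DECAY_STEPS, then hold it."""
    frac = min(step / DECAY_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)


def select_action(q_values, step: int, rng=random) -> int:
    """Epsilon-greedy: explore uniformly with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon_at(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward: float, next_q_values, done: bool) -> float:
    """DQN target y = r + gamma * max_a' Q_target(s', a'); no bootstrapping at terminal states."""
    return reward if done else reward + GAMMA * max(next_q_values)
```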
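| Method | Training episodes N | C_tn (collisions) | t_na/s (avg. navigation time) | p_ns/% (navigation success rate) |
|---|---|---|---|---|
| Conventional DQN | [1, 900] | 662 | 97.35 | 26.44 |
| Conventional DQN | [901, 1600] | 397 | 91.57 | 43.29 |
| Conventional DQN | [1601, 2000] | 138 | 89.26 | 65.50 |
| DDQN | [1, 900] | 613 | 98.64 | 31.89 |
| DDQN | [901, 1600] | 308 | 89.38 | 56.00 |
| DDQN | [1601, 2000] | 114 | 83.59 | 71.50 |
| Dueling DQN | [1, 900] | 596 | 99.37 | 33.78 |
| Dueling DQN | [901, 1600] | 303 | 88.45 | 56.71 |
| Dueling DQN | [1601, 2000] | 103 | 82.31 | 74.25 |
| This study | [1, 900] | 409 | 96.72 | 54.56 |
| This study | [901, 1600] | 97 | 83.25 | 86.14 |
| This study | [1601, 2000] | 0 | 76.77 | 100.00 |

Table 3  Performance comparison of different path planning methods in the training environment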
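The numbers in Table 3 are consistent with counting every collision episode as a failure, i.e. p_ns = (N - C_tn)/N × 100%, where N is the number of episodes in the window (900, 700 and 400). This relation is inferred from the tabulated values rather than stated in this excerpt; the check below reproduces the proposed method's rows.

```python
def success_rate(collisions: int, episodes: int) -> float:
    """Navigation success rate p_ns in percent, treating each collision episode as a failure."""
    return 100.0 * (episodes - collisions) / episodes


# Episode-window sizes implied by Table 3's column ranges.
WINDOWS = {"1-900": 900, "901-1600": 700, "1601-2000": 400}

# (C_tn, reported p_ns) for "This study" in Table 3.
PROPOSED = {"1-900": (409, 54.56), "901-1600": (97, 86.14), "1601-2000": (0, 100.00)}

for window, (c_tn, reported) in PROPOSED.items():
    computed = success_rate(c_tn, WINDOWS[window])
    print(f"{window}: computed {computed:.2f}% vs reported {reported:.2f}%")
```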
Fig. 9  Average reward of different path planning methods in the training environment
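| Method | C_tn (collisions) | t_na/s (avg. navigation time) | d_na/m (avg. navigation distance) | p_ns/% (navigation success rate) |
|---|---|---|---|---|
| Conventional DQN | 134 | 106.57 | 17.99 | 73.2 |
| DDQN | 103 | 104.25 | 17.12 | 79.4 |
| Dueling DQN | 97 | 103.16 | 16.90 | 80.6 |
| This study | 11 | 92.35 | 15.56 | 97.8 |

Table 4  Performance comparison of different path planning methods in the test environment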
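Relative improvements are easier to read off Table 4 as percentages. The helper below computes how much shorter the proposed method's average navigation time and distance are than each baseline's; it is plain arithmetic on the tabulated values, and the names are illustrative.

```python
# Table 4 (test environment): method -> (C_tn, t_na in s, d_na in m, p_ns in %)
TABLE4 = {
    "Conventional DQN": (134, 106.57, 17.99, 73.2),
    "DDQN": (103, 104.25, 17.12, 79.4),
    "Dueling DQN": (97, 103.16, 16.90, 80.6),
    "This study": (11, 92.35, 15.56, 97.8),
}


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


ours = TABLE4["This study"]
for name, row in TABLE4.items():
    if name == "This study":
        continue
    time_cut = relative_reduction(row[1], ours[1])
    dist_cut = relative_reduction(row[2], ours[2])
    print(f"vs {name}: time -{time_cut:.1f}%, distance -{dist_cut:.1f}%")
```

Against conventional DQN this works out to roughly 13.3% less navigation time and 13.5% less navigation distance.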
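Fig. 10  Motion trajectories of different path planning methods in the test environment
Fig. 11  Differential-drive robot
Fig. 12  Simulated experimental scene and prior map
Fig. 13  Trajectories of different path planning methods in the simulated scene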
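| Compared algorithm | C_tn (collisions) | t_na/s (avg. navigation time) | d_na/m (avg. navigation distance) | p_ns/% (navigation success rate) |
|---|---|---|---|---|
| Conventional DQN | 14 | 127.26 | 19.21 | 65.0 |
| DDQN | 10 | 115.17 | 17.55 | 75.0 |
| Dueling DQN | 9 | 112.91 | 16.59 | 77.5 |
| This study | 1 | 96.93 | 15.87 | 97.5 |

Table 5  Performance comparison of different path planning methods in the simulated scene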
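Table 5 does not state how many runs each method made, but its collision counts and success rates are mutually consistent with a fixed run count N. Solving p_ns = (N - C_tn)/N for N gives N = C_tn / (1 - p_ns), which yields 40 for every row; the same calculation on Table 4 yields 500. Treat these run counts as inferences from the tables, not reported values, and the helper name as illustrative.

```python
# Table 5 (simulated scene): method -> (C_tn, p_ns in %)
TABLE5 = {
    "Conventional DQN": (14, 65.0),
    "DDQN": (10, 75.0),
    "Dueling DQN": (9, 77.5),
    "This study": (1, 97.5),
}


def implied_runs(collisions: int, success_pct: float) -> float:
    """Solve p_ns = (N - C_tn) / N for N, assuming each collision is exactly one failed run."""
    return collisions / (1.0 - success_pct / 100.0)


for name, (c_tn, p_ns) in TABLE5.items():
    print(f"{name}: N = {implied_runs(c_tn, p_ns):.1f}")
```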
Fig. 14  Actual operating environment and navigation trajectory