基于改进深度强化学习算法的农业机器人路径规划

Journal of Zhejiang University (Engineering Science)

2025, Vol. 59, Issue 7: 1492-1503

DOI: 10.3785/j.issn.1008-973X.2025.07.017

Mechanical and Energy Engineering

Path planning of agricultural robots based on improved deep reinforcement learning algorithm

Wei ZHAO (赵威)1,2, Wanzhi ZHANG (张万枝)1,2,*, Jialin HOU (侯加林)1,2, Rui HOU (侯瑞)3, Yuhua LI (李玉华)1,4, Lejun ZHAO (赵乐俊)1,4, Jin CHENG (程进)1,2

1. College of Mechanical and Electronic Engineering, Shandong Agricultural University, Taian 271018, China
2. Shandong Engineering Research Center of Agricultural Equipment Intelligentization, Taian 271018, China
3. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
4. Shandong Key Laboratory of Intelligent Production Technology and Equipment for Facility Horticulture (in preparation), Taian 271018, China


Abstract:

When deep reinforcement learning is used for path planning of agricultural robots, the robot often has difficulty finding the target point, rewards are sparse, and convergence is slow. To address these problems, a path planning method based on a multi-target point navigation integrated improved deep Q-network algorithm (MPN-DQN) was proposed. Laser simultaneous localization and mapping (SLAM) was used to scan the global environment and construct a prior map, which was divided into walking-row and crop-row areas; the map boundaries were inflated and fitted to form a forward bow-shaped operation corridor. Intermediate target points were used to segment the global environment, so that the complex environment was divided into multiple stages of short-range navigation, which simplified the target point search. The deep Q-network algorithm was improved in three aspects, namely the action space, the exploration strategy, and the reward function, to alleviate the sparse reward problem, accelerate convergence, and raise the navigation success rate. Experimental results showed that an agricultural robot equipped with MPN-DQN had a total of 1 collision during autonomous driving, an average navigation time of 104.27 s, an average navigation distance of 16.58 m, and an average navigation success rate of 95%.

Key words: deep reinforcement learning; agricultural robot; intermediate target point; multi-target point navigation integrated improved deep Q-network algorithm (MPN-DQN); path planning
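The staged navigation idea described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical list of intermediate target points along a bow-shaped corridor and a hypothetical dense reward based on progress toward the currently active target. The names `WAYPOINTS`, `step_reward`, `advance_stage`, and `run_straight_line_demo`, as well as all constants, are illustrative assumptions and are not taken from the paper.

```python
import math

# Hypothetical intermediate target points (x, y), in metres, along a bow-shaped corridor.
WAYPOINTS = [(0.0, 4.0), (1.0, 4.0), (1.0, 0.0), (2.0, 0.0), (2.0, 4.0)]

GOAL_RADIUS = 0.2          # assumed distance at which a target counts as reached
REACH_BONUS = 10.0         # assumed bonus for reaching an intermediate target
COLLISION_PENALTY = -10.0  # assumed penalty for a collision
PROGRESS_GAIN = 1.0        # assumed weight on progress toward the active target


def distance(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def step_reward(prev_pos, pos, target, collided):
    """Return (reward, reached) for one step toward the active target.

    The reward is dense: every step that reduces the distance to the
    target earns a positive value, instead of rewarding only at the goal.
    """
    if collided:
        return COLLISION_PENALTY, False
    if distance(pos, target) <= GOAL_RADIUS:
        return REACH_BONUS, True
    progress = distance(prev_pos, target) - distance(pos, target)
    return PROGRESS_GAIN * progress, False


def advance_stage(stage, reached):
    """Switch to the next short-range stage once the active target is reached."""
    return stage + 1 if reached else stage


def run_straight_line_demo(start=(0.0, 0.0), step=0.1, max_steps=1000):
    """Drive straight at each target in turn; stands in for a learned policy."""
    pos, stage, total = start, 0, 0.0
    for _ in range(max_steps):
        if stage == len(WAYPOINTS):
            break
        target = WAYPOINTS[stage]
        d = distance(pos, target)
        move = min(step, d)
        if d > 0.0:
            new_pos = (pos[0] + move * (target[0] - pos[0]) / d,
                       pos[1] + move * (target[1] - pos[1]) / d)
        else:
            new_pos = pos
        reward, reached = step_reward(pos, new_pos, target, collided=False)
        total += reward
        stage = advance_stage(stage, reached)
        pos = new_pos
    return stage, total
```

In this sketch each stage is a short-range problem with its own target, so the agent receives feedback long before it reaches the final goal. A learned policy would replace the straight-line motion in `run_straight_line_demo`.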

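Received: 2024-09-04 Published: 2025-07-25

CLC: TP 242

Funding: Key Research and Development Program of Shandong Province (Major Science and Technology Innovation Project) (2022CXGC020703); Agricultural Machinery Post Expert Project of the Shandong Tuber Crops Industry Technology System (SDAIT-16-10); Key Research and Development Program of Shandong Province (Rural Revitalization Science and Technology Innovation Boosting Action Plan) (2022TZXD006).

Corresponding author: Wanzhi ZHANG (张万枝). E-mail: zhao868250709@163.com; zhangwanzhi@163.com

About the author: Wei ZHAO (赵威), born 1988, male, master's student, engaged in research on agricultural machinery navigation and control technology. orcid.org/0009-0005-2286-8569. E-mail: zhao868250709@163.com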


Cite this article:

Wei ZHAO, Wanzhi ZHANG, Jialin HOU, Rui HOU, Yuhua LI, Lejun ZHAO, Jin CHENG. Path planning of agricultural robots based on improved deep reinforcement learning algorithm[J]. Journal of Zhejiang University (Engineering Science), 2025, 59(7): 1492-1503.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.07.017 or https://www.zjujournals.com/eng/CN/Y2025/V59/I7/1492

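Fig. 1 Overall flowchart of the proposed path planning method

Fig. 2 Principle of segmenting the global environment with intermediate target points

Fig. 3 Action space rule diagram (3-neighborhood)

Table 1 Correspondence between angular velocity and action direction

Fig. 4 Division of target point regions in the operation scene

Fig. 5 Kinematic model of the two-wheeled differential-drive robot

Fig. 6 Network structure of the proposed path planning method

Fig. 7 Training flowchart of the proposed path planning method

Fig. 8 Simulated environment for the simulation experiments

Table 2 Main hyperparameters in the simulation experiments

Table 3 Performance comparison of different path planning methods in the training environment

Fig. 9 Average reward of different path planning methods in the training environment

Table 4 Performance comparison of different path planning methods in the test environment

Fig. 10 Trajectories of different path planning methods in the test environment

Fig. 11 Differential-drive robot

Fig. 12 Mock experiment scene and prior map

Fig. 13 Trajectories of different path planning methods in the mock scene

Table 5 Performance comparison of different path planning methods in the mock scene

Fig. 14 Actual operation environment and navigation trajectory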
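The captions above mention a two-wheeled differential-drive kinematic model and a table relating angular velocity to action direction, but their contents are not reproduced on this page. As a rough illustration only, the sketch below uses the standard unicycle model with a hypothetical three-action set. The speed, angular velocities, wheel base, and the names `ACTIONS`, `step_pose`, and `wheel_speeds` are assumptions, not values from the paper.

```python
import math

LINEAR_SPEED = 0.2  # m/s, assumed constant forward speed

# Hypothetical mapping from discrete action index to angular velocity (rad/s):
# 0 = turn left, 1 = go straight, 2 = turn right.
ACTIONS = {0: 0.5, 1: 0.0, 2: -0.5}


def step_pose(x, y, theta, action, dt=0.1):
    """Advance a unicycle-model pose by one explicit Euler step.

    x' = v cos(theta), y' = v sin(theta), theta' = omega.
    The heading is wrapped to (-pi, pi].
    """
    omega = ACTIONS[action]
    x_new = x + LINEAR_SPEED * math.cos(theta) * dt
    y_new = y + LINEAR_SPEED * math.sin(theta) * dt
    heading = theta + omega * dt
    theta_new = math.atan2(math.sin(heading), math.cos(heading))
    return x_new, y_new, theta_new


def wheel_speeds(v, omega, wheel_base=0.3):
    """Left and right wheel linear speeds (m/s) for a differential drive.

    wheel_base is the assumed distance between the two drive wheels in metres.
    """
    left = v - omega * wheel_base / 2.0
    right = v + omega * wheel_base / 2.0
    return left, right
```

A discrete action set like this keeps the Q-network output small, at the cost of coarser steering than a continuous controller would provide.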


