Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (7): 1492-1503    DOI: 10.3785/j.issn.1008-973X.2025.07.017
    
Path planning of agricultural robots based on improved deep reinforcement learning algorithm
Wei ZHAO1,2, Wanzhi ZHANG1,2,*, Jialin HOU1,2, Rui HOU3, Yuhua LI1,4, Lejun ZHAO1,4, Jin CHENG1,2
1. College of Mechanical and Electronic Engineering, Shandong Agricultural University, Taian 271018, China
2. Shandong Engineering Research Center of Agricultural Equipment Intelligentization, Taian 271018, China
3. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
4. Shandong Key Laboratory of Intelligent Production Technology and Equipment for Facility Horticulture, Taian 271018, China

Abstract  

To address the difficulty of finding target points, the sparse rewards, and the slow convergence encountered when deep reinforcement learning algorithms are used for path planning of agricultural robots, a path-planning method based on a multi-target point navigation integrated improved deep Q-network algorithm (MPN-DQN) was proposed. Laser simultaneous localization and mapping (SLAM) was used to scan the global environment, construct a prior map, and divide the walking-row and crop-row areas; the map boundary was inflated and fitted to form a forward bow-shaped operation corridor. Intermediate target points were used to segment the global environment, dividing the complex environment into multi-stage short-range navigation environments and thereby simplifying the target-point search. The deep Q-network algorithm was improved in three aspects (action space, exploration strategy, and reward function) to alleviate reward sparsity, accelerate convergence, and raise the navigation success rate. Experimental results showed that an agricultural robot equipped with the MPN-DQN algorithm had a total of one collision, an average navigation time of 104.27 s, an average navigation distance of 16.58 m, and an average navigation success rate of 95%.
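The multi-stage navigation idea described above (intermediate target points splitting one long-range goal into a sequence of short-range sub-goals) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation; the `reach_radius` threshold is an assumed parameter.

```python
import math

def select_current_target(position, waypoints, reach_radius=0.3):
    """Return the first intermediate target point not yet reached.

    Splitting one distant goal into intermediate target points turns a
    sparse-reward long-range task into a sequence of short-range
    navigation tasks: the agent always chases the first unreached
    waypoint instead of the far-away final goal.
    """
    for wp in waypoints:
        if math.dist(position, wp) > reach_radius:
            return wp
    return waypoints[-1]  # all waypoints reached: hold the final goal
```

In a training loop, the reward at each step would then be computed against the currently selected sub-goal rather than the global goal, which is one common way to densify rewards.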



Key words: deep reinforcement learning; agricultural robot; intermediate target point; multi-target point navigation integrated improved deep Q-network algorithm (MPN-DQN); path planning
Received: 04 September 2024      Published: 25 July 2025
CLC:  TP 242  
Fund:  Key Research and Development Program of Shandong Province (Major Science and Technology Innovation Project) (2022CXGC020703); Agricultural Machinery Post Expert Project of the Shandong Potato Industry Technology System (SDAIT-16-10); Key Research and Development Program of Shandong Province (Rural Revitalization Science and Technology Innovation Action Plan) (2022TZXD006).
Corresponding Authors: Wanzhi ZHANG     E-mail: zhao868250709@163.com;zhangwanzhi@163.com
Cite this article:

Wei ZHAO,Wanzhi ZHANG,Jialin HOU,Rui HOU,Yuhua LI,Lejun ZHAO,Jin Cheng. Path planning of agricultural robots based on improved deep reinforcement learning algorithm. Journal of ZheJiang University (Engineering Science), 2025, 59(7): 1492-1503.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.07.017     OR     https://www.zjujournals.com/eng/Y2025/V59/I7/1492


Fig.1 Overall flow chart of proposed path planning method
Fig.2 Schematic of global environment segmentation by middle target points
Fig.3 Action space rule diagram (three neighbourhoods)
a_i    Velocity direction               ω/(rad·s⁻¹)
0      45° left of straight ahead        0.785
1      straight ahead                    0
2      45° right of straight ahead      −0.785
Tab.1 Angular velocity corresponding to each action direction
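The discrete action space of Tab.1 can be expressed as a lookup from action index to angular velocity command. The constant linear velocity `v` below is an illustrative assumption not specified in this excerpt.

```python
import math

# Discrete action space from Tab.1: heading adjustments at fixed
# angular velocities (rad/s); 0.785 rad/s corresponds to pi/4 (45 deg).
ACTION_TO_OMEGA = {
    0: math.pi / 4,    # 45 deg left of straight ahead  -> +0.785 rad/s
    1: 0.0,            # straight ahead                 ->  0 rad/s
    2: -math.pi / 4,   # 45 deg right of straight ahead -> -0.785 rad/s
}

def apply_action(action, v=0.2):
    """Map a DQN action index to a (v, omega) velocity command.

    The linear velocity v = 0.2 m/s is an assumed placeholder, not a
    value taken from the paper.
    """
    return v, ACTION_TO_OMEGA[action]
```

A three-action heading space like this keeps the Q-network output layer small, which is one reason discrete steering sets are popular in DQN-based navigation.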
Fig.4 Division of target-point areas in the scenario
Fig.5 Kinematic model of two-wheeled differential robot
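The two-wheeled differential robot of Fig.5 is commonly modeled with unicycle kinematics. The following is a minimal Euler-integration sketch of that standard model; the paper's exact kinematic formulation may differ.

```python
import math

def differential_drive_step(x, y, theta, v, omega, dt):
    """One Euler-integration step of the unicycle model often used for
    a two-wheeled differential robot:

        x'     = x + v * cos(theta) * dt
        y'     = y + v * sin(theta) * dt
        theta' = theta + omega * dt

    where v is the linear velocity, omega the angular velocity, and
    theta the heading angle in radians.
    """
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)
```

Chaining this step with the action mapping of Tab.1 (one of the three angular velocities per time step) reproduces the kind of discrete-time rollout a simulator would perform.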
Fig.6 Network structure of proposed path planning method
Fig.7 Training flowchart for proposed path planning method
Fig.8 Environment diagram for simulation experiments
Parameter                                Value     Parameter                         Value
Centerline reward factor a               1.25      Replay buffer capacity Rn         128 000
Region reward factor b                   ±1.2      Learning rate lr                  0.001
Greedy weight α                          0.6       Training episodes Ep              2 000
Prioritized experience replay weight β   0.4       Time steps per episode Es         1 000
Discount rate γ                          0.9       Network update frequency ui       10
Sampling guarantee factor τ              0.3       Batch size bs                     128
Decay steps εd                           1 000
Tab.2 Main hyperparameters in simulation experiment
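The Tab.2 hyperparameters can be collected in a configuration dictionary for a training script. In the ε-decay sketch below, only the decay step count εd comes from Tab.2; the start and end exploration rates are assumed values, since they are not given in this excerpt.

```python
# Hyperparameters from Tab.2 (simulation experiment).
HPARAMS = {
    "centerline_reward_factor_a": 1.25,
    "region_reward_factor_b": 1.2,      # applied as +/-1.2
    "greedy_weight_alpha": 0.6,
    "per_weight_beta": 0.4,             # prioritized experience replay
    "discount_gamma": 0.9,
    "sampling_guarantee_tau": 0.3,
    "decay_steps_eps_d": 1_000,
    "replay_capacity_Rn": 128_000,
    "learning_rate_lr": 0.001,
    "episodes_Ep": 2_000,
    "steps_per_episode_Es": 1_000,
    "target_update_freq_ui": 10,
    "batch_size_bs": 128,
}

def epsilon(step, eps_start=1.0, eps_end=0.05,
            decay_steps=HPARAMS["decay_steps_eps_d"]):
    """Linearly annealed exploration rate over the first decay_steps
    steps, then held constant. eps_start/eps_end are assumptions."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

With these values, exploration would fall from 1.0 to 0.05 over the first 1 000 steps and stay there for the remainder of training.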
Method             N∈[1, 900]                 N∈[901, 1600]              N∈[1601, 2000]
                   C_tn   t_na/s   p_ns/%     C_tn   t_na/s   p_ns/%     C_tn   t_na/s   p_ns/%
Conventional DQN   662    97.35    26.44      397    91.57    43.29      138    89.26    65.50
DDQN               613    98.64    31.89      308    89.38    56.00      114    83.59    71.50
Dueling DQN        596    99.37    33.78      303    88.45    56.71      103    82.31    74.25
Ours (MPN-DQN)     409    96.72    54.56      97     83.25    86.14      0      76.77    100.00
Tab.3 Performance comparison results of different path planning methods in training environment
Fig.9 Average rewards of different path planning methods in training environment
Method             C_tn   t_na/s   d_na/m   p_ns/%
Conventional DQN   134    106.57   17.99    73.2
DDQN               103    104.25   17.12    79.4
Dueling DQN        97     103.16   16.90    80.6
Ours (MPN-DQN)     11     92.35    15.56    97.8
Tab.4 Performance comparison results of different path planning methods in testing environment
Fig.10 Trajectory diagram of different path planning methods in testing environment
Fig.11 Differential robot
Fig.12 Simulated experimental scenarios and prior maps
Fig.13 Trajectory diagram of different path planning methods in simulated scenarios
Method             C_tn   t_na/s   d_na/m   p_ns/%
Conventional DQN   14     127.26   19.21    65.0
DDQN               10     115.17   17.55    75.0
Dueling DQN        9      112.91   16.59    77.5
Ours (MPN-DQN)     1      96.93    15.87    97.5
Tab.5 Performance comparison results of different path planning methods in simulated scenarios
Fig.14 Actual working environment and navigation trajectory