Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (7): 1492-1503    DOI: 10.3785/j.issn.1008-973X.2025.07.017
    
Path planning of agricultural robots based on improved deep reinforcement learning algorithm
Wei ZHAO1,2, Wanzhi ZHANG1,2,*, Jialin HOU1,2, Rui HOU3, Yuhua LI1,4, Lejun ZHAO1,4, Jin CHENG1,2
1. College of Mechanical and Electronic Engineering, Shandong Agricultural University, Taian 271018, China
2. Shandong Engineering Research Center of Agricultural Equipment Intelligentization, Taian 271018, China
3. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
4. Shandong Key Laboratory of Intelligent Production Technology and Equipment for Facility Horticulture, Taian 271018, China

Abstract  

To address the difficulty of finding target points, the sparse rewards, and the slow convergence encountered when deep reinforcement learning algorithms are used for path planning of agricultural robots, a path-planning method based on a multi-target point navigation integrated improved deep Q-network algorithm (MPN-DQN) was proposed. Laser simultaneous localization and mapping (SLAM) was used to scan the global environment, construct a prior map, and divide the walking-row and crop-row areas; the map boundary was inflated and fitted to form a forward bow-shaped operation corridor. Intermediate target points were used to segment the global environment, dividing the complex environment into multi-stage short-range navigation environments and thereby simplifying the target-point search. The deep Q-network algorithm was improved in three aspects (action space, exploration strategy, and reward function) to alleviate reward sparsity, accelerate convergence, and raise the navigation success rate. Experimental results showed that an agricultural robot equipped with the MPN-DQN algorithm had a total of one collision, an average navigation time of 104.27 s, an average navigation distance of 16.58 m, and an average navigation success rate of 95%.
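The multi-stage navigation idea described above (intermediate target points splitting one long-range goal into a sequence of short-range sub-goals) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation; the `reach_radius` threshold is an assumed parameter.

```python
import math

def select_current_target(position, waypoints, reach_radius=0.3):
    """Return the first intermediate target point not yet reached.

    Splitting one distant goal into intermediate target points turns a
    sparse-reward long-range task into a sequence of short-range
    navigation tasks: the agent always chases the first unreached
    waypoint instead of the far-away final goal.
    """
    for wp in waypoints:
        if math.dist(position, wp) > reach_radius:
            return wp
    return waypoints[-1]  # all waypoints reached: hold the final goal
```

In a training loop, the reward at each step would then be computed against the currently selected sub-goal rather than the global goal, which is one common way to densify rewards.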



Key words: deep reinforcement learning; agricultural robot; intermediate target point; multi-target point navigation integrated improved deep Q-network algorithm (MPN-DQN); path planning
Received: 04 September 2024      Published: 25 July 2025
CLC:  TP 242  
Fund:  Key Research and Development Program of Shandong Province (Major Science and Technology Innovation Project) (2022CXGC020703); Agricultural Machinery Post Expert Project of the Shandong Potato Industry Technology System (SDAIT-16-10); Key Research and Development Program of Shandong Province (Rural Revitalization Science and Technology Innovation Action Plan) (2022TZXD006).
Corresponding Authors: Wanzhi ZHANG     E-mail: zhao868250709@163.com;zhangwanzhi@163.com
Cite this article:

Wei ZHAO,Wanzhi ZHANG,Jialin HOU,Rui HOU,Yuhua LI,Lejun ZHAO,Jin Cheng. Path planning of agricultural robots based on improved deep reinforcement learning algorithm. Journal of ZheJiang University (Engineering Science), 2025, 59(7): 1492-1503.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.07.017     OR     https://www.zjujournals.com/eng/Y2025/V59/I7/1492


Fig.1 Overall flow chart of proposed path planning method
Fig.2 Schematic of global environment segmentation by middle target points
Fig.3 Action space rule diagram (three neighbourhoods)
a_i    Velocity direction               ω/(rad·s⁻¹)
0      45° left of straight ahead        0.785
1      straight ahead                    0
2      45° right of straight ahead      −0.785
Tab.1 Angular velocity corresponding to each action direction
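The discrete action space of Tab.1 can be expressed as a lookup from action index to angular velocity command. The constant linear velocity `v` below is an illustrative assumption not specified in this excerpt.

```python
import math

# Discrete action space from Tab.1: heading adjustments at fixed
# angular velocities (rad/s); 0.785 rad/s corresponds to pi/4 (45 deg).
ACTION_TO_OMEGA = {
    0: math.pi / 4,    # 45 deg left of straight ahead  -> +0.785 rad/s
    1: 0.0,            # straight ahead                 ->  0 rad/s
    2: -math.pi / 4,   # 45 deg right of straight ahead -> -0.785 rad/s
}

def apply_action(action, v=0.2):
    """Map a DQN action index to a (v, omega) velocity command.

    The linear velocity v = 0.2 m/s is an assumed placeholder, not a
    value taken from the paper.
    """
    return v, ACTION_TO_OMEGA[action]
```

A three-action heading space like this keeps the Q-network output layer small, which is one reason discrete steering sets are popular in DQN-based navigation.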
Fig.4 Division of target-point areas in the scenario
Fig.5 Kinematic model of two-wheeled differential robot
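The two-wheeled differential robot of Fig.5 is commonly modeled with unicycle kinematics. The following is a minimal Euler-integration sketch of that standard model; the paper's exact kinematic formulation may differ.

```python
import math

def differential_drive_step(x, y, theta, v, omega, dt):
    """One Euler-integration step of the unicycle model often used for
    a two-wheeled differential robot:

        x'     = x + v * cos(theta) * dt
        y'     = y + v * sin(theta) * dt
        theta' = theta + omega * dt

    where v is the linear velocity, omega the angular velocity, and
    theta the heading angle in radians.
    """
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)
```

Chaining this step with the action mapping of Tab.1 (one of the three angular velocities per time step) reproduces the kind of discrete-time rollout a simulator would perform.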
Fig.6 Network structure of proposed path planning method
Fig.7 Training flowchart for proposed path planning method
Fig.8 Environment diagram for simulation experiments
Parameter                                Value     Parameter                         Value
Centerline reward factor a               1.25      Replay buffer capacity Rn         128 000
Region reward factor b                   ±1.2      Learning rate lr                  0.001
Greedy weight α                          0.6       Training episodes Ep              2 000
Prioritized experience replay weight β   0.4       Time steps per episode Es         1 000
Discount rate γ                          0.9       Network update frequency ui       10
Sampling guarantee factor τ              0.3       Batch size bs                     128
Decay steps εd                           1 000
Tab.2 Main hyperparameters in simulation experiment
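The Tab.2 hyperparameters can be collected in a configuration dictionary for a training script. In the ε-decay sketch below, only the decay step count εd comes from Tab.2; the start and end exploration rates are assumed values, since they are not given in this excerpt.

```python
# Hyperparameters from Tab.2 (simulation experiment).
HPARAMS = {
    "centerline_reward_factor_a": 1.25,
    "region_reward_factor_b": 1.2,      # applied as +/-1.2
    "greedy_weight_alpha": 0.6,
    "per_weight_beta": 0.4,             # prioritized experience replay
    "discount_gamma": 0.9,
    "sampling_guarantee_tau": 0.3,
    "decay_steps_eps_d": 1_000,
    "replay_capacity_Rn": 128_000,
    "learning_rate_lr": 0.001,
    "episodes_Ep": 2_000,
    "steps_per_episode_Es": 1_000,
    "target_update_freq_ui": 10,
    "batch_size_bs": 128,
}

def epsilon(step, eps_start=1.0, eps_end=0.05,
            decay_steps=HPARAMS["decay_steps_eps_d"]):
    """Linearly annealed exploration rate over the first decay_steps
    steps, then held constant. eps_start/eps_end are assumptions."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

With these values, exploration would fall from 1.0 to 0.05 over the first 1 000 steps and stay there for the remainder of training.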
Method             N∈[1, 900]                 N∈[901, 1600]              N∈[1601, 2000]
                   C_tn   t_na/s   p_ns/%     C_tn   t_na/s   p_ns/%     C_tn   t_na/s   p_ns/%
Conventional DQN   662    97.35    26.44      397    91.57    43.29      138    89.26    65.50
DDQN               613    98.64    31.89      308    89.38    56.00      114    83.59    71.50
Dueling DQN        596    99.37    33.78      303    88.45    56.71      103    82.31    74.25
Ours (MPN-DQN)     409    96.72    54.56      97     83.25    86.14      0      76.77    100.00
Tab.3 Performance comparison results of different path planning methods in training environment
Fig.9 Average rewards of different path planning methods in training environment
Method             C_tn   t_na/s   d_na/m   p_ns/%
Conventional DQN   134    106.57   17.99    73.2
DDQN               103    104.25   17.12    79.4
Dueling DQN        97     103.16   16.90    80.6
Ours (MPN-DQN)     11     92.35    15.56    97.8
Tab.4 Performance comparison results of different path planning methods in testing environment
Fig.10 Trajectory diagram of different path planning methods in testing environment
Fig.11 Differential robot
Fig.12 Simulated experimental scenarios and prior maps
Fig.13 Trajectory diagram of different path planning methods in simulated scenarios
Method             C_tn   t_na/s   d_na/m   p_ns/%
Conventional DQN   14     127.26   19.21    65.0
DDQN               10     115.17   17.55    75.0
Dueling DQN        9      112.91   16.59    77.5
Ours (MPN-DQN)     1      96.93    15.87    97.5
Tab.5 Performance comparison results of different path planning methods in simulated scenarios
Fig.14 Actual working environment and navigation trajectory