Front. Inform. Technol. Electron. Eng.  2013, Vol. 14 Issue (3): 167-178    DOI: 10.1631/jzus.C1200226
    
State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots
Xin Ma, Ya Xu, Guo-qiang Sun, Li-xia Deng, Yi-bin Li
School of Control Science and Engineering, Shandong University, Jinan 250061, China

Abstract  This paper presents a new Q-learning-based approach to mobile robot path planning in complex, unknown, static environments. As a computational approach to learning through interaction with the environment, reinforcement learning has been widely used for intelligent robot control, especially for autonomous mobile robots. However, the learning process is slow and cumbersome, and practical applications require rapid convergence. To address the slow convergence and long learning time of Q-learning-based path planning, a state-chain sequential feedback Q-learning algorithm is proposed for quickly finding the optimal path of a mobile robot in complex unknown static environments. The state chain is built during the search process. After an action is chosen and the reward is received, the Q-values of the state-action pairs on the previously built state chain are sequentially updated with one-step Q-learning. Because each action thus updates a growing number of Q-values, the number of steps (state transitions) required for convergence decreases, and with it the learning time. Extensive simulations validate the efficiency of the proposed approach for mobile robot path planning in complex environments. The results show that the new approach converges quickly and that the robot finds the collision-free optimal path in complex unknown static environments in much less time than with the one-step Q-learning and Q(λ)-learning algorithms.
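
The Python sketch below illustrates the core update described in the abstract. It is a minimal reconstruction, not the authors' implementation: the epsilon-greedy policy, the 4-connected grid actions, the learning parameters, and the env_step interface are all assumptions made for illustration.

# Minimal sketch (assumed details, not the paper's code) of state-chain
# sequential feedback Q-learning on a grid world.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1          # assumed learning parameters
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # assumed 4-connected grid moves

Q = defaultdict(float)                          # Q[(state, action)] -> value

def choose_action(state):
    """Epsilon-greedy selection over the four grid moves."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def sequential_feedback_step(env_step, state, chain):
    """Take one action, then sequentially re-apply the one-step Q-learning
    update to every state-action pair on the previously built state chain,
    most recent pair first, so a single action updates many Q-values.
    env_step(state, action) -> (next_state, reward) is a hypothetical
    environment interface assumed here for illustration."""
    action = choose_action(state)
    next_state, reward = env_step(state, action)
    chain.append((state, action, reward, next_state))
    # Updating from the newest pair backward lets fresh reward information
    # propagate along the chain within a single action, which is the source
    # of the faster convergence claimed in the abstract.
    for s, a, r, s2 in reversed(chain):
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
    return next_state

In contrast, plain one-step Q-learning would perform only the single update for the current transition; the backward sweep over the chain is what trades a little extra computation per action for far fewer state transitions before convergence.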

Key words: Path planning; Q-learning; Autonomous mobile robot; Reinforcement learning
Received: 19 July 2012      Published: 05 March 2013
CLC:  TP242.6  
Cite this article:

Xin Ma, Ya Xu, Guo-qiang Sun, Li-xia Deng, Yi-bin Li. State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots. Front. Inform. Technol. Electron. Eng., 2013, 14(3): 167-178.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/jzus.C1200226 or http://www.zjujournals.com/xueshu/fitee/Y2013/V14/I3/167


