Front. Inform. Technol. Electron. Eng.  2011, Vol. 12 Issue (1): 17-24    DOI: 10.1631/jzus.C1010010
    
Convergence analysis of an incremental approach to online inverse reinforcement learning
Zhuo-jun Jin, Hui Qian*, Shen-yi Chen, Miao-liang Zhu
School of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Full text: PDF (244 KB)
Abstract: Interest has recently increased in inverse reinforcement learning (IRL), the problem of recovering the reward function underlying a Markov decision process (MDP), given the dynamics of the system and the behavior of an expert. This paper deals with an incremental approach to online IRL. First, the convergence property of the incremental method for the IRL problem was investigated, and bounds on both the number of mistakes made during learning and the regret were established with detailed proofs. An online algorithm based on incremental error correction was then derived for the IRL problem. The key idea is to add an increment to the current reward estimate each time an action mismatch occurs, so that the estimate approaches a target optimal value. The proposed method was tested in a driving simulation experiment and was able to efficiently recover an adequate reward function.
Key words: Incremental approach    Reward recovering    Online learning    Inverse reinforcement learning    Markov decision process
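The abstract's key idea, adding an increment to the reward estimate whenever the learner's action mismatches the expert's, can be sketched as a perceptron-style update on a linear reward model. The feature map, step size, and toy data below are illustrative assumptions, not the paper's actual algorithm or parameters:

```python
import numpy as np

def incremental_irl_update(w, phi_expert, phi_agent, eta=0.1):
    """On an action mismatch, nudge the reward weights toward the
    feature direction of the expert's action and away from the
    mismatched one (sketch only; eta and phi are assumptions)."""
    return w + eta * (phi_expert - phi_agent)

# Toy run with 2-dimensional action features.
w = np.zeros(2)
expert_phi = np.array([1.0, 0.0])  # features of the expert's action
agent_phi = np.array([0.0, 1.0])   # features of the mismatched action
for _ in range(5):                 # each mismatch triggers one increment
    w = incremental_irl_update(w, expert_phi, agent_phi)
print(w)  # weights drift toward the expert's feature direction
```

Under this sketch, the estimate stops changing once the recovered reward makes the expert's actions optimal, which is the sense in which the increments drive it toward a target value.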
Received: 2010-01-09    Published: 2010-01-10
CLC:  TP181  
Funding: Project (No. 90820306) supported by the National Natural Science Foundation of China
Corresponding author: Hui Qian    E-mail: qianhui@zju.edu.cn

Cite this article:

Zhuo-jun Jin, Hui Qian, Shen-yi Chen, Miao-liang Zhu. Convergence analysis of an incremental approach to online inverse reinforcement learning. Front. Inform. Technol. Electron. Eng., 2011, 12(1): 17-24.

Link to this article:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/jzus.C1010010
http://www.zjujournals.com/xueshu/fitee/CN/Y2011/V12/I1/17
