Front. Inform. Technol. Electron. Eng.  2011, Vol. 12 Issue (1): 17-24    DOI: 10.1631/jzus.C1010010
Convergence analysis of an incremental approach to online inverse reinforcement learning
Zhuo-jun Jin, Hui Qian*, Shen-yi Chen, Miao-liang Zhu
School of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Abstract  Interest has recently increased in inverse reinforcement learning (IRL), the problem of recovering the reward function underlying a Markov decision process (MDP) given the dynamics of the system and the behavior of an expert. This paper deals with an incremental approach to online IRL. First, the convergence property of the incremental method for the IRL problem was investigated, and bounds on both the number of mistakes made during learning and the regret were established with detailed proofs. Then an online algorithm based on incremental error correction was derived for the IRL problem. The key idea is to add an increment to the current reward estimate each time an action mismatch occurs, so that the estimate approaches a target optimal value. The proposed method was tested in a driving simulation experiment and was found to efficiently recover an adequate reward function.
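The error-correcting update described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes a reward that is linear in a feature map, a learning-rate parameter eta, and a toy two-action choice, all of which are assumptions made here for clarity.

```python
import numpy as np

def incremental_irl_update(w, phi_expert, phi_learner, eta=1.0):
    """Perceptron-style correction: when the learner's greedy action
    disagrees with the expert's, shift the reward weights toward the
    expert's action features and away from the learner's."""
    return w + eta * (phi_expert - phi_learner)

# Toy setting: two candidate actions described by 2-D features;
# the expert always takes the action with features [1, 0].
expert_phi = np.array([1.0, 0.0])
candidates = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]

w = np.zeros(2)  # initial reward-weight estimate
for _ in range(5):
    # learner acts greedily with respect to the current reward estimate
    learner_phi = max(candidates, key=lambda p: float(w @ p))
    if not np.array_equal(learner_phi, expert_phi):  # action mismatch
        w = incremental_irl_update(w, expert_phi, learner_phi)

# After the corrections, the estimate ranks the expert's action highest.
print(w @ expert_phi > w @ np.array([0.0, 1.0]))
```

Each mismatch adds one increment to the estimate; the paper's contribution is bounding how many such mistakes (and how much regret) can accumulate before the estimate reproduces the expert's behavior.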

Key words: Incremental approach; Reward recovering; Online learning; Inverse reinforcement learning; Markov decision process
Received: 09 January 2010      Published: 10 January 2010
CLC:  TP181  
Fund:  Project (No. 90820306) supported by the National Natural Science Foundation of China
Cite this article:

Zhuo-jun Jin, Hui Qian, Shen-yi Chen, Miao-liang Zhu. Convergence analysis of an incremental approach to online inverse reinforcement learning. Front. Inform. Technol. Electron. Eng., 2011, 12(1): 17-24.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/jzus.C1010010     OR     http://www.zjujournals.com/xueshu/fitee/Y2011/V12/I1/17

