Front. Inform. Technol. Electron. Eng.  2014, Vol. 15 Issue (3): 223-231    DOI: 10.1631/jzus.C1300246
    
Greedy feature replacement for online value function approximation
Feng-fei Zhao, Zheng Qin, Zhuo Shao, Jun Fang, Bo-yan Ren
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; School of Software, Tsinghua University, Beijing 100084, China; Department of Physics and State Key Laboratory of Low-Dimensional Quantum Physics, Tsinghua University, Beijing 100084, China
Abstract: Research objective: Reinforcement learning needs value function approximation to handle the high-dimensional state spaces of real-world problems, and the quality of the approximation depends largely on selecting a suitable feature representation. Representational expansion techniques allow linear approximators to represent value functions more effectively, but existing techniques either apply only to low-dimensional problems or fail to discover important feature dependencies in time during learning. We therefore propose a novel online feature expansion technique, greedy feature replacement (GFR).
Innovations: By discovering replacement areas and performing feature replacement within them, GFR expands the feature representation incrementally, handles high-dimensional state spaces, and overcomes the limitations of existing online expansion techniques. By identifying replacement areas, GFR also optimizes the order in which conjunctive features are added, improving the performance of representation expansion.
Methods: We introduce the notion of the virtual temporal difference (TD) error, which simulates the TD error that would result if a new conjunctive feature were added to the representation. A new conjunctive feature is selected by comparing the accumulated actual TD error with the accumulated virtual TD error; if the gap exceeds a given threshold, the candidate combination is adopted as a new feature. In each replacement, two existing features are replaced by one new conjunctive feature.
Conclusions: Theoretical analysis shows that value function approximation with GFR asymptotically reaches the optimal approximation. Experimental results on two representative learning problems show that GFR achieves faster learning and can handle large-scale state spaces. GFR can be applied to any online value function approximation.
Keywords: Reinforcement learning; Value function approximation; Feature dependency; Online expansion; Feature replacement
Abstract: Reinforcement learning (RL) in real-world problems requires value function approximation, whose effectiveness depends heavily on selecting appropriate feature representations. Representational expansion techniques can make linear approximators represent value functions more effectively; however, most of these techniques function well only for low-dimensional problems. In this paper, we present greedy feature replacement (GFR), a novel online expansion technique for value-based RL algorithms that use binary features. Given a simple initial representation, the feature representation is expanded incrementally: new feature dependencies are added automatically to the current representation, and conjunctive features greedily replace existing features. A virtual temporal difference (TD) error is recorded for each conjunctive feature to judge whether the replacement improves the approximation. Correctness guarantees and a computational complexity analysis are provided for GFR. Experimental results in two domains show that GFR achieves much faster learning and can handle large-scale problems.
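The replacement mechanism described in the abstract can be sketched in code. The following is an illustrative Python sketch, not the authors' implementation: modeling features as frozensets of active state indices, initializing a candidate's virtual weight from its parts, and triggering replacement on an accumulated error gap are assumptions chosen for exposition, and only pairwise conjunctions are tracked.

```python
import itertools

class GFRSketch:
    """Linear value function approximation over binary features with a
    greedy-feature-replacement-style expansion rule.  A feature is a
    frozenset of raw indices; a conjunctive feature fires only when all
    of its parts are active."""

    def __init__(self, alpha=0.5, gamma=0.0, threshold=1.0):
        self.alpha = alpha          # learning rate
        self.gamma = gamma          # discount factor
        self.threshold = threshold  # accumulated-gap trigger (assumed form)
        self.w = {}                 # feature -> weight
        self.virtual_w = {}         # candidate conjunction -> virtual weight
        self.gap = {}               # sum of |actual TD| - |virtual TD|

    def features(self, state):
        active = set(state)
        return [f for f in self.w if f <= active]

    def value(self, state):
        return sum(self.w[f] for f in self.features(state))

    def observe(self, state, reward, next_state):
        for i in state:                      # ensure base features exist
            self.w.setdefault(frozenset([i]), 0.0)
        v = self.value(state)
        v_next = self.value(next_state)
        td = reward + self.gamma * v_next - v
        fired = self.features(state)
        # Virtual TD error: the TD error the approximator would have had
        # if the pair (f, g) were already merged into one conjunctive
        # feature carrying its own (virtual) weight.
        for f, g in itertools.combinations(fired, 2):
            c = f | g
            if c in self.w:
                continue
            wv = self.virtual_w.setdefault(c, self.w[f] + self.w[g])
            vtd = reward + self.gamma * v_next - (v - self.w[f] - self.w[g] + wv)
            self.virtual_w[c] = wv + self.alpha * vtd
            self.gap[c] = self.gap.get(c, 0.0) + abs(td) - abs(vtd)
            if self.gap[c] > self.threshold:
                # Greedy replacement: the conjunction supersedes its parts.
                self.w[c] = self.virtual_w.pop(c)
                self.w.pop(f, None)
                self.w.pop(g, None)
                self.gap.pop(c, None)
                break
        for f in fired:                      # TD(0) update on what remains
            if f in self.w:
                self.w[f] += self.alpha * td

# Demo: reward fires only when features 0 and 1 are active together, so
# no weighting of the two singletons can fit it, but their conjunction can.
m = GFRSketch(alpha=0.5, gamma=0.0, threshold=1.0)
for _ in range(15):
    m.observe([0], 0.0, [])
    m.observe([1], 0.0, [])
    m.observe([0, 1], 1.0, [])
```

In this demo the virtual TD errors of the conjunction shrink while the actual TD errors stay large, so the accumulated gap crosses the threshold and `frozenset({0, 1})` replaces the two singleton features, after which the approximation fits all three states.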
Key words: Reinforcement learning; Function approximation; Feature dependency; Online expansion; Feature replacement
Received: 2013-09-06    Published: 2014-03-05
CLC:  TP181  
Cite this article:

Feng-fei Zhao, Zheng Qin, Zhuo Shao, Jun Fang, Bo-yan Ren. Greedy feature replacement for online value function approximation. Front. Inform. Technol. Electron. Eng., 2014, 15(3): 223-231.

Link to this article:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/jzus.C1300246        http://www.zjujournals.com/xueshu/fitee/CN/Y2014/V15/I3/223
