J4  2011, Vol. 45 Issue (10): 1732-1737    DOI: 10.3785/j.issn.1008-973X.2011.10.006
Automation Technology, Information Technology
Trajectory evaluation method based on intention analysis
JIN Zhuo-jun, QIAN Hui, ZHU Miao-liang
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Abstract:

The problem of evaluating vehicle trajectories against an available standard (demonstration) trajectory was studied by means of inverse reinforcement learning and the principle of reward-function reshaping under policy invariance, and a trajectory evaluation technique based on intention analysis was proposed. An inverse reinforcement learning algorithm was applied to the standard trajectory and to the trajectory under evaluation to recover the feature weights corresponding to each; under the policy-invariance condition, each set of weights was extended into a linear subspace, and the evaluation score was obtained by computing the distance between the two subspaces defined through their orthogonal projection matrices. The method was validated on trajectories of several typical driving styles in a four-wheel vehicle simulation experiment. Results show that the method scores obstacle-avoidance trajectories according to their deviation from the standard trajectory, overcomes the influence of the non-uniqueness of reward functions corresponding to the same policy, and effectively resolves the difficulty of quantitatively comparing vehicle trajectories.
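The reward-reshaping principle under policy invariance referred to here is the potential-based shaping result of Ng, Harada and Russell: for any state potential function \Phi and discount factor \gamma, the reshaped reward

R'(s, a, s') = R(s, a, s') + \gamma \Phi(s') - \Phi(s)

induces the same optimal policy as R. A single observed policy therefore corresponds to a whole family of reward functions rather than a unique one; with linearly parameterized rewards this family can be organized as a linear subspace of feature weights, which is the subspace the method compares.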

Abstract:

The trajectory evaluation problem when a demonstration from an expert is available was investigated through inverse reinforcement learning and reward reshaping technique under policy invariance. A novel intention-based method was presented. The weights of the given trajectory and the demonstration were determined with respect to a fixed group of features. The linear subspaces spanned by these two weight vectors were computed by using the reward reshaping technique. The norm of the orthogonal projections was calculated and used to measure the difference between subspaces. In the four-wheel vehicle simulation experiment, the approach was tested by applying it to trajectories generated in several typical scenarios. Empirical results showed that, for the given trajectories, the approach can yield reasonable marks in finite steps according to the difference between the given trajectory and demonstration. The approach can eliminate the ambiguity brought by the inherent ill-posedness of inverse problems, and overcome the difficulties of trajectory evaluation.
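As a rough, self-contained illustration of the subspace comparison step, the following Python sketch (my own assumption-laden reconstruction, not the authors' implementation; the feature dimension, the toy bases, and the choice of the spectral norm are all hypothetical) builds the orthogonal projector of each weight subspace and measures the difference between the two projectors:

import numpy as np

def projector(basis):
    """Orthogonal projection matrix onto the column space of basis (d x k)."""
    q, _ = np.linalg.qr(basis)    # orthonormal basis of the spanned subspace
    return q @ q.T                # P = Q Q^T

def subspace_distance(demo_basis, eval_basis):
    """Spectral norm of the projector difference; 0 means identical subspaces."""
    return np.linalg.norm(projector(demo_basis) - projector(eval_basis), ord=2)

# Hypothetical data: feature weights recovered by inverse reinforcement learning
# for the demonstration and for the evaluated trajectory, each extended by one
# additional direction allowed by policy-invariant reshaping (one column each).
rng = np.random.default_rng(0)
w_demo = rng.standard_normal((6, 2))      # columns span the demonstration subspace
w_eval = rng.standard_normal((6, 2))      # columns span the evaluated subspace
print(subspace_distance(w_demo, w_eval))  # smaller value = intentions closer

Identical intention subspaces give a distance of zero; larger values indicate a greater divergence between the evaluated trajectory and the demonstration.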

Published: 2011-10-01
CLC number: TP 181
Supported by: the National Natural Science Foundation of China (No. 90820306).

Corresponding author: QIAN Hui, male, associate professor. E-mail: qianhui@zju.edu.cn
About the first author: JIN Zhuo-jun (b. 1984), male, Ph.D. candidate, engaged in machine learning research. E-mail: ariesjzj@zju.edu.cn

Cite this article:

JIN Zhuo-jun, QIAN Hui, ZHU Miao-liang. Trajectory evaluation method based on intention analysis [J]. J4, 2011, 45(10): 1732-1737.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2011.10.006        https://www.zjujournals.com/eng/CN/Y2011/V45/I10/1732

