J4  2011, Vol. 45 Issue (10): 1732-1737    DOI: 10.3785/j.issn.1008-973X.2011.10.006
    
Trajectory evaluation method based on intention analysis
JIN Zhuo-jun, QIAN Hui, ZHU Miao-liang
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Abstract  

The trajectory evaluation problem, when a demonstration from an expert is available, was investigated through inverse reinforcement learning and the reward reshaping technique under policy invariance. A novel intention-based method was presented. The feature weights of the given trajectory and of the demonstration were determined with respect to a fixed group of features. The linear subspaces spanned by these two weight vectors were computed by using the reward reshaping technique. The norm of the orthogonal projections was calculated and used to measure the difference between the subspaces. In a four-wheel vehicle simulation experiment, the approach was tested by applying it to trajectories generated in several typical scenarios. Empirical results showed that, for the given trajectories, the approach yields reasonable marks in a finite number of steps according to the difference between the given trajectory and the demonstration. The approach eliminates the ambiguity introduced by the inherent ill-posedness of inverse problems and overcomes the difficulty of quantitative trajectory evaluation.
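The core computation described above is a comparison between two "intention" subspaces recovered by inverse reinforcement learning. The sketch below is not code from the paper; it is a minimal illustration, assuming for simplicity that each subspace is just the span of one recovered weight vector and that the gap between subspaces is measured by the spectral norm of the difference of their orthogonal projection matrices (the standard subspace distance from matrix-computation texts). The names subspace_distance, w_expert, and w_candidate are hypothetical.

import numpy as np

def subspace_distance(w_demo, w_traj):
    # Minimal sketch: each subspace is the span of one IRL weight vector;
    # the gap is || P1 - P2 ||_2, the spectral norm of the difference of
    # the orthogonal projection matrices onto the two subspaces.
    q1 = np.atleast_2d(w_demo / np.linalg.norm(w_demo)).T   # orthonormal basis, shape (n, 1)
    q2 = np.atleast_2d(w_traj / np.linalg.norm(w_traj)).T
    p1 = q1 @ q1.T                                           # projection onto span(w_demo)
    p2 = q2 @ q2.T                                           # projection onto span(w_traj)
    return np.linalg.norm(p1 - p2, ord=2)

# Hypothetical feature weights recovered by inverse reinforcement learning
# for the expert demonstration and for the trajectory under evaluation.
w_expert = np.array([0.7, 0.2, 0.1])
w_candidate = np.array([0.5, 0.4, 0.1])
print(subspace_distance(w_expert, w_candidate))

Under this measure the distance is 0 when the two weight vectors span the same direction (identical intentions) and 1 when they are orthogonal, so the value can be mapped directly onto an evaluation mark.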



Published: 01 October 2011
CLC:  TP 181  
Cite this article:

JIN Zhuo-jun, QIAN Hui, ZHU Miao-liang. Trajectory evaluation method based on intention analysis. J4, 2011, 45(10): 1732-1737.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2011.10.006     OR     https://www.zjujournals.com/eng/Y2011/V45/I10/1732


Trajectory evaluation method based on intention analysis

The vehicle trajectory evaluation problem, given an existing reference trajectory, was studied by means of inverse reinforcement learning and the principle of reward-function reshaping under policy invariance, and a trajectory evaluation technique based on intention analysis was proposed. The inverse reinforcement learning algorithm was applied to the reference trajectory and to the trajectory under evaluation to obtain their respective feature weights; each weight vector was extended into a linear subspace under the policy-invariance condition, and the evaluation score was obtained by computing the distance between the subspaces defined by their orthogonal projection matrices. The method was validated in a four-wheel vehicle simulation experiment on trajectories of several typical driving styles. Experimental results show that the method scores obstacle-avoidance trajectories according to their difference from the reference trajectory, overcomes the effect of the non-uniqueness of reward functions corresponding to the same policy, and effectively solves the problem that vehicle trajectories are difficult to compare quantitatively.

