The problem of evaluating a trajectory when an expert demonstration is available was investigated through inverse reinforcement learning and the reward shaping technique under policy invariance. A novel intention-based method was presented. The feature weights of the given trajectory and of the demonstration were estimated with respect to a fixed set of features. The linear subspaces spanned by these two weight vectors were computed using the reward shaping technique, and the norm of the orthogonal projection between them was used to measure the difference between the subspaces. The approach was tested on trajectories generated in several typical scenarios in a four-wheel vehicle simulation. Empirical results showed that, for the given trajectories, the approach yields reasonable scores in a finite number of steps according to the difference between the given trajectory and the demonstration. The approach eliminates the ambiguity caused by the inherent ill-posedness of the inverse problem and overcomes the difficulties of trajectory evaluation.
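A minimal NumPy sketch of the subspace comparison step is given below. It is an illustration under stated assumptions, not the paper's implementation: the function name subspace_distance, the treatment of each weight vector as spanning a one-dimensional subspace, and the final scoring rule (one minus the distance) are all hypothetical.

    import numpy as np

    def subspace_distance(w_traj, w_demo):
        # Measure the gap between the subspaces spanned by the two
        # feature-weight vectors via the norm of the orthogonal projection
        # of one basis onto the complement of the other (equal to the sine
        # of the principal angle for one-dimensional spans).
        q1, _ = np.linalg.qr(w_traj.reshape(-1, 1))   # orthonormal basis of span(w_traj)
        q2, _ = np.linalg.qr(w_demo.reshape(-1, 1))   # orthonormal basis of span(w_demo)
        residual = q1 - q2 @ (q2.T @ q1)              # component of q1 outside span(w_demo)
        return np.linalg.norm(residual, 2)            # spectral norm of the residual

    # Hypothetical usage: weight vectors recovered by inverse reinforcement
    # learning for the given trajectory and for the expert demonstration.
    w_traj = np.array([0.8, 0.1, 0.1])
    w_demo = np.array([0.7, 0.2, 0.1])
    score = 1.0 - subspace_distance(w_traj, w_demo)   # assumed scoring rule: closer subspaces give higher scores

The distance lies in [0, 1] for one-dimensional spans, so subtracting it from one gives a convenient score; how the actual method converts the projection norm into a mark is not specified here and would follow the paper's own definition.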