Estimating µ is central to our approach. With LSTDµ we have an efficient algorithm, but one that has the same drawbacks as algorithms that estimate a value function. The direction we plan to explore is to stop relying on the feature expectation (the mean feature vector) and to find another way of introducing the structure of the MDP into the classification approach we have adopted. Empirical tests on more complex problems are also planned.
