P. Abbeel and A. Ng, Apprenticeship learning via inverse reinforcement learning, Twenty-first international conference on Machine learning, ICML '04, 2004.
DOI : 10.1145/1015330.1015430

A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00830201

T. Archibald, K. McKinnon, and L. Thomas, On the generation of Markov decision processes, Journal of the Operational Research Society, 1995.

N. Aronszajn, Theory of reproducing kernels, Transactions of the American Mathematical Society, vol.68, issue.3, 1950.
DOI : 10.1090/S0002-9947-1950-0051437-7

D. Bertsekas, Dynamic programming and optimal control, Athena Scientific, vol.1, 1995.

S. Bradtke and A. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, 1996.

L. Breiman, Classification and regression trees, 1993.

F. Clarke, Generalized gradients and applications, Transactions of the American Mathematical Society, 1975.
DOI : 10.1090/s0002-9947-1975-0367131-6

A. Farahmand, R. Munos, and C. Szepesvári, Error propagation for approximate policy and value iteration, Proc. of NIPS, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00830154

A. Grubb and J. Bagnell, Generalized boosting algorithms for convex optimization, Proc. of ICML, 2011.

K. Judah, A. Fern, and T. Dietterich, Active imitation learning via reduction to iid active learning, Proc. of UAI, 2012.

B. Kim, A. Farahmand, J. Pineau, and D. Precup, Learning from limited demonstrations, Proc. of NIPS, 2013.

E. Klein, M. Geist, B. Piot, and O. Pietquin, Inverse reinforcement learning through structured classification, Proc. of NIPS, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00778624

M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, 2003.

G. Lever, L. Baldassarre, A. Gretton, M. Pontil, and S. Grünewälder, Modelling transition dynamics in MDPs with RKHS embeddings, Proc. of ICML, 2012.

R. Munos, Performance Bounds in $L_p$-norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, vol.46, issue.2, 2007.
DOI : 10.1137/040614384

B. Piot, M. Geist, and O. Pietquin, Learning from Demonstrations: Is It Worth Estimating a Reward Function?, Proc. of ECML, 2013.
DOI : 10.1007/978-3-642-40988-2_2
URL : https://hal.archives-ouvertes.fr/hal-00916938

M. Puterman, Markov decision processes: Discrete stochastic dynamic programming, 1994.
DOI : 10.1002/9780470316887

N. Ratliff, J. Bagnell, and S. Srinivasa, Imitation learning for locomotion and manipulation, 2007 7th IEEE-RAS International Conference on Humanoid Robots, 2007.
DOI : 10.1109/ICHR.2007.4813899

N. Ratliff, J. Bagnell, and M. Zinkevich, Maximum margin planning, Proceedings of the 23rd international conference on Machine learning, ICML '06, 2006.
DOI : 10.1145/1143844.1143936
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

S. Ross, G. Gordon, and J. Bagnell, A reduction of imitation learning and structured prediction to no-regret online learning, Proc. of AISTATS, 2011.

N. Shor, K. Kiwiel, and A. Ruszczyński, Minimization methods for non-differentiable functions, 1985.
DOI : 10.1007/978-3-642-82118-9

B. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. Lanckriet, Hilbert space embeddings and metrics on probability measures, The Journal of Machine Learning Research, 2010.

U. Syed, M. Bowling, and R. Schapire, Apprenticeship learning using linear programming, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390286
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

B. Yu, Rates of convergence for empirical processes of stationary mixing sequences, The Annals of Probability, 1994.