B. J. Bartlett-p, Direct gradient-based reinforcement learning, Proceedings of the International Symposium on Circuits and Systems, pp.271-274, 1999.

C. M. Kennedy, The particle swarm -explosion, stability, and convergence in a multidimensional complex space, IEEE Trans. Evol. Comp, vol.6, issue.1, pp.58-73, 2002.

G. M. Pietquin-o, Parametric Value Function Approximation : a Unified View, ADPRL 2011, pp.9-16, 2011.

H. Igel-c, Evolution Strategies for Direct Policy Search, Parallel Problem Solving from Nature (PPSN X), pp.428-437, 2008.

H. Igel-c, Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search, ACM International Conference Proceeding Series, p.51, 2009.

K. J. Eberhart-r, Particle swarm optimization, Proceedings IEEE International Joint Conference on Neural Networks, pp.1942-1948, 1995.

L. M. Parr-r, Least-Squares Policy Iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

M. S. Rubinstein-r and . Gat-y, The cross entropy method for fast policy search, International Conference on Machine Learning, p.512, 2003.

M. R. Moore, Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems, IJCAI, pp.1348-1355, 1999.

P. J. Schaal-s, Policy Gradient Methods for Robotics, IROS, pp.2219-2225, 2006.

S. R. Barto-a, Reinforcement Learning, 1998.
DOI : 10.1016/B978-012526430-3/50003-9