Revisiting Natural Actor-Critics with Value Function Approximation

Abstract: Actor-critic architectures have become popular over the last decade in the field of reinforcement learning thanks to the policy gradient with function approximation theorem. It allows rationally combining actor-critic architectures with value function approximation and therefore addressing large-scale problems. Recent research has led to replacing the policy gradient by a natural policy gradient, improving the efficiency of the corresponding algorithms. However, a common drawback of these approaches is that they require manipulating the so-called advantage function, which does not satisfy any Bellman equation. Consequently, deriving actor-critic algorithms is not straightforward. In this paper, we re-derive these theorems in a way that allows reasoning directly with the state-action value function (or Q-function) and thus relying on the Bellman equation again. As a consequence, new forms of critics can easily be integrated into the actor-critic framework.
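
For context, a minimal sketch of the quantities at stake, using standard notation (the transition kernel P, reward r, and discount factor γ are assumptions of this sketch, not taken verbatim from the paper): the Q-function satisfies a Bellman equation, whereas the advantage function, being a difference of value functions, does not.

\[ Q^{\pi}(s,a) = \mathbb{E}_{s' \sim P(\cdot \mid s,a),\, a' \sim \pi(\cdot \mid s')}\left[ r(s,a) + \gamma\, Q^{\pi}(s',a') \right] \]
\[ A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s), \qquad V^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\left[ Q^{\pi}(s,a) \right] \]

Reasoning directly with Q^{\pi} therefore lets standard temporal-difference critics, which rely on this Bellman equation, be plugged into the (natural) actor-critic loop.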
Document type: Conference papers

https://hal-supelec.archives-ouvertes.fr/hal-00553870
Contributor: Sébastien van Luchene
Submitted on: Monday, January 10, 2011 - 10:03:52 AM
Last modification on: Thursday, March 29, 2018 - 11:06:04 AM

Citation

Matthieu Geist, Olivier Pietquin. Revisiting Natural Actor-Critics with Value Function Approximation. MDAI 2010, Oct 2010, Perpignan, France. pp.207-218, ⟨10.1007/978-3-642-16292-3_21⟩. ⟨hal-00553870⟩
