
Zero-sum repeated games: accelerated algorithms and tropical best-approximation

Abstract: In this thesis, we develop accelerated algorithms for Markov decision processes (MDP) and, more generally, for zero-sum stochastic games (SG). We also address best-approximation problems arising in tropical geometry.

Dynamic programming is one of the main approaches used to solve MDP and SG problems. It allows one to transform a game into a fixed-point problem involving an operator called the Shapley operator (or the Bellman operator in the case of MDP). Value iteration (VI) and policy iteration (PI) are the two main algorithms for solving these fixed-point problems. However, for large-scale instances, or when solving a mean-payoff problem (in which future payments are not discounted), classical methods become slow.

In the first part of this thesis, we develop two refinements of the classical value and policy iteration algorithms. We first propose an accelerated version of value iteration (AVI) for solving affine fixed-point problems with non-self-adjoint matrices, together with an accelerated version of policy iteration (API) for MDP, building on AVI. This acceleration extends Nesterov's accelerated gradient algorithm to a class of fixed-point problems which cannot be interpreted in terms of convex programming. We characterize the spectra of matrices for which this algorithm converges with an accelerated asymptotic rate. We also introduce an accelerated algorithm of degree d, and show that it yields a multiply accelerated rate of convergence under more demanding conditions on the spectrum of the matrices. Another contribution is a deflated version of value iteration (DVI) for solving the mean-payoff version of stochastic games. This method transforms a mean-payoff problem into a discounted one, under the hypothesis that there exists a distinguished state that is accessible from all other states under all policies.
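The momentum scheme underlying accelerated value iteration can be illustrated by a minimal sketch on an affine fixed-point problem x = Ax + b. The matrix A, vector b, and momentum coefficient BETA below are illustrative assumptions, not taken from the thesis; the actual algorithm and its spectral convergence conditions are as stated in the abstract.

```python
# Sketch of a Nesterov-style momentum iteration for an affine
# fixed-point problem x = Ax + b (the setting of AVI in the text).
# All numerical data below is made up for illustration.

A = [[0.9, 0.05], [0.0, 0.8]]   # spectral radius < 1, so a unique fixed point exists
b = [1.0, 2.0]
BETA = 0.3                       # momentum coefficient (assumed, problem-dependent)

def T(x):
    """The affine map x -> Ax + b."""
    return [sum(A[i][j] * x[j] for j in range(2)) + b[i] for i in range(2)]

def accelerated_iteration(x0, n_iter=200):
    """x_{k+1} = T(y_k),  y_{k+1} = x_{k+1} + BETA * (x_{k+1} - x_k)."""
    x_prev, y = x0, x0
    for _ in range(n_iter):
        x = T(y)
        y = [x[i] + BETA * (x[i] - x_prev[i]) for i in range(2)]
        x_prev = x
    return x_prev
```

For this example the fixed point is (15, 10), and the extrapolation step reuses the previous iterate to speed up convergence; for which spectra such a scheme actually yields an accelerated rate is precisely the question analyzed in the thesis.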
Combining this deflation method with variance-reduction techniques, we derive a sublinear algorithm for solving mean-payoff stochastic games.

In the second part of this thesis, we study tropical best-approximation problems. We first solve a tropical linear regression problem, which consists in finding the best approximation of a set of points by a tropical hyperplane. We show that the value of this regression problem coincides with the maximal radius of a Hilbert ball included in a tropical polyhedron, and that the problem is polynomial-time equivalent to mean-payoff games. We apply these results to an inverse problem from auction theory. We also study a tropical analogue of low-rank matrix approximation. This is motivated by approximate methods in dynamic programming, in which the value function is approximated by a supremum of elementary functions. We establish general properties of tropical low-rank approximation, and identify classes of low-rank approximation problems that are polynomial-time solvable. In particular, we show that the best tropical rank-one matrix approximation is equivalent to finding the minimal radius of a Hilbert ball containing a tropical polyhedron.
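The tropical (max-plus) objects mentioned above can be illustrated by a short sketch: the max-plus matrix-vector product (which is how a deterministic dynamic-programming operator acts) and a tropical rank-one matrix. All data below is invented for illustration; the sketch shows the objects, not the thesis's approximation algorithms.

```python
# Illustrative max-plus (tropical) operations: addition is max,
# multiplication is +. Data below is made up for illustration.

def trop_matvec(A, x):
    """Max-plus product: (A x)_i = max_j (A[i][j] + x[j])."""
    return [max(A[i][j] + x[j] for j in range(len(x))) for i in range(len(A))]

def rank_one(u, v):
    """Tropical outer product: C[i][j] = u[i] + v[j], a tropical rank-one matrix."""
    return [[ui + vj for vj in v] for ui in u]

# In a tropical rank-one matrix, every 2x2 minor is tropically singular:
# the maximum in max(C[i][i]+C[j][j], C[i][j]+C[j][i]) is attained twice,
# since both terms equal u[i] + v[i] + u[j] + v[j].
```

For instance, `rank_one([0.0, 3.0], [1.0, -2.0])` gives a matrix whose two diagonal products coincide, the tropical analogue of a vanishing 2x2 determinant; low-rank approximation asks how closely a given matrix can be matched by such structured matrices.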
Submitted on : Monday, May 9, 2022 - 12:51:20 PM
Last modification on : Tuesday, May 10, 2022 - 3:47:01 AM


Version validated by the jury (STAR)


  • HAL Id : tel-03662467, version 1



Omar Saadi. Zero-sum repeated games: accelerated algorithms and tropical best-approximation. General Mathematics [math.GM]. Institut Polytechnique de Paris, 2021. English. ⟨NNT : 2021IPPAX102⟩. ⟨tel-03662467⟩


