Residual-Conditioned Policy Iteration for Markov Games and Robust Markov Decision Processes
Wednesday 14 January 2026, 1:00pm to 2:00pm
Venue
MAN - Mngt School Robinson LT16 WPA019
Open to
Postgraduates, Public, Staff
Registration
Registration not required - just turn up
Event Details
Jefferson Huang of the Naval Postgraduate School, California, will present a seminar to the Management Science Department.
Abstract: A Markov game (MG) can be viewed as a Markov decision process with multiple players who jointly determine the one-step rewards and transition probabilities. This talk considers two-player zero-sum MGs, which are closely related to robust MDPs. For such MGs, a generalization of policy iteration due to Pollatschek and Avi-Itzhak (1969) often performs well, but may diverge (van der Wal, 1978). Filar and Tolwinski (1991) proposed a modification to this algorithm that is equivalent to applying Newton's method with Armijo's rule to approximate the root of a suitably defined functional. We show via a simple example that this modification does not guarantee convergence either. We then present a provably convergent algorithm called residual-conditioned policy iteration that retains the desirable empirical performance of Pollatschek and Avi-Itzhak's original algorithm.
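For context, below is a minimal Python sketch of a Pollatschek and Avi-Itzhak style policy iteration for a discounted two-player zero-sum Markov game, assuming tabular data: r[s] is an A x B reward matrix and P[s] an A x B x S transition array at each state s. All function names, and the LP-based matrix-game solver, are illustrative assumptions rather than material from the talk, and, as the abstract notes, this basic scheme is not guaranteed to converge; the residual-conditioned modification presented in the seminar is not reproduced here.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_row_strategy(M):
    """Optimal mixed strategy of the row (maximizing) player and the
    value of the zero-sum matrix game with payoff matrix M, via the
    standard LP formulation."""
    m, n = M.shape
    shift = M.min() - 1.0            # shift so all payoffs are positive
    Ms = M - shift
    # LP: minimize sum(x) s.t. Ms^T x >= 1, x >= 0; value = 1 / sum(x).
    res = linprog(np.ones(m), A_ub=-Ms.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m, method="highs")
    assert res.success, "matrix-game LP failed"
    x = res.x
    return x / x.sum(), 1.0 / x.sum() + shift

def pollatschek_avi_itzhak(r, P, gamma, tol=1e-8, max_iter=500):
    """Sketch of the basic (possibly divergent) policy-iteration scheme:
    alternate between solving the one-step matrix game at every state
    and exactly evaluating the resulting stationary policy pair."""
    S = len(r)
    v = np.zeros(S)
    for _ in range(max_iter):
        # Greedy step: matrix game on Q(s) = r(s) + gamma * P(s) v.
        Q = [r[s] + gamma * P[s] @ v for s in range(S)]
        xs, ys = [], []
        for s in range(S):
            x, _ = matrix_game_row_strategy(Q[s])
            y, _ = matrix_game_row_strategy(-Q[s].T)  # minimizer's strategy
            xs.append(x)
            ys.append(y)
        # Evaluation step: solve (I - gamma * P_pi) v = r_pi exactly.
        r_pi = np.array([xs[s] @ r[s] @ ys[s] for s in range(S)])
        P_pi = np.array([np.einsum('a,abt,b->t', xs[s], P[s], ys[s])
                         for s in range(S)])
        v_new = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, xs, ys
        v = v_new
    return v, xs, ys
```

Each iteration here is exactly a Newton step for the residual of the Shapley (minimax Bellman) operator, which is why the scheme converges fast when it converges at all.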
Speaker
Jefferson Huang
Naval Postgraduate School, California
Contact Details
Name: Gay Bentinck