People

Ing. Michal Šustr, Ph.D.

All publications

Learning not to regret

  • Authors: Sychrovský, D., Ing. Michal Šustr, Ph.D., Davoodi, E., Bowling, M., Lanctot, M., Schmid, M.
  • Publication: Proceedings of the 38th AAAI Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2024. p. 15202-15210. AAAI Conference on Artificial Intelligence. vol. 38. ISSN 2374-3468. ISBN 978-1-57735-887-9.
  • Year: 2024
  • DOI: 10.1609/aaai.v38i14.29443
  • Link: https://doi.org/10.1609/aaai.v38i14.29443
  • Department: Centrum umělé inteligence
  • Abstract:
    The literature on game-theoretic equilibrium finding predominantly focuses on single games or their repeated play. Nevertheless, numerous real-world scenarios feature playing a game sampled from a distribution of similar, but not identical games, such as playing poker with different public cards or trading correlated assets on the stock market. As these similar games feature similar equilibria, we investigate a way to accelerate equilibrium finding on such a distribution. We present a novel "learning not to regret" framework, enabling us to meta-learn a regret minimizer tailored to a specific distribution. Our key contribution, Neural Predictive Regret Matching, is uniquely meta-learned to converge rapidly for the chosen distribution of games, while having regret minimization guarantees on any game. We validated our algorithms' faster convergence on a distribution of river poker games. Our experiments show that the meta-learned algorithms outpace their non-meta-learned counterparts, achieving more than tenfold improvements.
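For intuition, the plain regret-matching rule that the paper's meta-learned minimizer builds on can be sketched in a few lines. This is an illustrative, self-contained example on rock-paper-scissors, not the paper's code; the game, function names, and iteration count are assumptions made for the demo.

```python
# Full-width regret-matching self-play on rock-paper-scissors (illustrative
# sketch only; the paper meta-learns a neural predictive variant of this rule).

PAYOFF = [[0, -1, 1],   # antisymmetric payoff matrix:
          [1, 0, -1],   # PAYOFF[my_action][opp_action] is my utility
          [-1, 1, 0]]   # (rock=0, paper=1, scissors=2)

def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret;
    fall back to uniform when no action has positive regret."""
    pos = [max(r, 0.0) for r in cum_regret]
    total = sum(pos)
    n = len(cum_regret)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n

def expected_utils(opp):
    """Expected utility of each action against a mixed opponent strategy."""
    return [sum(PAYOFF[a][b] * opp[b] for b in range(3)) for a in range(3)]

def self_play(iterations=10_000):
    # Asymmetric initial regrets so the dynamics are non-trivial.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    avg = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strats = [regret_matching(r) for r in regrets]
        for p in (0, 1):
            utils = expected_utils(strats[1 - p])
            ev = sum(u * s for u, s in zip(utils, strats[p]))
            for a in range(3):
                regrets[p][a] += utils[a] - ev  # instantaneous regret
                avg[p][a] += strats[p][a]
    return [[x / iterations for x in row] for row in avg]
```

The average strategies converge toward the uniform Nash equilibrium at the standard O(T^(-1/2)) no-regret rate; the paper's contribution is meta-learning a predictive variant of this update that converges faster on a chosen distribution of games while keeping such worst-case guarantees.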

Sound Algorithms in Imperfect Information Games

  • Authors: Ing. Michal Šustr, Ph.D., Schmid, M., Moravčík, M., Burch, N., Lanctot, M., Bowling, M.
  • Publication: AAMAS '21: 20th International Conference on Autonomous Agents and Multiagent Systems, Virtual Event, United Kingdom, May 3-7, 2021. New York: ACM, 2021. p. 1662-1664. ISSN 1548-8403. ISBN 978-1-7138-3262-1.
  • Year: 2021
  • Department: Katedra počítačů, Centrum umělé inteligence
  • Abstract:
    Search has played a fundamental role in computer game research since the very beginning. While online search has been commonly used in perfect information games such as Chess and Go, online search methods for imperfect information games have been introduced only relatively recently. This paper addresses the question of what constitutes a sound online algorithm in the imperfect information setting of two-player zero-sum games. We argue that the fixed-strategy definitions of exploitability and epsilon-Nash equilibria are ill-suited to measure the worst-case performance of an online algorithm. We thus formalize epsilon-soundness, a concept that connects the worst-case performance of an online algorithm to the performance of an epsilon-Nash equilibrium. Our definition of soundness and the consistency hierarchy finally provide appropriate tools to analyze online algorithms in repeated imperfect information games. We thus inspect some of the previous online algorithms in a new light, bringing new insights into their worst-case performance guarantees.
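The fixed-strategy notion of exploitability discussed in the abstract can be computed directly for a matrix game. The sketch below is illustrative only (the game, function names, and the assumption of a known game value are not from the paper):

```python
# Exploitability of a fixed row strategy in a zero-sum matrix game:
# how far the strategy's guaranteed payoff falls short of the game value
# when the opponent best-responds (illustrative sketch, not the paper's code).

def best_response_value(row_strategy, payoff):
    """Row player's expected payoff when the column player best-responds,
    i.e. picks the column minimizing the row player's expected payoff."""
    n_cols = len(payoff[0])
    col_values = [sum(x * payoff[i][j] for i, x in enumerate(row_strategy))
                  for j in range(n_cols)]
    return min(col_values)

def exploitability(row_strategy, payoff, game_value=0.0):
    """Gap between the game value and what the fixed strategy guarantees."""
    return game_value - best_response_value(row_strategy, payoff)

# Rock-paper-scissors (game value 0): the uniform strategy is unexploitable,
# while pure rock loses 1 per game to a paper-playing best responder.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
```

An online algorithm, by contrast, need not commit to a single fixed strategy across a repeated match, which is why measuring its worst case with this fixed-strategy quantity is what the paper argues against.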

Monte Carlo Continual Resolving for Online Strategy Computation in Imperfect Information Games

  • Department: Katedra počítačů, Centrum umělé inteligence
  • Abstract:
    Online game playing algorithms produce high-quality strategies with a fraction of the memory and computation required by their offline alternatives. Continual Resolving (CR) is a recent theoretically sound approach to online game playing that has been used to outperform human professionals in poker. However, parts of the algorithm were specific to poker, which enjoys many properties not shared by other imperfect information games. We present a domain-independent formulation of CR applicable to any two-player zero-sum extensive-form game (EFG). It works with an abstract resolving algorithm, which can be instantiated by various EFG solvers. We further describe and implement its Monte Carlo variant (MCCR), which uses Monte Carlo Counterfactual Regret Minimization (MCCFR) as a resolver. We prove the correctness of CR and show an O(T^(-1/2)) dependence of MCCR's exploitability on the computation time. Furthermore, we present an empirical comparison of MCCR with incremental tree building to Online Outcome Sampling and Information-set MCTS on several domains.

Responsible for this page: Ing. Mgr. Radovan Suk