
Ing. Karel Horák, Ph.D.

All publications

Solving Partially Observable Stochastic Shortest-Path Games

  • DOI: 10.24963/ijcai.2021/575
  • Link: https://doi.org/10.24963/ijcai.2021/575
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    We study the two-player zero-sum extension of the partially observable stochastic shortest-path problem where one agent has only partial information about the environment. We formulate this problem as a partially observable stochastic game (POSG): given a set of target states and negative rewards for each transition, the player with imperfect information maximizes the expected undiscounted total reward until a target state is reached. The second player, who has perfect information, aims for the opposite. We base our formalism on POSGs with one-sided observability (OS-POSGs) and give the following contributions: (1) we introduce a novel heuristic search value iteration algorithm that iteratively solves depth-limited variants of the game, (2) we derive a bound on the depth that guarantees arbitrary precision, (3) we propose a novel upper-bound estimation that allows early terminations, and (4) we experimentally evaluate the algorithm on a pursuit-evasion game.
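
    A hedged illustration of the objective above (the notation G, T, r, b_0, sigma_1, sigma_2 is ours, not the paper's): writing G for the set of target states, T for the first time a state in G is reached, r(s, a^1, a^2) < 0 for the per-transition reward, and b_0 for the initial belief of the imperfectly informed player, the imperfectly informed player maximizes and the fully informed player minimizes

        \[
          \sup_{\sigma_1}\,\inf_{\sigma_2}\;
          \mathbb{E}^{\,b_0,\sigma_1,\sigma_2}\!\left[\,\sum_{t=0}^{T-1} r\big(s_t, a^1_t, a^2_t\big)\right],
        \]

    where \sigma_1 and \sigma_2 range over the behavioral strategies of the imperfectly and perfectly informed player, respectively.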

Compact Representation of Value Function in Partially Observable Stochastic Games

  • Authors: Ing. Karel Horák, Ph.D., doc. Mgr. Branislav Bošanský, Ph.D., Kiekintveld, Ch., Kamhoua, Ch.
  • Publication: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 2019. p. 350-356. ISSN 1045-0823. ISBN 978-0-9992411-4-1.
  • Year: 2019
  • DOI: 10.24963/ijcai.2019/50
  • Link: https://doi.org/10.24963/ijcai.2019/50
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Value methods for solving stochastic games with partial observability model the uncertainty of the players as a probability distribution over possible states, where the dimension of the belief space is the number of states. For many practical problems, there are exponentially many states, which causes scalability problems. We propose an abstraction technique that addresses this curse of dimensionality by projecting the high-dimensional beliefs onto characteristic vectors of significantly lower dimension (e.g., marginal probabilities). Our main contributions are (1) a novel compact representation of the uncertainty in partially observable stochastic games and (2) a novel algorithm using this representation that is based on existing state-of-the-art algorithms for solving stochastic games with partial observability. Experimental evaluation confirms that the new algorithm using the compact representation dramatically increases scalability compared to the state of the art.
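
    A minimal sketch of the projection idea above (the binary-feature state encoding and the function are illustrative, not the paper's implementation), assuming each joint state is a tuple of n binary features so that a belief over 2^n states collapses to n marginal probabilities:

        from itertools import product

        import numpy as np


        def marginal_projection(belief):
            """Project a joint belief (state tuple -> probability) onto the
            vector of per-feature marginal probabilities."""
            n = len(next(iter(belief)))
            marginals = np.zeros(n)
            for state, prob in belief.items():
                marginals += prob * np.array(state, dtype=float)
            return marginals


        if __name__ == "__main__":
            # Uniform belief over all 2**3 joint states of three binary features.
            states = list(product([0, 1], repeat=3))
            belief = {s: 1.0 / len(states) for s in states}
            print(marginal_projection(belief))  # -> [0.5 0.5 0.5]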

Optimizing Honeypot Strategies Against Dynamic Lateral Movement Using Partially Observable Stochastic Games

  • DOI: 10.1016/j.cose.2019.101579
  • Link: https://doi.org/10.1016/j.cose.2019.101579
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Partially observable stochastic games (POSGs) are a general game-theoretic model for capturing dynamic interactions where players have partial information. The existing algorithms for solving subclasses of POSGs have theoretical guarantees of convergence to approximately optimal strategies; however, their scalability is limited and they cannot be directly used to solve games of realistic sizes. In our problem, the attacker uses lateral movement through the network in order to reach a specific host, while the defender wants to discover the attacker by dynamically reallocating honeypots. We demonstrate that restricting to a specific domain allows us to substantially improve existing algorithms: (1) we formulate a compact representation of the uncertainty the defender faces, and (2) we exploit an incremental strategy-generation method that expands the players' available actions over iterations. The experimental evaluation shows that our novel algorithms scale several orders of magnitude better than the existing state of the art.
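
    A minimal sketch of incremental strategy generation in general, shown on a small zero-sum matrix game rather than the paper's honeypot POSG (the payoff matrix and all names are illustrative): each iteration solves the game restricted to the actions generated so far and adds each player's best response against the restricted solution.

        import numpy as np
        from scipy.optimize import linprog


        def solve_zero_sum(A):
            """Maximin strategy and value of the row player for payoff matrix A."""
            m, n = A.shape
            c = np.zeros(m + 1)
            c[-1] = -1.0                                  # minimize -v, i.e. maximize v
            A_ub = np.hstack([-A.T, np.ones((n, 1))])     # for each column j: v <= sum_i A[i, j] x_i
            b_ub = np.zeros(n)
            A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
            b_eq = np.array([1.0])                        # x is a probability distribution
            bounds = [(0, None)] * m + [(None, None)]
            res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
            return res.x[:m], res.x[-1]


        def double_oracle(A, eps=1e-6):
            """Expand each player's action set with best responses until the
            restricted-game value can no longer be improved."""
            rows, cols = [0], [0]
            while True:
                sub = A[np.ix_(rows, cols)]
                x, value = solve_zero_sum(sub)            # row player's restricted equilibrium
                y, _ = solve_zero_sum(-sub.T)             # column player's restricted equilibrium
                row_payoffs = A[:, cols] @ y              # full-game best-response candidates
                col_payoffs = x @ A[rows, :]
                br_row = int(np.argmax(row_payoffs))
                br_col = int(np.argmin(col_payoffs))
                improved = False
                if row_payoffs[br_row] > value + eps and br_row not in rows:
                    rows.append(br_row)
                    improved = True
                if col_payoffs[br_col] < value - eps and br_col not in cols:
                    cols.append(br_col)
                    improved = True
                if not improved:
                    return value, rows, cols


        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            print(double_oracle(rng.uniform(-1.0, 1.0, size=(6, 6))))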

Solving Partially Observable Stochastic Games with Public Observations

  • DOI: 10.1609/aaai.v33i01.33012029
  • Link: https://doi.org/10.1609/aaai.v33i01.33012029
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    In many real-world problems, there is a dynamic interaction between competitive agents. Partially observable stochastic games (POSGs) are among the most general formal models that capture such dynamic scenarios. The model captures stochastic events, partial information of players about the environment, and the scenario does not have a fixed horizon. Solving POSGs in the most general setting is intractable. Therefore, the research has been focused on subclasses of POSGs that have a value of the game and admit designing (approximate) optimal algorithms. We propose such a subclass for two-player zero-sum games with discounted-sum objective function—POSGs with public observations (PO-POSGs)—where each player is able to reconstruct beliefs of the other player over the unobserved states. Our results include: (1) theoretical analysis of PO-POSGs and their value functions showing convexity (concavity) in beliefs of maximizing (minimizing) player, (2) a novel algorithm for approximating the value of the game, and (3) a practical demonstration of scalability of our algorithm. Experimental results show that our algorithm can closely approximate the value of non-trivial games with hundreds of states.
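
    Stated in symbols (the notation is ours, restating property (1) from the abstract): writing V* for the value function, b for the belief of the maximizing player, and c for the belief of the minimizing player, for every lambda in [0, 1]

        \[
        \begin{aligned}
          V^*\big(\lambda b + (1-\lambda) b',\, c\big) &\le \lambda V^*(b, c) + (1-\lambda) V^*(b', c), \\
          V^*\big(b,\, \lambda c + (1-\lambda) c'\big) &\ge \lambda V^*(b, c) + (1-\lambda) V^*(b, c'),
        \end{aligned}
        \]

    i.e., V* is convex in the maximizing player's belief and concave in the minimizing player's belief.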

Approximating Maxmin Strategies in Imperfect Recall Games Using A-loss Recall Property

  • DOI: 10.1016/j.ijar.2017.11.010
  • Link: https://doi.org/10.1016/j.ijar.2017.11.010
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Extensive-form games with imperfect recall are an important model of dynamic games where the players are allowed to forget previously known information. Often, imperfect recall games result from an abstraction algorithm that simplifies a large game with perfect recall. Solving imperfect recall games is known to be a hard problem, and thus it is useful to search for a subclass of imperfect recall games which offers sufficient memory savings while being efficiently solvable. The abstraction process can then be guided to result in a game from this class. We focus on a subclass of imperfect recall games called A-loss recall games. First, we provide a complete picture of the complexity of solving imperfect recall and A-loss recall games. We show that the A-loss recall property allows us to compute a best response in polynomial time (computing a best response is NP-hard in imperfect recall games). This allows us to create a practical algorithm for approximating maxmin strategies in two-player games where the maximizing player has imperfect recall and the minimizing player has A-loss recall. This algorithm is capable of solving some games with up to 5·10⁹ states in approximately 1 hour. Finally, we demonstrate that the use of imperfect recall abstraction can reduce the size of the strategy representation to as low as 0.03% of the size of the strategy representation in the original perfect recall game without sacrificing the quality of the maxmin strategy obtained by solving this abstraction.

Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs

  • Authors: Ing. Karel Horák, Ph.D., doc. Mgr. Branislav Bošanský, Ph.D., Chatterjee, K.
  • Publication: Proceedings of the International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 2018. p. 4764-4770. ISSN 1045-0823. ISBN 978-0-9992411-2-7.
  • Year: 2018
  • DOI: 10.24963/ijcai.2018/662
  • Link: https://doi.org/10.24963/ijcai.2018/662
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizon. Besides the well-known discounted-sum objective, the indefinite-horizon objective (a.k.a. Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel or heuristic search value iteration (HSVI) have been used for solving Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We give the following contributions: (1) We discuss the challenges introduced in Goal-POMDPs and illustrate how they prevent the original HSVI from converging. (2) We present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that our algorithm has convergence guarantees. (3) We show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.
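
    For reference, the indefinite-horizon objective above corresponds to the standard Goal-POMDP optimality equation over beliefs (generic textbook form, not a result specific to the paper): with per-transition cost c(s, a) > 0, target set G, observation probabilities Pr[o | b, a], and Bayesian belief update τ(b, a, o),

        \[
          V^*(b) \;=\; \min_{a}\Big[\sum_{s} b(s)\, c(s,a)
            \;+\; \sum_{o} \Pr[o \mid b, a]\; V^*\big(\tau(b, a, o)\big)\Big],
          \qquad V^*(b) = 0 \ \text{for beliefs supported on } G.
        \]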

Computing Maxmin Strategies in Extensive-form Zero-sum Games with Imperfect Recall

  • DOI: 10.5220/0006121200630074
  • Link: https://doi.org/10.5220/0006121200630074
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Extensive-form games with imperfect recall are an important game-theoretic model that allows a compact representation of strategies in dynamic strategic interactions. Practical use of imperfect recall games is limited due to negative theoretical results: a Nash equilibrium does not have to exist, computing maxmin strategies is NP-hard, and they may require irrational numbers. We present the first algorithm for approximating maxmin strategies in two-player zero-sum imperfect recall games without absentmindedness. We modify the well-known sequence-form linear program to model strategies in imperfect recall games, resulting in a bilinear program, and use a recent technique to approximate the bilinear terms. Our main algorithm is a branch-and-bound search that provably reaches the desired approximation after an exponential number of steps in the size of the game. Experimental evaluation shows that the proposed algorithm can approximate maxmin strategies of randomly generated imperfect recall games of sizes beyond toy problems within a few minutes.
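
    For intuition about the bilinear terms mentioned above (not necessarily the technique used in the paper; the inequalities below are the generic McCormick envelope of a single bilinear term): a product w = xy with x in [x^L, x^U] and y in [y^L, y^U] can be relaxed by the linear constraints

        \[
        \begin{aligned}
          w &\ge x^{L} y + x\, y^{L} - x^{L} y^{L}, &\qquad
          w &\ge x^{U} y + x\, y^{U} - x^{U} y^{U}, \\
          w &\le x^{U} y + x\, y^{L} - x^{U} y^{L}, &\qquad
          w &\le x^{L} y + x\, y^{U} - x^{L} y^{U},
        \end{aligned}
        \]

    and a branch-and-bound search can then split the box [x^L, x^U] × [y^L, y^U] to tighten such a relaxation.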

Dynamic Programming for One-sided Partially Observable Pursuit-evasion Games

  • DOI: 10.5220/0006190605030510
  • Link: https://doi.org/10.5220/0006190605030510
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Pursuit-evasion scenarios appear widely in robotics, security domains, and many other real-world situations. We focus on two-player pursuit-evasion games with concurrent moves, infinite horizon, and discounted rewards. We assume that the players have partial observability; however, the evader has the advantage of knowing the current position of the pursuer’s units. This setting is particularly interesting for security domains where a robust strategy, maximizing the utility in the worst-case scenario, is often desirable. We provide, to the best of our knowledge, the first algorithm that provably converges to the value of a partially observable pursuit-evasion game with infinite horizon. Our algorithm extends the well-known value iteration algorithm by exploiting the facts that (1) the value functions of our game depend only on the position of the pursuer and the belief he has about the position of the evader, and (2) these functions are piecewise linear and convex in the belief space.
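
    A minimal sketch of point (2) above (the alpha-vectors below are made up): for a fixed position of the pursuer's units, a piecewise linear and convex value function over beliefs about the evader's position is the pointwise maximum of finitely many linear functions.

        import numpy as np


        def pwlc_value(alpha_vectors, belief):
            """Evaluate V(b) = max over alpha-vectors of <alpha, b>
            for one fixed position of the pursuer's units."""
            return max(float(np.dot(alpha, belief)) for alpha in alpha_vectors)


        if __name__ == "__main__":
            # Two illustrative alpha-vectors over three possible evader positions.
            alphas = [np.array([-1.0, -4.0, -2.0]), np.array([-3.0, -1.0, -1.5])]
            print(pwlc_value(alphas, np.array([0.2, 0.5, 0.3])))  # -> -1.55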

Heuristic Search Value Iteration for One-Sided Partially Observable Stochastic Games

  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Security problems can be modeled as two-player partially observable stochastic games with one-sided partial observability and infinite horizon (one-sided POSGs). We seek optimal strategies for player 1 that correspond to robust strategies against the worst-case opponent (player 2), who is assumed to have perfect information about the game. We present a novel algorithm for approximately solving one-sided POSGs based on heuristic search value iteration (HSVI) for POMDPs. Our results include (1) theoretical properties of one-sided POSGs and their value functions, (2) guarantees showing the convergence of our algorithm to optimal strategies, and (3) a practical demonstration of the applicability and scalability of our algorithm on three different domains: pursuit-evasion, patrolling, and search games.
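
    In HSVI-style algorithms such as the one above, approximation quality is tracked by a pair of bounds on the value function (generic HSVI bookkeeping, not a claim about the paper's internals): lower and upper bounds on V* are refined by forward exploration until the gap at the initial belief b_0 drops below the required precision,

        \[
          \overline{V}(b_0) - \underline{V}(b_0) \;\le\; \varepsilon,
        \]

    at which point approximately optimal behavior can be extracted from the bounds.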

Manipulating Adversary’s Belief: A Dynamic Game Approach to Deception by Design for Proactive Network Security

  • Authors: Ing. Karel Horák, Ph.D., Zhu, Q., doc. Mgr. Branislav Bošanský, Ph.D.
  • Publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Düsseldorf: Springer VDI Verlag, 2017. p. 273-294. ISSN 0302-9743. ISBN 978-3-319-68710-0.
  • Year: 2017
  • DOI: 10.1007/978-3-319-68711-7_15
  • Link: https://doi.org/10.1007/978-3-319-68711-7_15
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Due to the sophisticated nature of current computer systems, traditional defense measures, such as firewalls, malware scanners, and intrusion detection/prevention systems, have been found inadequate. These technological systems suffer from the fact that a sophisticated attacker can study them, identify their weaknesses, and thus gain an advantage over the defender. To prevent this, proactive cyber defense is a new defense mechanism in which we strategically engage the attacker using cyber deception techniques and influence his actions by creating and reinforcing his view of the computer system. We apply cyber deception techniques in the field of network security and study the impact of the deception on the attacker’s beliefs using the quantitative framework of game theory. We account for the sequential nature of an attack and investigate how the attacker’s belief evolves and influences his actions. We show how the defender should manipulate this belief to prevent the attacker from achieving his goals and thus minimize the damage inflicted on the network. To design a successful defense based on cyber deception, it is crucial to employ strategic thinking and to account explicitly for the attacker’s belief that he is being exposed to deceptive attempts. By doing so, we can make the deception more believable from the perspective of the attacker.
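
    The belief evolution discussed above follows the usual Bayesian update for partially observable models (generic form, our notation): after an action a is taken and the attacker observes o, his belief b over the states of the system becomes

        \[
          b'(s') \;=\; \frac{\sum_{s} b(s)\, \Pr[s', o \mid s, a]}
                            {\sum_{\tilde{s}} \sum_{s} b(s)\, \Pr[\tilde{s}, o \mid s, a]},
        \]

    and deception by design then amounts to choosing defender actions that steer this update toward beliefs favorable to the defender.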

Optimal Strategies for Detecting Data Exfiltration by Internal and External Attackers

  • DOI: 10.1007/978-3-319-68711-7_10
  • Link: https://doi.org/10.1007/978-3-319-68711-7_10
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    We study the problem of detecting data exfiltration in computer networks. We focus on the performance of optimal defense strategies with respect to an attacker’s knowledge about typical network behavior and his ability to influence the standard traffic. Internal attackers know the typical upload behavior of the compromised host and may be able to discontinue standard uploads in favor of the exfiltration. External attackers do not immediately know the behavior of the compromised host, but they can learn it from observations. We model the problem as a sequential game of imperfect information, where the network administrator selects the thresholds for the detector, while the attacker chooses how much data to exfiltrate in each time step. We present novel algorithms for approximating the optimal defense strategies in the form of Stackelberg equilibria. We analyze the scalability of the algorithms and the efficiency of the produced strategies in a case study based on real-world uploads of almost six thousand users to Google Drive. We show that with the computed defense strategies, the attacker exfiltrates 2–3 times less data than with simple heuristics; randomized defense strategies are up to 30% more effective than deterministic ones, and substantially more effective defense strategies are possible if the defense is customized for groups of hosts with similar behavior.
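
    A toy numeric sketch of the threshold interaction above (illustrative numbers only, unrelated to the paper's model or the Google Drive data): the defender draws a per-step upload threshold from a distribution, an alert fires when the observed upload exceeds it, and the attacker picks a per-step exfiltration volume x to maximize the expected amount exfiltrated without an alert.

        import numpy as np


        def expected_undetected(x, thresholds):
            """Expected volume exfiltrated without an alert in a single step,
            when the threshold is drawn from the given empirical sample."""
            thresholds = np.asarray(thresholds)
            return x * np.mean(thresholds >= x)


        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            thresholds = rng.uniform(1.0, 5.0, size=10_000)  # randomized defense
            candidates = np.linspace(0.5, 5.0, 46)
            best = max(candidates, key=lambda x: expected_undetected(x, thresholds))
            print(best, expected_undetected(best, thresholds))  # approx. 2.5, 1.56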

A Point-Based Approximate Algorithm for One-Sided Partially Observable Pursuit-Evasion Games

  • DOI: 10.1007/978-3-319-47413-7_25
  • Link: https://doi.org/10.1007/978-3-319-47413-7_25
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Pursuit-evasion games model many security problems where an evader is trying to escape a group of pursuing units. We consider a variant with partial observability and simultaneous moves of all units, and assume the worst-case setup, where the evader knows the location of the pursuer’s units, but the pursuer does not know the location of the evader. Recent work has shown that the solution of such games is compactly representable as a collection of finite-dimensional value functions. We extend this result and propose the first practical algorithm for approximating optimal policies in pursuit-evasion games with one-sided partial observability. Our approach extends the point-based updates that exist for POMDPs to one-sided partially observable stochastic games. The experimental evaluation on multiple graphs shows significant improvements over approximate algorithms that operate on finite game trees.
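
    A hedged sketch of the standard POMDP point-based backup that the approach above generalizes (the tiny transition, observation, and reward arrays are made up; this is the single-agent primitive, not the paper's game version):

        import numpy as np


        def point_based_backup(b, Gamma, T, Z, R, gamma):
            """One point-based backup of belief b.

            T[a, s, s'] -- transition probabilities
            Z[a, s', o] -- observation probabilities
            R[a, s]     -- immediate rewards
            Gamma       -- current set of alpha-vectors (arrays over states)
            """
            n_actions, n_states, _ = T.shape
            n_obs = Z.shape[2]
            best_alpha, best_value = None, -np.inf
            for a in range(n_actions):
                alpha_a = np.array(R[a], dtype=float)
                for o in range(n_obs):
                    # For this (a, o) pair, take the alpha-vector that is best
                    # at the corresponding (unnormalized) updated belief.
                    vectors = [T[a] @ (Z[a][:, o] * g) for g in Gamma]
                    alpha_a += gamma * max(vectors, key=lambda v: float(b @ v))
                value = float(b @ alpha_a)
                if value > best_value:
                    best_alpha, best_value = alpha_a, value
            return best_alpha


        if __name__ == "__main__":
            # Toy 2-state, 2-action, 2-observation model (purely illustrative).
            T = np.array([[[0.9, 0.1], [0.2, 0.8]],
                          [[0.5, 0.5], [0.5, 0.5]]])
            Z = np.array([[[0.8, 0.2], [0.3, 0.7]],
                          [[0.5, 0.5], [0.5, 0.5]]])
            R = np.array([[1.0, 0.0], [0.2, 0.2]])
            Gamma = [np.zeros(2)]
            b = np.array([0.6, 0.4])
            for _ in range(30):              # repeated backups at a single belief
                Gamma.append(point_based_backup(b, Gamma, T, Z, R, gamma=0.9))
            print(max(float(b @ g) for g in Gamma))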
