doc. Mgr. Viliam Lisý, MSc., Ph.D.

All publications

NASimEmu: Network Attack Simulator & Emulator for Training Agents Generalizing to Novel Scenarios

  • Department: Artificial Intelligence Center
  • Annotation:
    Current frameworks for training offensive penetration testing agents with deep reinforcement learning struggle to produce agents that perform well in real-world scenarios, due to the reality gap in simulation-based frameworks and the lack of scalability in emulation-based frameworks. Additionally, existing frameworks often use an unrealistic metric that measures the agents' performance on the training data. NASimEmu, a new framework introduced in this paper, addresses these issues by providing both a simulator and an emulator with a shared interface. This approach allows agents to be trained in simulation and deployed in the emulator, thus verifying the realism of the used abstraction. Our framework promotes the development of general agents that can transfer to novel scenarios unseen during their training. For the simulation part, we adopt an existing simulator NASim and enhance its realism. The emulator is implemented with industry-level tools, such as Vagrant, VirtualBox, and Metasploit. Experiments demonstrate that a simulation-trained agent can be deployed in emulation, and we show how to use the framework to train a general agent that transfers into novel, structurally different scenarios. NASimEmu is available as open-source.
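  • Illustrative sketch:
    The framework's central design point is a shared interface, so that an agent trained against the simulator can be deployed against the emulator unchanged. The Python sketch below illustrates that idea only; the class and method names (NetworkBackend, ToySimBackend, reset, step) are assumptions for this illustration, not the actual NASimEmu API.

        # Hypothetical sketch of the shared simulator/emulator interface idea.
        # All names are illustrative assumptions, not the actual NASimEmu API.
        import random
        from abc import ABC, abstractmethod

        class NetworkBackend(ABC):
            @abstractmethod
            def reset(self): ...            # -> observation
            @abstractmethod
            def step(self, action): ...     # -> (observation, reward, done)

        class ToySimBackend(NetworkBackend):
            """Toy simulator: each exploit attempt succeeds with a fixed probability."""
            def __init__(self, n_hosts=5, p_success=0.6, seed=0):
                self.n, self.p, self.rng = n_hosts, p_success, random.Random(seed)

            def reset(self):
                self.owned = set()
                return frozenset(self.owned)

            def step(self, action):                  # action = index of the host to exploit
                if self.rng.random() < self.p:
                    self.owned.add(action)
                done = len(self.owned) == self.n
                reward = 10.0 if done else -1.0      # per-step cost, bonus for full compromise
                return frozenset(self.owned), reward, done

        # An emulator backend would implement the same two methods on top of
        # Vagrant/VirtualBox-provisioned VMs and Metasploit modules, so the same
        # agent loop runs against either backend.
        def run_episode(policy, backend: NetworkBackend):
            obs, done, total = backend.reset(), False, 0.0
            while not done:
                obs, reward, done = backend.step(policy(obs))
                total += reward
            return total

        print(run_episode(lambda obs: random.randrange(5), ToySimBackend()))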

JsonGrinder.jl: Automated Differentiable Neural Architecture for Embedding Arbitrary JSON Data

  • Department: Artificial Intelligence Center
  • Annotation:
    Standard machine learning (ML) problems are formulated on data converted into a suitable tensor representation. However, there are data sources, for example in cybersecurity, that are naturally represented in a unifying hierarchical structure, such as XML, JSON, and Protocol Buffers. Converting this data to a tensor representation is usually done by manual feature engineering, which is laborious, lossy, and prone to bias originating from the human inability to correctly judge the importance of particular features. JsonGrinder.jl is a library automating various ML tasks on these difficult sources. Starting with an arbitrary set of JSON samples, it automatically creates a differentiable ML model (called hmilnet), which embeds raw JSON samples into a fixed-size tensor representation. This embedding network can be naturally extended by an arbitrary ML model expecting tensor inputs in order to perform classification, regression, or clustering.
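  • Illustrative sketch:
    JsonGrinder.jl is a Julia library and infers the embedding architecture from the data automatically; the NumPy sketch below only illustrates the underlying principle of recursively mapping an arbitrary JSON document to a fixed-size vector (dictionaries project and sum their children, arrays are mean-pooled, leaves get type-specific encodings). The fixed random projections and character hashing are simplifying assumptions, not the library's hmilnet model.

        # Illustration of embedding arbitrary JSON into a fixed-size vector; the
        # fixed random projections below are a simplification, not JsonGrinder.jl.
        import numpy as np

        DIM = 8
        rng = np.random.default_rng(0)
        _key_proj = {}                                  # lazily created projection per dict key

        def _proj(key):
            if key not in _key_proj:
                _key_proj[key] = rng.standard_normal((DIM, DIM)) / np.sqrt(DIM)
            return _key_proj[key]

        def embed(node):
            """Recursively map a JSON value to a DIM-dimensional vector."""
            if isinstance(node, dict):                  # product node: sum of projected children
                return sum((_proj(k) @ embed(v) for k, v in node.items()), np.zeros(DIM))
            if isinstance(node, list):                  # bag node: mean-pool element embeddings
                return np.mean([embed(v) for v in node], axis=0) if node else np.zeros(DIM)
            if isinstance(node, bool):                  # checked before numbers: bool is an int subclass
                return np.full(DIM, float(node))
            if isinstance(node, (int, float)):          # numeric leaf
                return np.full(DIM, float(node))
            if isinstance(node, str):                   # string leaf: hashed bag of characters
                vec = np.zeros(DIM)
                for ch in node:
                    vec[hash(ch) % DIM] += 1.0
                return vec
            return np.zeros(DIM)                        # null / unsupported types

        sample = {"user": {"name": "alice", "logins": [1, 2, 5]}, "admin": False}
        print(embed(sample))            # a fixed-size vector usable by any downstream model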

Rethinking formal models of partially observable multiagent decision making

  • Authors: Kovařík, V., Schmid, M., Burch, N., Bowling, M., doc. Mgr. Viliam Lisý, MSc., Ph.D.,
  • Publication: Artificial Intelligence. 2022, 303(103645), 1-21. ISSN 1872-7921.
  • Year: 2022
  • DOI: 10.1016/j.artint.2021.103645
  • Link: https://doi.org/10.1016/j.artint.2021.103645
  • Department: Artificial Intelligence Center
  • Annotation:
    Multiagent decision-making in partially observable environments is usually modelled as either an extensive-form game (EFG) in game theory or a partially observable stochastic game (POSG) in multiagent reinforcement learning (MARL). One issue with the current situation is that while most practical problems can be modelled in both formalisms, the relationship of the two models is unclear, which hinders the transfer of ideas between the two communities. A second issue is that while EFGs have recently seen significant algorithmic progress, their classical formalization is unsuitable for efficient presentation of the underlying ideas, such as those around decomposition. To solve the first issue, we introduce factored-observation stochastic games (FOSGs), a minor modification of the POSG formalism which distinguishes between private and public observation and thereby greatly simplifies decomposition. To remedy the second issue, we show that FOSGs and POSGs are naturally connected to EFGs: by “unrolling” a FOSG into its tree form, we obtain an EFG. Conversely, any perfect-recall timeable EFG corresponds to some underlying FOSG in this manner. Moreover, this relationship justifies several minor modifications to the classical EFG formalization that recently appeared as an implicit response to the model’s issues with decomposition. Finally, we illustrate the transfer of ideas between EFGs and MARL by presenting three key EFG techniques – counterfactual regret minimization, sequence form, and decomposition – in the FOSG framework.
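  • Illustrative sketch:
    The practical difference from a POSG is that each transition emits a public observation seen by all players in addition to each player's private observation. A minimal container for such a model might look as follows (field names are assumptions for this sketch, not a standard API); repeatedly applying the transition function from the initial state and recording the emitted observations "unrolls" the model into its tree form, i.e. an EFG.

        # Minimal illustrative container for a factored-observation stochastic game
        # (FOSG); field names are assumptions for this sketch, not a standard API.
        from dataclasses import dataclass
        from typing import Any, Callable, Sequence, Tuple

        State, Action, Obs = Any, Any, Any

        @dataclass
        class FOSG:
            n_players: int
            initial_state: State
            legal_actions: Callable[[State, int], Sequence[Action]]        # per player
            transition: Callable[[State, Tuple[Action, ...]], State]       # stochastic in general
            rewards: Callable[[State, Tuple[Action, ...]], Tuple[float, ...]]
            # The key FOSG ingredient: observations factor into one public part,
            # seen by everyone, and one private part per player.
            observations: Callable[[State, Tuple[Action, ...], State],
                                   Tuple[Obs, Tuple[Obs, ...]]]
            is_terminal: Callable[[State], bool]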

Complexity and Algorithms for Exploiting Quantal Opponents in Large Two-Player Games

  • Authors: Ing. David Milec, Černý, J., doc. Mgr. Viliam Lisý, MSc., Ph.D., An, B.
  • Publication: Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence. Palo Alto, California: Association for the Advancement of Artificial Intelligence (AAAI), 2021. p. 5575-5583. 35. ISSN 2374-3468. ISBN 978-1-57735-866-4.
  • Year: 2021
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Annotation:
    Solution concepts of traditional game theory assume entirely rational players; therefore, their ability to exploit subrational opponents is limited. One type of subrationality that describes human behavior well is the quantal response. While there exist algorithms for computing solutions against quantal opponents, they either do not scale or may provide strategies that are even worse than the entirely-rational Nash strategies. This paper aims to analyze and propose scalable algorithms for computing effective and robust strategies against a quantal opponent in normal-form and extensive-form games. Our contributions are: (1) we define two different solution concepts related to exploiting quantal opponents and analyze their properties; (2) we prove that computing these solutions is computationally hard; (3) therefore, we evaluate several heuristic approximations based on scalable counterfactual regret minimization (CFR); and (4) we identify a CFR variant that exploits the bounded opponents better than the previously used variants while being less exploitable by the worst-case perfectly-rational opponent.
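  • Illustrative sketch:
    A quantal-response opponent plays each action with probability proportional to the exponential of its expected utility (a softmax with rationality parameter lambda); exploiting such an opponent means optimizing one's own strategy against this response rather than against a perfectly rational one. The payoff matrix below is made up for illustration.

        # Quantal response in a normal-form game: action probabilities are
        # proportional to exp(lambda * expected utility).
        import numpy as np

        def quantal_response(payoff, opp_strategy, lam=1.0):
            """payoff[i, j]: utility of action i against the opponent's action j."""
            expected = payoff @ opp_strategy          # expected utility of each action
            logits = lam * expected
            logits -= logits.max()                    # numerical stability
            weights = np.exp(logits)
            return weights / weights.sum()

        payoff = np.array([[1.0, -1.0],
                           [-0.5, 0.5]])              # illustrative 2x2 game
        uniform = np.array([0.5, 0.5])
        print(quantal_response(payoff, uniform, lam=0.0))   # lambda = 0: uniform play
        print(quantal_response(payoff, uniform, lam=5.0))   # large lambda: near best response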

Computing Quantal Stackelberg Equilibrium in Extensive-Form Games

  • Department: Artificial Intelligence Center
  • Annotation:
    Deployments of game-theoretic solution concepts in the real world have highlighted the necessity to consider human opponents' boundedly rational behavior. If subrationality is not addressed, the system can face significant losses in terms of expected utility. While there exist algorithms for computing optimal strategies to commit to when facing subrational decision-makers in one-shot interactions, these algorithms cannot be generalized for solving sequential scenarios because of the inherent curse of strategy-space dimensionality in sequential games and because humans act subrationally at each decision point separately. We study optimal strategies to commit to against subrational opponents in sequential games for the first time and make the following key contributions: (1) we prove the problem is NP-hard in general; (2) to enable further analysis, we introduce a non-fractional reformulation of the direct non-concave representation of the equilibrium; (3) we identify conditions under which the problem can be approximated in polynomial time in the size of the representation; (4) we show how an MILP can approximate the reformulation with a guaranteed bounded error, and (5) we experimentally demonstrate that our algorithm provides higher-quality results several orders of magnitude faster than a baseline method for general non-linear optimization.

Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games

  • DOI: 10.1007/s10994-019-05832-z
  • Link: https://doi.org/10.1007/s10994-019-05832-z
  • Department: Artificial Intelligence Center
  • Annotation:
    Hannan consistency, or no external regret, is a key concept for learning in games. An action selection algorithm is Hannan consistent (HC) if its performance is eventually as good as selecting the best fixed action in hindsight. If both players in a zero-sum normal form game use a Hannan consistent algorithm, their average behavior converges to a Nash equilibrium of the game. A similar result is known about extensive form games, but the played strategies need to be Hannan consistent with respect to the counterfactual values, which are often difficult to obtain. We study zero-sum extensive form games with simultaneous moves, but otherwise perfect information. These games generalize normal form games and they are a special case of extensive form games. We study whether applying HC algorithms in each decision point of these games directly to the observed payoffs leads to convergence to a Nash equilibrium. This learning process corresponds to a class of Monte Carlo Tree Search algorithms, which are popular for playing simultaneous-move games but do not have any known performance guarantees. We show that using HC algorithms directly on the observed payoffs is not sufficient to guarantee the convergence. With an additional averaging over joint actions, the convergence is guaranteed, but empirically slower. We further define an additional property of HC algorithms, which is sufficient to guarantee the convergence without the averaging and we empirically show that commonly used HC algorithms have this property.
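  • Illustrative sketch:
    Regret matching is a standard Hannan consistent selection rule: play each action with probability proportional to its positive cumulative regret. The sketch below runs two regret-matching learners in self-play on matching pennies with full counterfactual feedback; their average strategies approach the uniform Nash equilibrium. (The paper studies the harder setting where only the observed payoff of the selected joint action is available at each decision point of a game tree.)

        # Two regret-matching learners in self-play on matching pennies; the
        # average strategies converge to the uniform Nash equilibrium.
        import numpy as np

        A = np.array([[1.0, -1.0],
                      [-1.0, 1.0]])          # row player's payoffs; column player gets -A

        def rm_strategy(cum_regret):
            pos = np.maximum(cum_regret, 0.0)
            n = len(cum_regret)
            return pos / pos.sum() if pos.sum() > 0 else np.full(n, 1.0 / n)

        regret = [np.zeros(2), np.zeros(2)]
        avg = [np.zeros(2), np.zeros(2)]
        rng = np.random.default_rng(0)

        for _ in range(20000):
            strat = [rm_strategy(regret[0]), rm_strategy(regret[1])]
            a = [rng.choice(2, p=strat[0]), rng.choice(2, p=strat[1])]
            u = A[a[0], a[1]]
            payoff = [u, -u]
            cf = [A[:, a[1]], -A[a[0], :]]   # payoff of each own action, opponent's action fixed
            for p in range(2):
                regret[p] += cf[p] - payoff[p]
                avg[p] += strat[p]

        print(avg[0] / avg[0].sum(), avg[1] / avg[1].sum())   # both close to [0.5, 0.5]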

Automated Construction of Bounded-loss Imperfect-recall Abstractions in Extensive-form Games

  • DOI: 10.1016/j.artint.2020.103248
  • Link: https://doi.org/10.1016/j.artint.2020.103248
  • Department: Artificial Intelligence Center
  • Annotation:
    Extensive-form games (EFGs) model finite sequential interactions between players. The amount of memory required to represent these games is the main bottleneck of algorithms for computing optimal strategies and the size of these strategies is often impractical for real-world applications. A common approach to tackle the memory bottleneck is to use information abstraction that removes parts of information available to players, thus reducing the number of decision points in the game. However, existing information-abstraction techniques are either specific for a particular domain, they do not provide any quality guarantees, or they are applicable to very small subclasses of EFGs. We present domain-independent abstraction methods for creating imperfect recall abstractions in extensive-form games that allow computing strategies that are (near) optimal in the original game. To this end, we introduce two novel algorithms, FPIRA and CFR+IRA, based on fictitious play and counterfactual regret minimization. These algorithms can start with an arbitrary domain-specific, or the coarsest possible, abstraction of the original game. The algorithms iteratively detect the missing information they require for computing a strategy for the abstract game that is (near) optimal in the original game. This information is then included back into the abstract game. Moreover, our algorithms are able to exploit imperfect-recall abstractions that allow players to forget even the history of their own actions. However, the algorithms require traversing the complete unabstracted game tree. We experimentally show that our algorithms can closely approximate Nash equilibrium of large games using abstraction with as little as 0.9% of information sets of the original game. Moreover, the results suggest that memory savings increase with the increasing size of the original games.

Automated construction of bounded-loss imperfect-recall abstractions in extensive-form games

  • Department: Artificial Intelligence Center
  • Annotation:
    Information abstraction is one of the methods for tackling large extensive-form games (EFGs). Removing some information available to players reduces the memory required for computing and storing strategies. We present novel domain-independent abstraction methods for creating very coarse abstractions of EFGs that still compute strategies that are (near) optimal in the original game. First, the methods start with an arbitrary abstraction of the original game (domain-specific or the coarsest possible). Next, they iteratively detect which information is required in the abstract game so that a (near) optimal strategy in the original game can be found and include this information into the abstract game. Moreover, the methods are able to exploit imperfect-recall abstractions where players can even forget the history of their own actions. We present two algorithms that follow these steps - FPIRA, based on fictitious play, and CFR+IRA, based on counterfactual regret minimization. The experimental evaluation confirms that our methods can closely approximate Nash equilibrium of large games using abstraction with only 0.9% of information sets of the original game.

Classification with Costly Features as a Sequential Decision-making Problem

  • DOI: 10.1007/s10994-020-05874-8
  • Link: https://doi.org/10.1007/s10994-020-05874-8
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Annotation:
    This work focuses on a specific classification problem, where the information about a sample is not readily available but has to be acquired for a cost, and there is a per-sample budget. Inspired by real-world use-cases, we analyze average and hard variations of a directly specified budget. We postulate the problem in its explicit formulation and then convert it into an equivalent MDP that can be solved with deep reinforcement learning. We also evaluate a real-world-inspired setting with sparse training datasets with missing features. The presented method performs robustly well in all settings across several distinct datasets, outperforming other prior-art algorithms. The method is flexible, as showcased with all mentioned modifications, and can be improved with any domain-independent advancement in RL.
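  • Illustrative sketch:
    The conversion works by making the acquired feature values the state, the purchase of each remaining feature an action with a negative reward equal to its cost, and the final classification a terminal action rewarded for correctness. The tiny environment below (made-up features, costs, and policy) only illustrates that MDP interface; the paper trains a deep RL agent on it instead of the hand-written policy.

        # Sketch of the MDP view: states are the features acquired so far, actions
        # either buy a feature (paying its cost) or classify (ending the episode).
        FEATURE_COSTS = [0.1, 0.3, 0.5]              # made-up per-feature acquisition costs

        class CostlyFeatureEnv:
            def __init__(self, x, y):
                self.x, self.y = x, y                # one sample: features and true label

            def reset(self):
                self.acquired = {}                   # feature index -> observed value
                return dict(self.acquired)

            def step(self, action):
                if action[0] == "acquire":
                    i = action[1]
                    self.acquired[i] = self.x[i]
                    return dict(self.acquired), -FEATURE_COSTS[i], False
                # ("classify", predicted_label) is terminal
                return dict(self.acquired), (1.0 if action[1] == self.y else 0.0), True

        def simple_policy(obs):                      # hand-written stand-in for the RL agent
            if 0 not in obs:
                return ("acquire", 0)
            return ("classify", int(obs[0] > 0))

        env = CostlyFeatureEnv(x=[0.7, -1.2, 3.4], y=1)
        obs, done, ret = env.reset(), False, 0.0
        while not done:
            obs, r, done = env.step(simple_policy(obs))
            ret += r
        print("return:", ret)                        # 1.0 - 0.1 = 0.9 here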

Dinkelbach-type algorithm for computing quantal Stackelberg equilibrium

  • Department: Artificial Intelligence Center
  • Annotation:
    Stackelberg security games (SSGs) have been deployed in many real-world situations to optimally allocate scarce resources to protect targets against attackers. However, actual human attackers are not perfectly rational and there are several behavior models that attempt to predict subrational behavior. Quantal response is among the most commonly used such models and Quantal Stackelberg Equilibrium (QSE) describes the optimal strategy to commit to when facing a subrational opponent. Non-concavity makes computing QSE computationally challenging and while there exist algorithms for computing QSE for SSGs, they cannot be directly used for solving an arbitrary game in the normal form. We (1) present a transformation of the primal problem for computing QSE using Dinkelbach's method for any general-sum normal-form game, (2) provide a gradient-based and an MILP-based algorithm, give the convergence criteria, and bound their error, and finally (3) we experimentally demonstrate that using our novel transformation, a QSE can be closely approximated several orders of magnitude faster.
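  • Illustrative sketch:
    Dinkelbach's method turns the fractional objective max f(x)/g(x) (with g > 0) into a sequence of parametric problems F(lambda) = max_x f(x) - lambda*g(x), updating lambda to the ratio achieved by the current maximizer; at the fixed point F(lambda) = 0 the ratio is optimal. The functions and the grid-search inner solver below are made up for illustration and stand in for the QSE objective and the gradient/MILP solvers of the paper.

        # Dinkelbach iteration for maximizing a ratio f(x)/g(x), g(x) > 0:
        # repeatedly solve max_x f(x) - lam*g(x) and set lam to the achieved ratio.
        import numpy as np

        xs = np.linspace(0.0, 2.0, 2001)            # crude inner solver: grid search
        f = lambda x: 1.0 + 2.0 * x - x ** 2        # made-up numerator
        g = lambda x: 1.0 + x                       # made-up positive denominator

        lam = 0.0
        for _ in range(50):
            x_star = xs[np.argmax(f(xs) - lam * g(xs))]   # parametric subproblem F(lam)
            new_lam = f(x_star) / g(x_star)
            if abs(new_lam - lam) < 1e-12:          # F(lam) ~ 0  =>  lam is the optimal ratio
                break
            lam = new_lam

        print("optimal ratio ~", lam, "attained near x =", x_star)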

Discovering Imperfectly Observable Adversarial Actions Using Anomaly Detection

  • Department: Artificial Intelligence Center
  • Annotation:
    Defenders in security problems often use anomaly detection (AD) to examine effects of (adversarial) actions and detect malicious behavior. Attackers seek to accomplish their goal (e.g., exfiltrate data) while avoiding the detection. Game theory can be used to reason about this interaction. While AD has been used in game-theoretic frameworks before, we extend the existing works to more realistic settings by (1) allowing players to have continuous action spaces and (2) assuming that the defender cannot perfectly observe the action of the attacker. We solve our model by (1) extending existing algorithms that discretize the action spaces and use linear programming and (2) by training a neural network using an algorithm based on exploitability descent, termed EDA. While both algorithms are applicable for low feature-space dimensions, EDA produces less exploitable strategies and scales to higher dimensions. In a data exfiltration scenario, EDA outperforms a range of classifiers when facing a targeted exploitative attacker.

Classification with Costly Features Using Deep Reinforcement Learning

  • DOI: 10.1609/aaai.v33i01.33013959
  • Link: https://doi.org/10.1609/aaai.v33i01.33013959
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Annotation:
    We study a classification problem where each feature can be acquired for a cost and the goal is to optimize a trade-off between the expected classification error and the feature cost. We revisit a former approach that has framed the problem as a sequential decision-making problem and solved it by Q-learning with a linear approximation, where individual actions are either requests for feature values or terminate the episode by providing a classification decision. On a set of eight problems, we demonstrate that by replacing the linear approximation with neural networks the approach becomes comparable to the state-of-the-art algorithms developed specifically for this problem. The approach is flexible, as it can be improved with any new reinforcement learning enhancement, it allows inclusion of a pre-trained high-performance classifier, and unlike prior art, its performance is robust across all evaluated datasets.

Hardening networks against strategic attackers using attack graph games

  • DOI: 10.1016/j.cose.2019.101578
  • Link: https://doi.org/10.1016/j.cose.2019.101578
  • Department: Artificial Intelligence Center
  • Annotation:
    We consider the problem faced by a network administrator (defender) when deploying limited security resources to protect a network against a strategic attacker. To evaluate the effectiveness of a defense strategy, one must consider possible counterattacks that an attacker can choose. We use game theory to model the interaction between the defender and the attacker. Game theory provides relevant concepts and algorithms for computing optimal strategies in environments with multiple decision makers. To model the space of the attacker's possible actions, we use attack graphs, which compactly represent all known sequences of the attacker's actions that may lead to a successful attack on a given network. We demonstrate our approach on a specific type of defense actions, where the defender deploys deceptive hosts and services (honeypots) to detect and mitigate attacks. We assume the worst-case attacker who has complete knowledge of the (typically randomized) defense strategy. We seek the optimal defense strategy against this attacker in the form of a Stackelberg equilibrium. Computing this solution exactly using standard techniques has limited scalability, so we investigate several approaches for increasing scalability to realistic problems. We introduce optimization methods for finding exact solutions for these games and then propose a variety of polynomial heuristic algorithms that scale to significantly larger games. We analyze the scalability and the quality of these heuristic solutions on realistic network topologies. We show that the strategies found by the heuristics are often near-optimal and that they outperform non-game-theoretic baselines. Finally, we show how attack graph games can be used to answer various research questions relevant to network security administrators.

Monte Carlo Continual Resolving for Online Strategy Computation in Imperfect Information Games

  • Authors: Ing. Michal Šustr, Kovařík, V., doc. Mgr. Viliam Lisý, MSc., Ph.D.,
  • Publication: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. New York: ACM, 2019. p. 224-232. ISSN 2523-5699. ISBN 978-1-4503-6309-9.
  • Year: 2019
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Annotation:
    Online game playing algorithms produce high-quality strategies with a fraction of memory and computation required by their offline alternatives. Continual Resolving (CR) is a recent theoretically sound approach to online game playing that has been used to outperform human professionals in poker. However, parts of the algorithm were specific to poker, which enjoys many properties not shared by other imperfect information games. We present a domain-independent formulation of CR applicable to any two-player zero-sum extensive-form games (EFGs). It works with an abstract resolving algorithm, which can be instantiated by various EFG solvers. We further describe and implement its Monte Carlo variant (MCCR) which uses Monte Carlo Counterfactual Regret Minimization (MCCFR) as a resolver. We prove the correctness of CR and show an O(T^{-1/2})-dependence of MCCR’s exploitability on the computation time. Furthermore, we present an empirical comparison of MCCR with incremental tree building to Online Outcome Sampling and Information-set MCTS on several domains.

Approximating Maxmin Strategies in Imperfect Recall Games Using A-loss Recall Property

  • DOI: 10.1016/j.ijar.2017.11.010
  • Link: https://doi.org/10.1016/j.ijar.2017.11.010
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Annotation:
    Extensive-form games with imperfect recall are an important model of dynamic games where the players are allowed to forget previously known information. Often, imperfect recall games result from an abstraction algorithm that simplifies a large game with perfect recall. Solving imperfect recall games is known to be a hard problem, and thus it is useful to search for a subclass of imperfect recall games which offers sufficient memory savings while being efficiently solvable. The abstraction process can then be guided to result in a game from this class. We focus on a subclass of imperfect recall games called A-loss recall games. First, we provide a complete picture of the complexity of solving imperfect recall and A-loss recall games. We show that the A-loss recall property allows us to compute a best response in polynomial time (computing a best response is NP-hard in imperfect recall games). This allows us to create a practical algorithm for approximating maxmin strategies in two-player games where the maximizing player has imperfect recall and the minimizing player has A-loss recall. This algorithm is capable of solving some games with up to 5⋅10^9 states in approximately 1 hour. Finally, we demonstrate that the use of imperfect recall abstraction can reduce the size of the strategy representation to as low as 0.03% of the size of the strategy representation in the original perfect recall game without sacrificing the quality of the maxmin strategy obtained by solving this abstraction.

Path Hopping: An MTD Strategy for Long-Term Quantum-Safe Communication

  • DOI: 10.1155/2018/8475818
  • Link: https://doi.org/10.1155/2018/8475818
  • Department: Artificial Intelligence Center
  • Annotation:
    Moving target defense (MTD) strategies have been widely studied for securing computer systems. We consider using MTD strategies to provide long-term cryptographic security for message transmission against an eavesdropping adversary who has access to a quantum computer. In such a setting, today's widely used cryptographic systems, including the Diffie-Hellman key agreement protocol and the RSA cryptosystem, will be insecure and alternative solutions are needed. We use a physical assumption, the existence of multiple communication paths between the sender and the receiver, as the basis of security, and propose a cryptographic system that uses this assumption and an MTD strategy to guarantee efficient long-term information-theoretic security even when only a single path is not eavesdropped. Following the approach of Maleki et al., we model the system using a Markov chain, derive its transition probabilities, propose two security measures, and prove results that show how to calculate these measures using transition probabilities. We define two types of attackers that we call risk-taking and risk-averse and compute our proposed measures for the two types of adversaries for a concrete MTD strategy. We use numerical analysis to study tradeoffs between system parameters, discuss our results, and propose directions for future research.
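  • Illustrative sketch:
    The sketch below shows the Markov-chain style of analysis on a heavily simplified version of the setting: each epoch the sender transmits on a uniformly random path out of n and the adversary taps k of them, so "intercepted" is an absorbing state reached with probability k/n per epoch. The two printed quantities loosely mirror measures for risk-averse and risk-taking adversaries; the paper's actual chain, transition probabilities, and measures are richer than this.

        # Toy absorbing Markov chain for path hopping: state 0 = not yet
        # intercepted, state 1 = intercepted (absorbing).  Much simplified.
        import numpy as np

        n, k, T = 10, 2, 20
        p = k / n                                   # per-epoch interception probability
        P = np.array([[1 - p, p],
                      [0.0,   1.0]])                # transition matrix

        dist = np.array([1.0, 0.0])                 # start not intercepted
        for _ in range(T):
            dist = dist @ P
        print("P(intercepted within", T, "epochs) =", dist[1])   # equals 1 - (1-p)**T
        print("expected epochs until interception =", 1.0 / p)   # geometric distribution mean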

An Algorithm for Constructing and Solving Imperfect Recall Abstractions of Large Extensive-Form Games

  • DOI: 10.24963/ijcai.2017/130
  • Link: https://doi.org/10.24963/ijcai.2017/130
  • Department: Artificial Intelligence Center
  • Annotation:
    We solve large two-player zero-sum extensive-form games with perfect recall. We propose a new algorithm based on fictitious play that significantly reduces memory requirements for storing average strategies. The key feature is exploiting imperfect recall abstractions while preserving the convergence rate and guarantees of fictitious play applied directly to the perfect recall game. The algorithm creates a coarse imperfect recall abstraction of the perfect recall game and automatically refines its information set structure only where the imperfect recall might cause problems. Experimental evaluation shows that our novel algorithm is able to solve a simplified poker game with 7·10^5 information sets using an abstracted game with only 1.8% of information sets of the original game. Additional experiments on poker and randomly generated games suggest that the relative size of the abstraction decreases as the size of the solved games increases.

DeepStack: Expert-level artificial intelligence in heads-up no-limit poker

  • DOI: 10.1126/science.aam6960
  • Link: https://doi.org/10.1126/science.aam6960
  • Department: Artificial Intelligence Center
  • Annotation:
    Artificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker, the quintessential game of imperfect information, is a long-standing challenge problem in artificial intelligence. We introduce DeepStack, an algorithm for imperfect-information settings. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition that is automatically learned from self-play using deep learning. In a study involving 44,000 hands of poker, DeepStack defeated, with statistical significance, professional poker players in heads-up no-limit Texas hold'em. The approach is theoretically sound and is shown to produce strategies that are more difficult to exploit than prior approaches.

Economically Optimal Variable Tag Length Message Authentication

  • Authors: Safavi-Naini, R., doc. Mgr. Viliam Lisý, MSc., Ph.D., Desmedt, Y.G.
  • Publication: 21st International Conference on Financial Cryptography and Data Security. Springer, Cham, 2017. p. 204-223. ISSN 0302-9743. ISBN 978-3-319-70971-0.
  • Year: 2017
  • DOI: 10.1007/978-3-319-70972-7_11
  • Link: https://doi.org/10.1007/978-3-319-70972-7_11
  • Department: Artificial Intelligence Center
  • Annotation:
    Cryptographic authentication protects messages against forgeries. In real life, messages carry information of different value, and the gain of the adversary in a successful forgery and the corresponding cost of the system designers depend on the "meaning" of the message. This is easy to see by comparing the successful forgery of a $1,000 transaction with the forgery of a $1 one. Cryptographic protocols require computation and increase the communication cost of the system, and an economically optimal system must optimize these costs such that message protection is commensurate with message value. This is especially important for resource-limited devices that rely on battery power. A MAC (Message Authentication Code) provides protection by appending a cryptographic tag to the message. For secure MACs, the tag length is the main determinant of the security level: longer tags provide higher protection and at the same time increase the communication cost of the system. Our goal is to find the economically optimal tag lengths when messages carry information of different values.
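  • Illustrative sketch:
    The trade-off can be made concrete with a toy cost model (an assumption for this sketch, not the paper's formulation): for an ideal MAC a tag of l bits is forged with probability about 2^-l, so the expected loss is roughly message_value * 2^-l plus a per-bit communication cost times l, and the economically optimal tag length minimizes this sum.

        # Toy tag-length trade-off: expected_loss(l) = value * 2**(-l) + bit_cost * l.
        # The linear communication-cost model is an assumption for illustration.
        def optimal_tag_length(message_value, bit_cost, max_len=256):
            def expected_loss(l):
                return message_value * 2.0 ** (-l) + bit_cost * l
            return min(range(1, max_len + 1), key=expected_loss)

        for value in (1, 1_000, 1_000_000):         # e.g. a $1, $1,000, and $1,000,000 message
            print(value, "->", optimal_tag_length(value, bit_cost=0.001), "bits")
        # more valuable messages warrant longer tags; cheap messages get short ones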

Equilibrium Approximation Quality of Current No-Limit Poker Bots

  • Department: Artificial Intelligence Center
  • Annotation:
    Approximating a Nash equilibrium is currently the best performing approach for creating poker-playing programs. While for the simplest variants of the game, it is possible to evaluate the quality of the approximation by computing the value of the best response strategy, this is currently not computationally feasible for larger variants of the game, such as heads-up no-limit Texas hold'em. In this paper, we present a simple and computationally inexpensive Local Best Response method for computing an approximate lower bound on the value of the best response strategy. Using this method, we show that existing poker-playing programs, based on solving abstract games, are remarkably poor Nash equilibrium approximations.

Optimal Strategies for Detecting Data Exfiltration by Internal and External Attackers

  • DOI: 10.1007/978-3-319-68711-7_10
  • Link: https://doi.org/10.1007/978-3-319-68711-7_10
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Annotation:
    We study the problem of detecting data exfiltration in computer networks. We focus on the performance of optimal defense strategies with respect to an attacker’s knowledge about typical network behavior and his ability to influence the standard traffic. Internal attackers know the typical upload behavior of the compromised host and may be able to discontinue standard uploads in favor of the exfiltration. External attackers do not immediately know the behavior of the compromised host, but they can learn it from observations. We model the problem as a sequential game of imperfect information, where the network administrator selects the thresholds for the detector, while the attacker chooses how much data to exfiltrate in each time step. We present novel algorithms for approximating the optimal defense strategies in the form of Stackelberg equilibria. We analyze the scalability of the algorithms and efficiency of the produced strategies in a case study based on real-world uploads of almost six thousand users to Google Drive. We show that with the computed defense strategies, the attacker exfiltrates 2–3 times less data than with simple heuristics; randomized defense strategies are up to 30% more effective than deterministic ones, and substantially more effective defense strategies are possible if the defense is customized for groups of hosts with similar behavior.

Path Hopping: an MTD Strategy for Quantum-safe Communication

  • Authors: Safavi-Naini, R., Poostindouz, A., doc. Mgr. Viliam Lisý, MSc., Ph.D.,
  • Publication: Proceedings of the 2017 Workshop on Moving Target Defense. New York: ACM, 2017. p. 111-114. ISBN 978-1-4503-5176-8.
  • Year: 2017
  • DOI: 10.1145/3140549.3140560
  • Link: https://doi.org/10.1145/3140549.3140560
  • Department: Artificial Intelligence Center
  • Annotation:
    Moving target defense (MTD) strategies have been widely studied for securing computer communication systems. We consider using MTD strategies as a cryptographic mechanism for providing secure communication when the adversary has access to a quantum computer and security is required over a long period of time. We assume Alice and Bob are connected by multiple disjoint paths, not all of which can be eavesdropped by the attacker at the same time. We propose a cryptographic system that uses an MTD strategy that achieves long-term quantum-safe security. We model the system as a Markov chain, and propose two security measures that correspond to two types of adversaries, called risk-taking and risk-averse. Our numerical simulations show dependencies between system parameters and lead to new insights, such as quantifying the cost of being a risk-averse adversary.

Algorithms for computing strategies in two-player simultaneous move games

  • DOI: 10.1016/j.artint.2016.03.005
  • Link: https://doi.org/10.1016/j.artint.2016.03.005
  • Department: Artificial Intelligence Center
  • Annotation:
    Simultaneous move games model discrete, multistage interactions where at each stage players simultaneously choose their actions. At each stage, a player does not know what action the other player will take, but otherwise knows the full state of the game. This formalism has been used to express games in general game playing and can also model many discrete approximations of real-world scenarios. In this paper, we describe both novel and existing algorithms that compute strategies for the class of two-player zero-sum simultaneous move games. The algorithms include exact backward induction methods with efficient pruning, as well as Monte Carlo sampling algorithms. We evaluate the algorithms in two different settings: the offline case, where computational resources are abundant and closely approximating the optimal strategy is a priority, and the online search case, where computational resources are limited and acting quickly is necessary. We perform a thorough experimental evaluation on six substantially different games for both settings. For the exact algorithms, the results show that our pruning techniques for backward induction dramatically improve the computation time required by the previous exact algorithms. For the sampling algorithms, the results provide unique insights into their performance and identify favorable settings and domains for different sampling algorithms.
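  • Illustrative sketch:
    Exact backward induction in these games computes the values of the successor states first and then solves the zero-sum matrix game they induce at the current stage. The toy two-stage game below (made-up payoffs, 2x2 stages solved with the closed-form value instead of a general LP, and no pruning) illustrates that recursion.

        # Backward induction for a tiny two-stage zero-sum simultaneous-move game:
        # leaves are terminal payoffs or 2x2 subgame matrices; subgame values are
        # computed first and plugged into the stage game one level up.
        def value_2x2(M):
            """Value of a 2x2 zero-sum stage game for the row (maximizing) player."""
            (a, b), (c, d) = M
            maximin = max(min(a, b), min(c, d))
            minimax = min(max(a, c), max(b, d))
            if maximin == minimax:                        # pure saddle point
                return maximin
            return (a * d - b * c) / (a + d - b - c)      # fully mixed equilibrium value

        def backward_induction(node):
            if isinstance(node, (int, float)):            # terminal utility
                return float(node)
            stage = [[backward_induction(child) for child in row] for row in node]
            return value_2x2(stage)

        game = [[ [[1, -2], [-2, 1]],  3.0              ],   # made-up two-stage game
                [ 0.5,                 [[0, 2], [2, -1]] ]]
        print(backward_induction(game))                   # value of the whole game (0.5 here)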

Case Studies of Network Defense with Attack Graph Games

  • DOI: 10.1109/MIS.2016.74
  • Link: https://doi.org/10.1109/MIS.2016.74
  • Department: Artificial Intelligence Center
  • Annotation:
    The increasing complexity of securing modern computer networks makes decision support systems an important tool for administrators. A challenge many existing tools fail to address is that attackers react strategically to new security measures, adapting their behaviors in response. Game theory provides a methodology for making decisions that takes into account these reactions, rather than assuming static attackers. The authors present an overview of how game theory can be used to inform one type of security decision: how to optimally place honeypots in a network. They demonstrate this approach on a realistic case study and present initial validation results based on a study comparing their approach with human decision makers.

Using Correlated Strategies for Computing Stackelberg Equilibria in Extensive-form Games

  • Department: Artificial Intelligence Center
  • Annotation:
    Strong Stackelberg Equilibrium (SSE) is a fundamental solution concept in game theory in which one player commits to a strategy, while the other player observes this commitment and plays a best response. We present a new algorithm for computing SSE for two-player extensive-form general-sum games with imperfect information (EFGs), where computing SSE is an NP-hard problem. Our algorithm is based on a correlated version of SSE, known as Stackelberg Extensive-Form Correlated Equilibrium (SEFCE). Our contribution is therefore twofold: (1) we give the first linear program for computing SEFCE in EFGs without chance, (2) we repeatedly solve and modify this linear program in a systematic search until we arrive at an SSE. Our new algorithm outperforms the best previous algorithms by several orders of magnitude.

Approximate Solutions for Attack Graph Games with Imperfect Information

  • DOI: 10.1007/978-3-319-25594-1_13
  • Link: https://doi.org/10.1007/978-3-319-25594-1_13
  • Department: Department of Computer Science
  • Annotation:
    We study the problem of network security hardening, in which a network administrator decides what security measures to use to best improve the security of the network. Specifically, we focus on deploying decoy services or hosts called honeypots. We model the problem as a general-sum extensive-form game with imperfect information and seek a solution in the form of Stackelberg Equilibrium. The defender seeks the optimal randomized honeypot deployment in a specific computer network, while the attacker chooses the best response as a contingency attack policy from a library of possible attacks compactly represented by attack graphs. Computing an exact Stackelberg Equilibrium using standard mixed-integer linear programming has a limited scalability in this game. We propose a set of approximate solution methods and analyze the trade-off between the computation time and the quality of the strategies calculated.

Combining online learning and equilibrium computation in security games

  • Authors: Klíma, R., doc. Mgr. Viliam Lisý, MSc., Ph.D., Kiekintveld, Ch.
  • Publication: Decision and Game Theory for Security. London: Springer, 2015. pp. 130-149. ISSN 0302-9743. ISBN 978-3-319-25593-4.
  • Year: 2015
  • DOI: 10.1007/978-3-319-25594-1_8
  • Link: https://doi.org/10.1007/978-3-319-25594-1_8
  • Department: Department of Computer Science
  • Annotation:
    Game-theoretic analysis has emerged as an important method for making resource allocation decisions in both infrastructure protection and cyber security domains. However, static equilibrium models defined based on inputs from domain experts have weaknesses; they can be inaccurate, and they do not adapt over time as the situation (and adversary) evolves. In cases where there are frequent interactions with an attacker, using learning to adapt to an adversary's revealed behavior may lead to better solutions in the long run. However, learning approaches need a lot of data, may perform poorly at the start, and may not be able to take advantage of expert analysis. We explore ways to combine equilibrium analysis with online learning methods with the goal of gaining the advantages of both approaches. We present several hybrid methods that combine these techniques in different ways, and empirically evaluate the performance of these methods in a game that models a border patrolling scenario.

Game-Theoretic Algorithms for Optimal Network Security Hardening Using Attack Graphs

  • Department: Department of Computer Science
  • Annotation:
    In network security hardening, a network administrator may need to use limited resources (such as honeypots) to harden a network against possible attacks. Attack graphs are a common formal model used to represent possible attacks. However, most existing works on attack graphs do not consider the reactions of attackers to different defender strategies. We introduce a game-theoretic model of the joint problem where the attacker’s strategies are represented using attack graphs, and the defender’s strategies are represented as modifications of the attack graph. The attack graphs we use allow for sequential attack actions with associated costs and probabilities of success/failure. We present an algorithm for computing an attack policy that maximizes the attacker’s expected reward and empirical results demonstrating our methods on a case study network.

Game-theoretic Foundations for the Strategic Use of Honeypots in Network Security

  • DOI: 10.1007/978-3-319-14039-1_5
  • Link: https://doi.org/10.1007/978-3-319-14039-1_5
  • Department: Department of Computer Science
  • Annotation:
    An important element in the mathematical and scientific foundations for security is modeling the strategic use of deception and information manipulation. We argue that game theory provides an important theoretical framework for reasoning about information manipulation in adversarial settings, including deception and randomization strategies. In addition, game theory has practical uses in determining optimal strategies for randomized patrolling and resource allocation. We discuss three game-theoretic models that capture aspects of how honeypots can be used in network security. Honeypots are fake hosts introduced into a network to gather information about attackers and to distract them from real targets. They are a limited resource, so there are important strategic questions about how to deploy them to the greatest effect, which is fundamentally about deceiving attackers into choosing fake targets instead of real ones to attack. We describe several game models that address strategies for deploying honeypots, including a basic honeypot selection game, an extension of this game that allows additional probing actions by the attacker, and finally a version in which attacker strategies are represented using attack graphs. We conclude with a discussion of the strengths and limitations of game theory in the context of network security.

Online Monte Carlo Counterfactual Regret Minimization for Search in Imperfect Information Games

  • Authors: doc. Mgr. Viliam Lisý, MSc., Ph.D., Lanctot, M., Bowling, M.
  • Publication: Autonomous Agents and Multiagent Systems. County of Richland: IFAAMAS, 2015. p. 27-36. 14. ISSN 1548-8403. ISBN 978-1-4503-3771-7.
  • Year: 2015
  • Department: Department of Computer Science
  • Annotation:
    Online search in games has been a core interest of artificial intelligence. Search in imperfect information games (e.g., Poker, Bridge, Skat) is particularly challenging due to the complexities introduced by hidden information. In this paper, we present Online Outcome Sampling, an online search variant of Monte Carlo Counterfactual Regret Minimization, which preserves its convergence to Nash equilibrium. We show that OOS can overcome the problem of non-locality encountered by previous search algorithms and perform well against its worst-case opponents. We show that exploitability of the strategies played by OOS decreases as the amount of search time increases, and that preexisting Information Set Monte Carlo tree search (ISMCTS) can get more exploitable over time. In head-to-head play, OOS outperforms ISMCTS in games where non-locality plays a significant role, given a sufficient computation time per move.

Optimal Network Security Hardening Using Attack Graph Games

  • Department: Department of Computer Science
  • Annotation:
    Preventing attacks in a computer network is the core problem in network security. We introduce a new game-theoretic model of the interaction between a network administrator who uses limited resources to harden a network and an attacker who follows a multi-stage plan to attack the network. The possible plans of the attacker are compactly represented using attack graphs, while the defender adds fake targets (honeypots) to the network to deceive the attacker. The compact representation of the attacker’s strategies presents a computational challenge and finding the best response of the attacker is NP-hard. We present a solution method that first translates an attack graph into an MDP and solves it using policy search with a set of pruning techniques. We present an empirical evaluation of the model and solution algorithms, evaluating scalability, the types of solutions that are generated for realistic cases, and sensitivity analysis.

Alternative Selection Functions for Information Set Monte Carlo Tree Search

  • DOI: 10.14311/AP.2014.54.0333
  • Link: https://doi.org/10.14311/AP.2014.54.0333
  • Department: Department of Computer Science
  • Annotation:
    We evaluate the performance of various selection methods for the Monte Carlo Tree Search algorithm in two-player zero-sum extensive-form games with imperfect information. We compare the standard Upper Confidence Bounds applied to Trees (UCT) along with the less common Exponential Weights for Exploration and Exploitation (Exp3) and the novel Regret Matching (RM) selection in two distinct imperfect information games: Imperfect Information Goofspiel and Phantom Tic-Tac-Toe. We show that UCT, after initial fast convergence towards a Nash equilibrium, computes increasingly worse strategies after some point in time. This is not the case with Exp3 and RM, which also show superior performance in head-to-head matches.
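  • Illustrative sketch:
    Exp3 keeps exponential weights over the actions of a node, mixes in uniform exploration at rate gamma, and updates with importance-weighted estimates built only from the observed payoff of the selected action, which is exactly the feedback MCTS selection has. A minimal sketch of Exp3 at a single node (rewards assumed in [0, 1]; the mean rewards at the bottom are made up):

        # Minimal Exp3 selection for one node: exponential weights, uniform
        # exploration with rate gamma, importance-weighted bandit updates.
        import math, random

        class Exp3:
            def __init__(self, n_actions, gamma=0.1):
                self.n, self.gamma = n_actions, gamma
                self.log_w = [0.0] * n_actions            # log-weights for numerical stability

            def probabilities(self):
                m = max(self.log_w)
                w = [math.exp(lw - m) for lw in self.log_w]
                total = sum(w)
                return [(1 - self.gamma) * wi / total + self.gamma / self.n for wi in w]

            def select(self):
                return random.choices(range(self.n), weights=self.probabilities())[0]

            def update(self, action, reward):             # reward of the selected action only
                p = self.probabilities()[action]
                self.log_w[action] += self.gamma * (reward / p) / self.n

        random.seed(0)
        bandit = Exp3(n_actions=3)
        for _ in range(2000):
            a = bandit.select()                           # in MCTS: descend, simulate, ...
            bandit.update(a, [0.2, 0.8, 0.5][a])          # ... then back up the payoff
        print(bandit.probabilities())                     # mass concentrates on action 1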

An Exact Double-Oracle Algorithm for Zero-Sum Extensive-Form Games with Imperfect Information

  • Department: Department of Computer Science
  • Annotation:
    Developing scalable solution algorithms is one of the central problems in computational game theory. We present an iterative algorithm for computing an exact Nash equilibrium for two-player zero-sum extensive-form games with imperfect information. Our approach combines two key elements: (1) the compact sequence-form representation of extensive-form games and (2) the algorithmic framework of double-oracle methods. The main idea of our algorithm is to restrict the game by allowing the players to play only selected sequences of available actions. After solving the restricted game, new sequences are added by finding best responses to the current solution using fast algorithms. We experimentally evaluate our algorithm on a set of games inspired by patrolling scenarios, board, and card games. The results show significant runtime improvements in games admitting an equilibrium with small support, and substantial improvement in memory use even on games with large support. The improvement in memory use is particularly important because it allows our algorithm to solve much larger game instances than existing linear programming methods. Our main contributions include (1) a generic sequence-form double-oracle algorithm for solving zero-sum extensive-form games; (2) fast methods for maintaining a valid restricted game model when adding new sequences; (3) a search algorithm and pruning methods for computing best-response sequences; (4) theoretical guarantees about the convergence of the algorithm to a Nash equilibrium; (5) experimental analysis of our algorithm on several games, including an approximate version of the algorithm.
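  • Illustrative sketch:
    The double-oracle idea is easiest to see on a normal-form game (the paper applies it to the sequence form of extensive-form games): solve a restricted game over small action subsets exactly, compute each player's best response in the full game to the opponent's restricted-game strategy, add any best response that improves on the restricted-game value, and stop when neither does, at which point the restricted-game equilibrium is an equilibrium of the full game. The matrix below is rock-paper-scissors, used purely as an example.

        # Double-oracle method on a zero-sum normal-form game, with the restricted
        # games solved exactly by linear programming (scipy.optimize.linprog).
        import numpy as np
        from scipy.optimize import linprog

        def solve_zero_sum(A):
            """Value and both equilibrium strategies of the matrix game A."""
            A = np.asarray(A, dtype=float)
            shift = A.min() - 1.0
            B = A - shift                                 # make all payoffs positive
            m, n = B.shape
            r = linprog(np.ones(m), A_ub=-B.T, b_ub=-np.ones(n),
                        bounds=[(0, None)] * m, method="highs")
            c = linprog(-np.ones(n), A_ub=B, b_ub=np.ones(m),
                        bounds=[(0, None)] * n, method="highs")
            value = 1.0 / r.x.sum() + shift
            return value, r.x / r.x.sum(), c.x / c.x.sum()

        def double_oracle(A):
            A = np.asarray(A, dtype=float)
            rows, cols = [0], [0]                         # arbitrary initial restricted actions
            while True:
                value, x, y = solve_zero_sum(A[np.ix_(rows, cols)])
                full_x = np.zeros(A.shape[0]); full_x[rows] = x
                full_y = np.zeros(A.shape[1]); full_y[cols] = y
                br_row = int(np.argmax(A @ full_y))       # row best response to y
                br_col = int(np.argmin(full_x @ A))       # column best response to x
                improved = False
                if br_row not in rows and (A @ full_y)[br_row] > value + 1e-9:
                    rows.append(br_row); improved = True
                if br_col not in cols and (full_x @ A)[br_col] < value - 1e-9:
                    cols.append(br_col); improved = True
                if not improved:
                    return value, full_x, full_y

        rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
        print(double_oracle(rps))                         # value ~0, uniform strategies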

Computing Optimal Policies for Attack Graphs with Action Failures and Costs

  • Authors: Durkota, K., doc. Mgr. Viliam Lisý, MSc., Ph.D.,
  • Publication: Frontiers in Artificial Intelligence and Applications. Amsterdam: IOS Press, 2014. p. 101-110. ISSN 0922-6389. ISBN 978-1-61499-420-6.
  • Year: 2014
  • DOI: 10.3233/978-1-61499-421-3-101
  • Link: https://doi.org/10.3233/978-1-61499-421-3-101
  • Department: Department of Computer Science
  • Annotation:
    An attack graph represents all known sequences of actions that compromise a system in the form of an AND-OR graph. We assume that each action in the attack graph has a specified cost and probability of success and propose an algorithm for computing an action selection policy minimizing the expected cost of performing an attack. We model the problem as a finite horizon MDP and use forward search with transposition tables and various pruning techniques based on the structure of the attack graph. We experimentally compare the proposed algorithm to a generic MDP solver and a solver transforming the problem to an Unconstrained Influence Diagram, showing a substantial runtime improvement.
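  • Illustrative sketch:
    The sketch below is a stripped-down version of the underlying idea: a single goal reachable by any one of several independent actions, each with a cost and a success probability, where the attacker tries actions one by one until one succeeds. Memoizing the expected cost per remaining-action set plays the role of a transposition table; the actions and numbers are made up, and real attack graphs add AND/OR structure and preconditions on top of this.

        # Optimal ordering of fallible attack actions for a single goal, found by
        # memoized search over the set of remaining actions (a toy "transposition
        # table"); the optimal order turns out to sort actions by cost/probability.
        from functools import lru_cache

        ACTIONS = {                       # name: (cost, success probability) -- made up
            "phish":       (2.0, 0.30),
            "exploit_vpn": (5.0, 0.80),
            "brute_force": (1.0, 0.05),
        }

        @lru_cache(maxsize=None)
        def expected_cost(remaining):
            """Minimum expected total cost of trying the remaining actions in the
            best order, stopping as soon as one of them succeeds."""
            if not remaining:
                return 0.0
            best = float("inf")
            for name in remaining:
                cost, p = ACTIONS[name]
                rest = tuple(a for a in remaining if a != name)
                best = min(best, cost + (1 - p) * expected_cost(rest))
            return best

        print(expected_cost(tuple(sorted(ACTIONS))))      # 5.54 for these numbers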

Monte Carlo Tree Search in Simultaneous Move Games with Applications to Goofspiel

  • Authors: Lanctot, M., doc. Mgr. Viliam Lisý, MSc., Ph.D., Winands, M.H.M.
  • Publication: Computer Games. Cham: Springer International Publishing AG, 2014. pp. 28-43. Communications in Computer and Information Science. ISSN 1865-0929. ISBN 978-3-319-05427-8.
  • Year: 2014
  • DOI: 10.1007/978-3-319-05428-5_3
  • Link: https://doi.org/10.1007/978-3-319-05428-5_3
  • Department: Department of Computer Science
  • Annotation:
    Monte Carlo Tree Search (MCTS) has become a widely popular sampling-based search algorithm for two-player games with perfect information. When actions are chosen simultaneously, players may need to mix between their strategies. In this paper, we discuss the adaptation of MCTS to simultaneous move games. We introduce a new algorithm, Online Outcome Sampling (OOS), that approaches a Nash equilibrium strategy over time. We compare both head-to-head performance and exploitability of several MCTS variants in Goofspiel. We show that regret matching and OOS perform best and that all variants produce less exploitable strategies than UCT.

Online Learning Methods for Border Patrol Resource Allocation

  • Authors: Klíma, R., Kiekintveld, Ch., doc. Mgr. Viliam Lisý, MSc., Ph.D.,
  • Publication: Decision and Game Theory for Security. Heidelberg: Springer, 2014. p. 340-349. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-12600-5.
  • Year: 2014
  • DOI: 10.1007/978-3-319-12601-2_20
  • Link: https://doi.org/10.1007/978-3-319-12601-2_20
  • Department: Department of Computer Science
  • Annotation:
    We introduce a model for border security resource allocation with repeated interactions between attackers and defenders. The defender must learn the optimal resource allocation strategy based on historical apprehension data, balancing exploration and exploitation in the policy. We experiment with several solution methods for this online learning problem including UCB, sliding-window UCB, and EXP3. We test the learning methods against several different classes of attackers, including attackers with randomly varying strategies and attackers who react adversarially to the defender's strategy. We present experimental data to identify the optimal parameter settings for these algorithms and compare the algorithms against the different types of attackers.
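  • Illustrative sketch:
    UCB1 patrols, each round, the zone whose empirical apprehension rate plus a confidence bonus is largest; the bonus shrinks with the number of visits, which balances exploration against exploitation. The zone probabilities below are made up and the attacker is static; sliding-window UCB would simply compute the averages over recent rounds only, and EXP3 would replace the averages with exponential weights.

        # Minimal UCB1 for allocating one patrol per round among a few zones;
        # reward is 1 if an apprehension occurs.  Probabilities are made up.
        import math, random

        random.seed(0)
        p_apprehend = [0.10, 0.30, 0.20]             # per-zone apprehension probabilities
        counts = [0] * len(p_apprehend)
        sums = [0.0] * len(p_apprehend)

        for t in range(1, 5001):
            if 0 in counts:                           # visit every zone once first
                zone = counts.index(0)
            else:
                zone = max(range(len(counts)),
                           key=lambda z: sums[z] / counts[z]
                           + math.sqrt(2 * math.log(t) / counts[z]))
            reward = 1.0 if random.random() < p_apprehend[zone] else 0.0
            counts[zone] += 1
            sums[zone] += reward

        print(counts)                                 # most visits go to the best zone (index 1)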

Practical Performance of Refinements of Nash Equilibria in Extensive-Form Zero-Sum Games

  • DOI: 10.3233/978-1-61499-419-0-201
  • Link: https://doi.org/10.3233/978-1-61499-419-0-201
  • Department: Department of Cybernetics, Department of Computer Science
  • Annotation:
    Nash equilibrium (NE) is the best known solution concept used in game theory. It is known that NE is particularly weak even in zero-sum extensive-form games since it can prescribe irrational actions to play that do not exploit mistakes made by an imperfect opponent. These issues are addressed by a number of refinements of NE that strengthen the requirements for equilibrium strategies. However, a thorough experimental analysis of practical performance of the Nash equilibria refinement strategies is, to the best of our knowledge, missing. This paper aims to fill this void and provides the first broader experimental comparison of the quality of refined Nash strategies in zero-sum extensive-form games. The experimental results suggest that (1) there is a significant difference between the best and the worst NE strategy against imperfect opponents, (2) the existing refinements outperform the worst NE strategy, (3) they typically perform close to the best possible NE strategy, and (4) the difference in performance of all compared refinements is very small.

Randomized Operating Point Selection in Adversarial Classification

  • DOI: 10.1007/978-3-662-44851-9_16
  • Link: https://doi.org/10.1007/978-3-662-44851-9_16
  • Department: Department of Computer Science
  • Annotation:
    Security systems for email spam filtering, network intrusion detection, steganalysis, and watermarking frequently use classifiers to separate malicious behavior from legitimate behavior. Typically, they use a fixed operating point minimizing the expected cost or error. This allows a rational attacker to deliver invisible attacks just below the detection threshold. We model this situation as a non-zero-sum normal form game capturing the attacker’s expected payoffs for detected and undetected attacks, and the detector’s costs for false positives and false negatives computed based on the Receiver Operating Characteristic (ROC) curve of the classifier. The analysis of Nash and Stackelberg equilibria reveals that using a randomized strategy over multiple operating points forces the rational attacker to design less efficient attacks and substantially lowers the expected cost of the detector. We present the equilibrium strategies for sample ROC curves from a network intrusion detection system and evaluate the corresponding benefits.

Search in Imperfect Information Games Using Online Monte Carlo Counterfactual Regret Minimization

  • Authors: Lanctot, M., doc. Mgr. Viliam Lisý, MSc., Ph.D., Bowling, M.
  • Publication: Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2014. pp. 34-41.
  • Year: 2014
  • Department: Department of Computer Science
  • Annotation:
    Online search in games has always been a core interest of artificial intelligence. Advances made in search for perfect information games (such as Chess, Checkers, Go, and Backgammon) have led to AI capable of defeating the world's top human experts. Search in imperfect information games (such as Poker, Bridge, and Skat) is significantly more challenging due to the complexities introduced by hidden information. In this paper, we present Online Outcome Sampling (OOS), the first imperfect information search algorithm that is guaranteed to converge to an equilibrium strategy in two-player zero-sum games. We show that OOS avoids common problems encountered by existing search algorithms and we experimentally evaluate its convergence rate and practical performance against benchmark strategies in Liar's Dice and a variant of Goofspiel. We show that unlike with Information Set Monte Carlo Tree Search (ISMCTS) the exploitability of the strategies produced by OOS decreases as the amount of search time increases. In practice, OOS performs as well as ISMCTS in head-to-head play while producing strategies with lower exploitability given the same search time.

Computing Optimal Attack Strategies Using Unconstrained Influence Diagrams

  • Authors: doc. Mgr. Viliam Lisý, MSc., Ph.D., Píbil, R.
  • Publication: Intelligence and Security Informatics. Heidelberg: Springer, 2013. pp. 38-46. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-39692-2.
  • Year: 2013
  • DOI: 10.1007/978-3-642-39693-9_5
  • Link: https://doi.org/10.1007/978-3-642-39693-9_5
  • Department: Department of Computer Science
  • Annotation:
    Attack graphs are a formalism for capturing the most important ways to compromise a system. They are used for evaluating risks and designing appropriate countermeasures. Analysis of attack graphs sometimes requires computing the optimal attack strategy that minimizes the expected cost of the attacker in case of stochastically failing actions. We point out several results in AI literature that are highly relevant to this problem, but remain unnoticed by security literature. We note the problem has been shown to be NP-hard and we present how the problem can be reduced to the problem of solving an unconstrained influence diagram (UID). We use an existing UID solver to assess the scalability of the approach, showing that it can be used to optimally solve attack graphs with up to 20 attack actions.

Convergence of Monte Carlo Tree Search in Simultaneous Move Games

  • Department: Department of Computer Science
  • Annotation:
    We study Monte Carlo tree search (MCTS) in zero-sum extensive-form games with perfect information and simultaneous moves. We present a general template of MCTS algorithms for these games, which can be instantiated by various selection methods. We formally prove that if a selection method is epsilon-Hannan consistent in a matrix game and satisfies additional requirements on exploration, then the MCTS algorithm eventually converges to an approximate Nash equilibrium (NE) of the extensive-form game. We empirically evaluate this claim using regret matching and Exp3 as the selection methods on randomly generated games and empirically selected worst case games. We confirm the formal result and show that additional MCTS variants also converge to approximate NE on the evaluated games.

Double-Oracle Algorithm for Computing an Exact Nash Equilibrium in Zero-Sum Extensive-Form Games

  • Department: Department of Computer Science
  • Annotation:
    We investigate an iterative algorithm for computing an exact Nash equilibrium in two-player zero-sum extensive-form games with imperfect information. The approach uses the sequence-form representation of extensive-form games and the double-oracle algorithmic framework. The main idea is to restrict the game by allowing the players to play only some of the sequences of available actions, then iteratively solve this restricted game, and exploit fast best-response algorithms to add additional sequences to the restricted game for the next iteration. In this paper we (1) extend the sequence-form double-oracle method to be applicable on non-deterministic extensive-form games, (2) present more efficient methods for maintaining valid restricted game and computing best-response sequences, and finally we (3) provide theoretical guarantees of the convergence of the algorithm to a Nash equilibrium. We experimentally evaluate our algorithm on two types of games: a search game on a graph and simplified variants of Poker. The results show significant running-time improvements compared to the previous variant of the double-oracle algorithm, and demonstrate the ability to find an exact solution of much larger games compared to solving full linear program for the complete game.

Using Double-Oracle Method and Serialized Alpha-Beta Search for Pruning in Simultaneous Move Games

  • Department: Department of Computer Science
  • Annotation:
    We focus on solving two-player zero-sum extensive-form games with perfect information and simultaneous moves. In these games, both players fully observe the current state of the game and simultaneously make a move that determines the next state of the game. We solve these games by a novel algorithm that relies on two components: (1) it iteratively solves the games that correspond to a single simultaneous move using a double-oracle method, and (2) it prunes the states of the game using bounds on the sub-game values obtained by classical alpha-beta search on a serialized variant of the game. We experimentally evaluate our algorithm on the Goofspiel card game, a pursuit-evasion game, and randomly generated games. The results show that our novel algorithm typically provides significant running-time improvements and a reduction in the number of evaluated nodes compared to the full search algorithm.
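  • Illustrative sketch:
    A small numeric illustration of why serialized values are usable for pruning: for a single simultaneous-move stage with known sub-game values, the two serializations (each solvable by classical alpha-beta) bound the stage's mixed-strategy value. The 2x2 matrix is invented; Python with numpy is assumed.

      import numpy as np

      # Stage values A[i, j] with the row player maximizing.
      A = np.array([[3.0, -1.0],
                    [0.0,  2.0]])

      maximin = max(min(row) for row in A)     # row player commits first:    -1.0
      minimax = min(max(col) for col in A.T)   # column player commits first:  2.0

      # For a 2x2 game without a saddle point there is a closed-form mixed value;
      # it indeed lies inside the serialized bounds.
      a, b, c, d = A.ravel()
      v = (a * d - b * c) / (a + d - b - c)    # 1.0
      print(maximin, v, minimax)

    Bounds of this kind let a stage solver discard actions and sub-trees whose best conceivable value already falls outside an established interval, which is the pruning idea the annotation refers to.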

Anytime Algorithms for Multi-agent Visibility-based Pursuit-evasion Games

  • Department: Department of Computer Science
  • Annotation:
    We investigate algorithms for playing multi-agent visibility-based pursuit-evasion games. A team of pursuers attempts to maintain visibility contact with an evader who actively avoids tracking. We aim for applicability of the algorithms in real-world scenarios; hence, we impose hard constraints on the run-time of the algorithms and evaluate them in a simulation model based on a real-world urban area. We compare Monte Carlo tree search (MCTS) and iterative deepening minimax algorithms running on the information-set tree of the imperfect-information game. The experimental results demonstrate that both methods create comparably good strategies for the pursuer, while the latter performs better in creating the evader's strategy.

Extending Security Games to Defenders with Constrained Mobility

  • Department: Department of Computer Science
  • Annotation:
    A number of real-world security scenarios can be cast as a problem of transiting an area guarded by a mobile patroller, where the transiting agent aims to choose its route so as to minimize the probability of encountering the patrolling agent, and vice versa. We model this problem as a two-player zero-sum game on a graph, termed the transit game. In contrast to the existing models of area transit, where one of the players is stationary, we assume both players are mobile. We also explicitly model the limited endurance of the patroller and the notion of a base to which the patroller has to repeatedly return. Noting the prohibitive size of the strategy spaces of both players, we develop single- and double-oracle based algorithms, including a novel acceleration scheme, to obtain optimal route-selection strategies for both players. We evaluate the developed approach on a range of transit game instances inspired by real-world security problems in the urban and naval security domains.

Game Theoretic Model of Strategic Honeypot Selection in Computer Networks

  • DOI: 10.1007/978-3-642-34266-0_12
  • Link: https://doi.org/10.1007/978-3-642-34266-0_12
  • Department: Department of Computer Science
  • Annotation:
    A honeypot is a decoy computer system used in network security to waste the time and resources of attackers and to analyze their behavior. While there has been significant research on how to design honeypot systems, less is known about how to use honeypots strategically in network defense. Based on formal deception games, we develop two game-theoretic models that provide insight into how valuable honeypots should appear in order to maximize the probability that a rational attacker will attack a honeypot. The first model captures a static situation, and the second allows attackers to imperfectly probe some of the systems on the network to determine which ones are likely to be real systems (and not honeypots) before launching an attack. We formally analyze the properties of the optimal strategies in the games and provide linear programs for their computation. Finally, we present the optimal solutions for a set of instances of the games and evaluate their quality in comparison to several baselines.
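  • Illustrative sketch:
    A crude Monte Carlo toy of the imperfect-probing setting described above, played against a naive (not rational) attacker. The network size, probe accuracy, and attacker policy are all invented; the paper instead derives optimal strategies analytically and via linear programs. Python with numpy is assumed.

      import numpy as np

      rng = np.random.default_rng(0)
      n_real, n_honeypots, n_probes, probe_accuracy = 4, 3, 2, 0.8
      trials, honeypot_hits = 10_000, 0

      for _ in range(trials):
          is_honeypot = np.array([False] * n_real + [True] * n_honeypots)
          rng.shuffle(is_honeypot)
          probed = rng.choice(is_honeypot.size, size=n_probes, replace=False)
          # Each probe reports the machine's true type with the given accuracy.
          looks_real = {i: is_honeypot[i] != (rng.random() < probe_accuracy) for i in probed}
          # Naive attacker: attack a probed machine that looked real, otherwise pick at random.
          candidates = [i for i, real in looks_real.items() if real] or list(range(is_honeypot.size))
          honeypot_hits += bool(is_honeypot[rng.choice(candidates)])

      print(honeypot_hits / trials)   # how often this naive attacker wastes its attack on a honeypot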

Game-theoretic Approach to Adversarial Plan Recognition

  • DOI: 10.3233/978-1-61499-098-7-546
  • Link: https://doi.org/10.3233/978-1-61499-098-7-546
  • Department: Department of Computer Science
  • Annotation:
    We argue that the problem of adversarial plan recognition, where the observed agent actively tries to avoid detection, should be modeled in a game-theoretic framework. We define the problem as an imperfect-information extensive-form game between the observer and the observed agent. We propose a novel algorithm that approximates the optimal solution of the game using Monte Carlo sampling. The experimental evaluation is performed on a synthetic domain inspired by a network security problem. The proposed method produces significantly better results than several simple baselines on a domain of practical size.

Iterative Algorithm for Solving Two-player Zero-sum Extensive-form Games with Imperfect Information

  • Department: Department of Computer Science
  • Annotation:
    We develop and evaluate a new exact algorithm for finding Nash equilibria of two-player zero-sum extensive-form games with imperfect information. Our approach is based on the sequence-form representation of the game and uses the algorithmic framework of double-oracle methods that have been used successfully in other classes of games. The algorithm uses an iterative decomposition, solving restricted games and exploiting fast best-response algorithms to add additional sequences to the game over time. We demonstrate our algorithm on a class of adversarial graph search games motivated by real-world border patrolling scenarios. The results indicate that our framework is a promising way to scale up solutions for extensive-form games, reducing both memory and computation time requirements.

Tactical Operations of Multi-Robot Teams in Urban Warfare (Demonstration)

  • Department: Department of Computer Science
  • Annotation:
    As multi-robot teams deployed in military operations scale up, there is a need to boost the autonomy of individual as well as team behaviors. We developed a feature-rich simulation testbed for the experimental evaluation of multi-agent coordination mechanisms applicable to tactical military operations in urban warfare. In particular, we investigated and implemented four approaches: multi-agent mission planning and plan repair, reactive planning for teamwork, patrolling of mobile targets, and tracking of smart targets. Besides the live-system demonstrator, we aim to showcase a scenario that engages a human in a pursuit-evasion game against the algorithms we implemented.

Agent Simulation Against Somali Pirates

  • Department: Department of Cybernetics
  • Annotation:
    Researchers from the Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague are studying the applicability of artificial intelligence and agent technology to providing security in the maritime domain. The goal of the research, sponsored by the U.S. Office of Naval Research (ONR), is to create a detailed multi-agent model of the environment and of pirate behavior in maritime shipping, and to design algorithms for the optimal planning of randomized safe transits through the Gulf of Aden.

Computing Time-Dependent Policies for Patrolling Games with Mobile Targets

  • Department: Department of Cybernetics
  • Annotation:
    We study how a mobile defender should patrol an area to protect multiple valuable targets from being attacked by an attacker. In contrast to existing approaches, we allow the targets to move through the area according to a priori known deterministic movement schedules. We represent the patrol area by a graph of arbitrary topology and do not put any restrictions on the movement schedules. We assume the attacker can observe the defender and has full knowledge of the strategy the defender employs. We construct a game-theoretic formulation and seek the defender's optimal randomized strategy in a Stackelberg equilibrium of the game. We formulate the computation of the strategy as a mathematical program whose solution corresponds to an optimal time-dependent Markov policy for the defender.
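  • Illustrative sketch:
    A minimal sketch of what a time-dependent Markov policy looks like computationally: one transition matrix per time step, plus the propagation of the defender's location distribution against a known target schedule. The matrices and schedule are invented, and the sketch only evaluates a fixed policy rather than optimizing one as the paper does. Python with numpy is assumed.

      import numpy as np

      # One row-stochastic transition matrix per time step on a 3-node patrol graph.
      P = [np.array([[0.2, 0.8, 0.0],
                     [0.5, 0.0, 0.5],
                     [0.0, 0.7, 0.3]]),
           np.array([[0.0, 1.0, 0.0],
                     [0.3, 0.4, 0.3],
                     [0.6, 0.0, 0.4]])]
      target_schedule = [1, 2, 0]          # known deterministic node of the target at t = 0, 1, 2

      d = np.zeros(3)
      d[0] = 1.0                           # defender starts at node 0
      print("t=0 coverage:", d[target_schedule[0]])
      for t, P_t in enumerate(P, start=1):
          d = d @ P_t                      # propagate the defender's location distribution
          print(f"t={t} coverage:", d[target_schedule[t]])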

Iterative Game-theoretic Route Selection for Hostile Area Transit and Patrolling

  • Department: Department of Cybernetics
  • Annotation:
    A number of real-world security scenarios can be cast as a problem of transiting an area patrolled by a mobile adversary, where the transiting agent aims to choose its route so as to minimize the probability of encountering the patrolling agent, and vice versa. We model this problem as a two-player zero-sum game on a graph, termed the transit game. In contrast to the existing models of area transit, where one of the players is stationary, we assume both players are mobile. We also explicitly model the limited endurance of the patroller and the notion of a base to which the patroller has to repeatedly return. Noting the prohibitive size of the strategy spaces of both players, we employ iterative oracle-based algorithms, including a newly proposed acceleration scheme, to obtain optimal route-selection strategies for both players. We evaluate the developed approach on a range of transit game instances inspired by real-world security problems in the urban and naval security domains.

Towards Cooperation in Adversarial Search with Confidentiality

  • DOI: 10.1007/978-3-642-23181-0_24
  • Link: https://doi.org/10.1007/978-3-642-23181-0_24
  • Department: Department of Cybernetics
  • Annotation:
    We investigate the problem of cooperation among self-interested agents with respect to the confidentiality of their plans and without the presence of any third-party mediator. We base our approach on non-zero-sum n-player games in extensive form, focusing on two main goals: (1) the analysis of the maximal improvement of the utility value that an agent can gain by negotiation, and (2) the dependence of this improvement on the basic characteristics of the game: the number of agents, the branching factor, and the correlation of the utility functions. We provide theoretical definitions together with experimental results evaluated on games abstracting real-world scenarios.

Adversarial Planning for Large Multi-agent Simulations

  • Authors: doc. Mgr. Viliam Lisý, MSc., Ph.D.,
  • Publication: AAMAS 2010: Proceedings of the Ninth International Conference on Autonomous Agents and Multi Agent Systems. Dublin: Computer Science Press, 2010, pp. 1665-1666. ISBN 978-0-9826571-1-9.
  • Year: 2010
  • Department: Department of Cybernetics
  • Annotation:
    We investigate planning for self-interested agents in large multi-agent simulations. We present two heuristic algorithms that exploit different domain-specific properties in order to find high-quality plans with a reduced amount of computation. We further suggest developing a formal framework for describing these properties and complementing our previous, mostly experimental, results with formal guarantees.

Affective Man-Machine Interface: Unveiling Human Emotions through Biosignals

  • Authors: van den Broek, E.L., doc. Mgr. Viliam Lisý, MSc., Ph.D., Westerink, J.H.D.M., Schut, M.H., Tuinenbreijer, K.
  • Publication: Biomedical Engineering Systems and Technologies International Joint Conference, BIOSTEC 2009 Porto, Portugal, January 14-17, 2009, Revised Selected Papers. Heidelberg: Springer, 2010. p. 21-47. Communications in Computer and Information Science. ISSN 1865-0929. ISBN 978-3-642-11720-6.
  • Year: 2010
  • DOI: 10.1007/978-3-642-11721-3_2
  • Link: https://doi.org/10.1007/978-3-642-11721-3_2
  • Department: Department of Cybernetics
  • Annotation:
    Humans exhibit an electrical profile that is altered through various psychological and physiological processes, which can be measured through biosignals; e.g., electromyography (EMG) and electrodermal activity (EDA). These biosignals reveal our emotions and can serve as an advanced man-machine interface (MMI) for empathic consumer products. This chapter starts with an introduction to biosignals for emotion detection. Next, a state-of-the-art review of automatic emotion classification is presented. Moreover, guidelines are presented for affective MMI. Subsequently, a study is presented that explores the use of EDA and three facial EMG signals to determine neutral, positive, negative, and mixed emotions, using recordings of 21 people. A range of techniques is tested, which resulted in a generic framework for automated emotion classification with up to 61.31% correct classification of the four emotion classes.
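  • Illustrative sketch:
    A generic classification pipeline of the kind described above, run on synthetic stand-in data (random signals, simple mean/std features, a k-NN classifier); none of it reproduces the chapter's dataset, features, or methods. Python with numpy and scikit-learn is assumed.

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.neighbors import KNeighborsClassifier

      # Synthetic stand-in: 84 recordings, 4 channels (EDA + 3 facial EMG), 256 samples each,
      # summarized by per-recording mean and standard deviation.  Labels: 4 emotion classes.
      rng = np.random.default_rng(0)
      y = rng.integers(0, 4, size=84)                     # neutral / positive / negative / mixed
      raw = rng.normal(loc=y[:, None, None] * 0.3, scale=1.0, size=(84, 4, 256))
      X = np.hstack([raw.mean(axis=2), raw.std(axis=2)])  # 8 features per recording

      clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
      scores = cross_val_score(clf, X, y, cv=5)
      print(scores.mean())                                # chance level is 0.25 for four classes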

Agent Subset Adversarial Search for Complex Non-cooperative Domains

  • DOI: 10.1109/ITW.2010.5593351
  • Link: https://doi.org/10.1109/ITW.2010.5593351
  • Department: Department of Cybernetics
  • Annotation:
    We investigate reducing the complexity of multi-agent adversarial search in the domain of n-player games. We describe a method that decomposes the game into a set of smaller overlapping sub-games, solves each sub-game separately, and then combines the results into a global solution. This way, the exponential dependence of the adversarial search complexity on the number of agents can be reduced to a polynomial one. Still, the proposed method performs well in domains with sparse interactions among agents. The method can be used with a generic adversarial search algorithm. For the experimental evaluation, we implement it on top of an existing adversarial search algorithm designed for complex domains and evaluate its performance on a game modelling a humanitarian relief operation in a conflict environment. The results show that the method often finds the same solutions as the complete search, while the search efficiency is significantly improved.
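  • Illustrative sketch:
    A minimal sketch of the decomposition step only: from a sparse interaction structure among agents, build one overlapping sub-game per agent. The agents, edges, and the simple combination rule mentioned in the comments are invented for illustration and need not match the paper's construction.

      # An edge means the two agents' actions can influence each other within the search horizon.
      agents = ["a", "b", "c", "d", "e", "f"]
      interacts = {("a", "b"), ("b", "c"), ("d", "e")}        # agent f interacts with nobody

      def neighbourhood(agent):
          linked = {v for edge in interacts for v in edge if agent in edge}
          return sorted(linked | {agent})

      # One overlapping sub-game per agent: the agent together with everyone it interacts with.
      # Each sub-game is searched independently (exponential only in its own size); one simple
      # way to combine results is for every agent to keep the action chosen in its own sub-game.
      subgames = {agent: neighbourhood(agent) for agent in agents}
      print(subgames)   # {'a': ['a', 'b'], 'b': ['a', 'b', 'c'], ..., 'f': ['f']}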

Deception in Networks of Mobile Sensing Agents

  • Department: Department of Cybernetics
  • Annotation:
    In this study we consider deception by an adversary, which is relevant in many (military) applications. The adversary is expected to use deception to prevent the sensor team from performing its tasks. We employ a game-theoretic model to analyze the expected strategy of the adversary and to find the best response. We represent the Mobile Sensor Team problem using the Distributed Constraint Optimization Problem (DCOP) framework. We propose an optimal method for selecting the position of a single agent facing a deceptive adversary. This method serves as a heuristic for agents selecting their positions in a full-scale problem with multiple agents in a large area. Our empirical study demonstrates the success of our model compared with existing models in the presence of deception.

Goal-Based Game Tree Search for Complex Domains

  • DOI: 10.1007/978-3-642-11819-7_8
  • Link: https://doi.org/10.1007/978-3-642-11819-7_8
  • Department: Department of Cybernetics
  • Annotation:
    We present a novel approach to reducing the adversarial search space by employing background knowledge represented in the form of higher-level goals that players tend to pursue in the game. The algorithm is derived from a simultaneous-move modification of the max^n algorithm by limiting the search to those branches of the game tree that are consistent with pursuing players' goals. The algorithm has been tested on a real-world-based scenario modelled as a large-scale asymmetric game. The experimental results indicate the ability of the goal-based heuristic to reduce the search space to a manageable level even in complex domains while maintaining the high quality of the resulting strategies.
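  • Illustrative sketch:
    A toy node count showing how a goal-consistency filter shrinks a simultaneous-move game tree; the actions, goals, and branching factors are invented, and the sketch says nothing about strategy quality.

      from itertools import product

      ACTIONS = ["n", "s", "e", "w"]
      GOAL_CONSISTENT = {"p1": {"n", "e"}, "p2": {"s", "w"}}    # invented goal filters

      def count_nodes(depth, use_goals):
          """Count nodes of a uniform simultaneous-move tree, with or without goal-based pruning."""
          if depth == 0:
              return 1
          acts1 = [a for a in ACTIONS if not use_goals or a in GOAL_CONSISTENT["p1"]]
          acts2 = [a for a in ACTIONS if not use_goals or a in GOAL_CONSISTENT["p2"]]
          return 1 + sum(count_nodes(depth - 1, use_goals) for _ in product(acts1, acts2))

      print(count_nodes(3, use_goals=False))   # 16 joint actions per node -> 4369 nodes
      print(count_nodes(3, use_goals=True))    #  4 joint actions per node ->   85 nodes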

Planning and Decision-Making in Complex Dynamic Adversarial Domains

  • Authors: doc. Mgr. Viliam Lisý, MSc., Ph.D.,
  • Publication: Workshop 2010. Praha: České vysoké učení technické v Praze, 2010, pp. 94-95. CTU Reports. ISBN 978-80-01-04513-8.
  • Year: 2010
  • Department: Department of Cybernetics
  • Annotation:
    We briefly introduce the main research topics in multi-agent planning and identify their weaknesses in adversarial settings. The main problem is that the conflicting goals and the lack of trust and coordination among the agents cause huge computational complexity. We introduce two methods to deal with this complexity. The first is incorporating procedural background knowledge into adversarial search; the second is decomposing the full problem into a set of smaller problems, solving them separately, and then aggregating the results into a global solution. Other important issues in adversarial planning are partial information about the state of the world, its sharing among the players, and its intentional manipulation by opponents, i.e., deception. We summarize our achievements in dealing with these issues and refer to other sources describing our research.

Adversarial Search with Procedural Knowledge Heuristic

  • Department: Department of Cybernetics
  • Annotation:
    We introduce an adversarial planning algorithm based on game tree search, which is applicable in large-scale multiplayer domains. In order to tackle the scalability issues of game tree search, the algorithm utilizes procedural knowledge capturing how individual players tend to achieve their goals in the domain; the information is used to limit the search only to the part of the game tree that is consistent with pursuing players' goals. We impose no specific requirements on the format of the procedural knowledge; any programming language or agent specification paradigm can be employed. We evaluate the algorithm both theoretically and empirically, confirming that the proposed approach can lead to a substantial search reduction with only a minor negative impact on the quality of produced solutions.

Biosignals as an Advanced Man-Machine Interface

  • Authors: van den Broek, E., doc. Mgr. Viliam Lisý, MSc., Ph.D., Westerink, J.H.D., Schut, M.H., Tuinenbreijer, K.
  • Publication: Proceedings of the International Conference on Bio-Inspired Systems and Signal Processing. Setúbal: Institute for Systems and Technologies of Information, Control and Communication, 2009, pp. IS-15-IS-24. ISBN 978-989-8111-64-7.
  • Year: 2009
  • Department: Department of Cybernetics
  • Annotation:
    As has been known for centuries, humans exhibit an electrical profile. This profile is altered through various physiological processes, which can be measured through biosignals; e.g., electromyography (EMG) and electrodermal activity (EDA). These biosignals can reveal our emotions and, as such, can serve as an advanced man-machine interface (MMI) for empathic consumer products. However, such an MMI requires the correct classification of biosignals into emotion classes. This paper explores the use of EDA and three facial EMG signals to determine neutral, positive, negative, and mixed emotions, using recordings of 24 people. A range of techniques is tested, which resulted in a generic framework for automated emotion classification with up to 61.31% correct classification of the four emotion classes, without the need for personal profiles.

Goal-Based Adversarial Search - Searching Game Trees in Complex Domains using Goal-based Heuristic

  • Department: Department of Cybernetics
  • Annotation:
    We present a novel approach to reducing the adversarial search space by using background knowledge represented in the form of higher-level goals that players tend to pursue in the game. The algorithm is derived from a simultaneous-move modification of the max^n algorithm by searching only the branches of the game tree that are consistent with pursuing players' goals. The algorithm has been tested on a real-world-based scenario modelled as a large-scale asymmetric game. The experimental results indicate the ability of the goal-based heuristic to reduce the search space to a manageable level even in complex domains while maintaining the high quality of the resulting strategies.

Towards Cooperative Predictive Data Mining in Competitive Environments.

  • Department: Department of Cybernetics
  • Annotation:
    We study the problem of predictive data mining in a competitive multi-agent setting, in which each agent is assumed to have some partial knowledge required for correctly classifying a set of unlabelled examples. The agents are self-interested and therefore need to reason about the trade-offs between increasing their classification accuracy by collaborating with other agents and disclosing their private classification knowledge to other agents through such collaboration. We analyze the problem and propose a set of components which can enable cooperation in this otherwise competitive task. These components include measures for quantifying private knowledge disclosure, data-mining models suitable for multi-agent predictive data mining, and a set of strategies by which agents can improve their classification accuracy through collaboration. The overall framework and its individual components are validated on a synthetic experimental domain.

Utility-based Model for Classifying Adversarial Behaviour in Multi-Agent Systems

  • DOI: 10.1109/IMCSIT.2008.4747216
  • Link: https://doi.org/10.1109/IMCSIT.2008.4747216
  • Department: Department of Cybernetics
  • Annotation:
    Interactions and social relationships among agents are an important aspect of multi-agent systems. In this paper, we explore how such relationships and their relation to agents' objectives influence agents' decision-making. Building on the framework of stochastic games, we propose a classification scheme, based on a formally defined concept of interaction stance, for categorizing an agent's behaviour as self-interested, altruistic, competitive, cooperative, or adversarial with respect to other agents in the system. We show how the scheme can be employed in defining behavioural norms, in capturing social aspects of agents' behaviour, and/or in representing social configurations of multi-agent systems.
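  • Illustrative sketch:
    One crude, sign-based way such an interaction stance could be read off the utility changes an action causes; the paper's formal definition is built on stochastic games and is richer than this. The thresholds and label mapping below are assumptions made purely for illustration.

      def stance(delta_own, delta_other, eps=1e-9):
          """Crude sign-based reading of an interaction stance from the change an action
          causes in the acting agent's utility and in another agent's utility."""
          if delta_own > eps and delta_other > eps:
              return "cooperative"
          if delta_own > eps and delta_other < -eps:
              return "competitive"
          if delta_own <= eps and delta_other > eps:
              return "altruistic"
          if delta_own <= eps and delta_other < -eps:
              return "adversarial"        # hurting the other without helping oneself
          return "self-interested"        # affects (almost) only the acting agent

      print(stance(+2.0, -1.0))           # competitive
      print(stance(-0.5, -3.0))           # adversarial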
