People

Ing. Gustav Šír, Ph.D.

All publications

Beating the market with a bad predictive model

  • Authors: Hubáček, O., Ing. Gustav Šír, Ph.D.
  • Publication: International Journal of Forecasting. 2022, 2022 ISSN 1872-8200.
  • Year: 2022
  • DOI: 10.1016/j.ijforecast.2022.02.001
  • Link: https://doi.org/10.1016/j.ijforecast.2022.02.001
  • Affiliation: Intelligent Data Analysis
  • Abstract:
    It is a common misconception that in order to make consistent profits as a trader, one needs to possess some extra information leading to an asset value estimation that is more accurate than that reflected by the current market price. While the idea makes intuitive sense and is also well substantiated by the widely popular Kelly criterion, we prove that it is generally possible to make systematic profits with a completely inferior price-predicting model. The key idea is to alter the training objective of the predictive models to explicitly decorrelate them from the market. By doing so, we can exploit inconspicuous biases in the market maker’s pricing, and profit from the inherent advantage of the market taker. We introduce the problem setting throughout the diverse domains of stock trading and sports betting to provide insights into the common underlying properties of profitable predictive models, their connections to standard portfolio optimization strategies, and the commonly overlooked advantage of the market taker. Consequently, we prove the desirability of the decorrelation objective across common market distributions, translate the concept into a practical machine learning setting, and demonstrate its viability with real-world market data.

Deep Learning with Relational Logic Representations

  • Authors: Ing. Gustav Šír, Ph.D.
  • Publication: Amsterdam: IOS Press, 2022. ISSN 1879-8314. ISBN 978-1-64368-343-0.
  • Year: 2022
  • Affiliation: Intelligent Data Analysis
  • Abstract:
    Deep learning has been used with great success in a number of diverse applications, ranging from image processing to game playing, and the fast progress of this learning paradigm has even been seen as paving the way towards general artificial intelligence. However, the current deep learning models are still principally limited in many ways. This book, ‘Deep Learning with Relational Logic Representations’, addresses the limited expressiveness of the common tensor-based learning representation used in standard deep learning, by generalizing it to relational representations based in mathematical logic. This is the natural formalism for the relational data omnipresent in the interlinked structures of the Internet and relational databases, as well as for the background knowledge often present in the form of relational rules and constraints. These are impossible to properly exploit with standard neural networks, but the book introduces a new declarative deep relational learning framework called Lifted Relational Neural Networks, which generalizes the standard deep learning models into the relational setting by means of a ‘lifting’ paradigm, known from Statistical Relational Learning. The author explains how this approach allows for effective end-to-end deep learning with relational data and knowledge, introduces several enhancements and optimizations to the framework, and demonstrates its expressiveness with various novel deep relational learning concepts, including efficient generalizations of popular contemporary models, such as Graph Neural Networks. Demonstrating the framework across various learning scenarios and benchmarks, including computational efficiency, the book will be of interest to all those interested in the theory and practice of advancing representations of modern deep learning architectures.

Forty years of score-based soccer match outcome prediction: an experimental review

  • DOI: 10.1093/imaman/dpab029
  • Link: https://doi.org/10.1093/imaman/dpab029
  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    We investigate the state-of-the-art in score-based soccer match outcome modelling to identify the top-performing methods across diverse classes of existing approaches to the problem. Namely, we bring together various statistical methods based on Poisson and Weibull distributions and several general ranking algorithms (Elo, Steph ratings, Gaussian-OD ratings) as well as domain-specific rating systems (Berrar ratings, pi-ratings). We review, reimplement and experimentally compare these diverse competitors altogether on the largest database of soccer results available to identify true leaders. Our results reveal that the individual predictions, as well as the overall performances, are very similar across the top models tested, likely suggesting the limits of this generic approach to score-based match outcome modelling. No study of a similar scale has previously been done.
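
For orientation, the Elo rating named among the general ranking algorithms above can be sketched as follows. This is the textbook Elo update, not the reimplementation evaluated in the paper; the function names, the scale constant of 400 and the step size `k=20` are illustrative assumptions, and the sketch produces only a win expectancy, not the score-based or draw-aware predictions the review compares.

```python
def elo_expected(r_home: float, r_away: float) -> float:
    """Expected score of the home team under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((r_away - r_home) / 400.0))


def elo_update(r_home: float, r_away: float, score_home: float, k: float = 20.0):
    """Return the updated (home, away) ratings after one match.

    score_home is 1.0 for a home win, 0.5 for a draw and 0.0 for a home loss.
    k is an assumed step size; real systems tune it per competition.
    """
    delta = k * (score_home - elo_expected(r_home, r_away))
    # The update is zero-sum: what one team gains, the other loses.
    return r_home + delta, r_away - delta
```

With two equally rated teams the expected score is 0.5, so a home win moves each rating by half of `k` in opposite directions.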

Learning with Molecules beyond Graph Neural Networks

  • Affiliation: Intelligent Data Analysis
  • Abstract:
    We demonstrate a deep learning framework that is inherently based on the highly expressive language of relational logic, making it possible, among other things, to capture arbitrarily complex graph structures. We show how Graph Neural Networks and similar models can be easily covered in the framework by specifying the underlying propagation rules in relational logic. The declarative nature of the language then makes it easy to modify and extend the existing propagation schemes into more complex structures, such as atom rings in molecules, which we choose for a short demonstration in this work.

Beyond graph neural networks with lifted relational neural networks

  • DOI: 10.1007/s10994-021-06017-3
  • Link: https://doi.org/10.1007/s10994-021-06017-3
  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    We introduce a declarative differentiable programming framework, based on the language of Lifted Relational Neural Networks, where small parameterized logic programs are used to encode deep relational learning scenarios through the underlying symmetries. When presented with relational data, such as various forms of graphs, the logic program interpreter dynamically unfolds differentiable computation graphs to be used for the program parameter optimization by standard means. Following from the declarative, relational logic-based encoding, this results in a unified representation of a wide range of neural models in the form of compact and elegant learning programs, in contrast to the existing procedural approaches operating directly on the computational graph level. We illustrate how this idea can be used for a concise encoding of existing advanced neural architectures, with the main focus on Graph Neural Networks (GNNs). Importantly, using the framework, we also show how the contemporary GNN models can be easily extended towards higher expressiveness in various ways. In the experiments, we demonstrate correctness and computation efficiency through comparison against specialized GNN frameworks, while shedding some light on the learning performance of the existing GNN models.

Lossless Compression of Structured Convolutional Models via Lifting

  • Affiliation: Intelligent Data Analysis
  • Abstract:
    Lifting is an efficient technique to scale up graphical models generalized to relational domains by exploiting the underlying symmetries. Concurrently, neural models are continuously expanding from grid-like tensor data into structured representations, such as various attributed graphs and relational databases. To address the irregular structure of the data, the models typically extrapolate on the idea of convolution, effectively introducing parameter sharing in their, dynamically unfolded, computation graphs. The computation graphs themselves then reflect the symmetries of the underlying data, similarly to the lifted graphical models. Inspired by lifting, we introduce a simple and efficient technique to detect the symmetries and compress the neural models without loss of any information. We demonstrate through experiments that such compression can lead to significant speedups of structured convolutional models, such as various Graph Neural Networks, across various tasks, such as molecule classification and knowledge-base completion.

Optimal sports betting strategies in practice: an experimental review

  • DOI: 10.1093/imaman/dpaa029
  • Link: https://doi.org/10.1093/imaman/dpaa029
  • Affiliation: Intelligent Data Analysis
  • Abstract:
    We investigate the most popular approaches to the problem of sports betting investment based on modern portfolio theory and the Kelly criterion. We define the problem setting, the formal investment strategies and review their common modifications used in practice. The underlying purpose of the reviewed modifications is to mitigate the additional risk stemming from the unrealistic mathematical assumptions of the formal strategies. We test the resulting methods using a unified evaluation protocol for three sports: horse racing, basketball and soccer. The results show the practical necessity of the additional risk-control methods and demonstrate their individual benefits. Particularly, an adaptive variant of the popular ‘fractional Kelly’ method is a very suitable choice across a wide range of settings.
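
As a rough illustration of the ‘fractional Kelly’ idea mentioned above, the sketch below computes the standard Kelly stake for a single binary bet and scales it down by a fixed factor. It assumes decimal odds and a single independent bet; the function names and the default `fraction=0.5` are hypothetical, and the adaptive variant favoured in the paper is not reproduced here.

```python
def kelly_fraction(p: float, decimal_odds: float) -> float:
    """Full-Kelly stake, as a fraction of bankroll, for one binary bet.

    p is the bettor's estimated win probability; decimal_odds is the payout
    per unit staked, including the returned stake.
    """
    b = decimal_odds - 1.0  # net profit per unit staked if the bet wins
    if b <= 0:
        raise ValueError("decimal odds must exceed 1")
    f = (p * decimal_odds - 1.0) / b
    return max(f, 0.0)  # do not bet when the estimated edge is negative


def fractional_kelly(p: float, decimal_odds: float, fraction: float = 0.5) -> float:
    """Scale the full-Kelly stake by a fixed fraction to reduce risk."""
    return fraction * kelly_fraction(p, decimal_odds)
```

Shrinking the stake in this way trades expected growth for lower variance, which helps when the probability estimate `p` is itself uncertain.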

Learning with Molecules beyond Graph Neural Networks

  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    In this paper we demonstrate a deep learning framework that is inherently based on the highly expressive language of relational logic, making it possible, among other things, to capture arbitrarily complex graph structures. We show how GNNs and similar models can be easily covered in the framework by specifying the underlying propagation rules in relational logic. The declarative nature of the language then makes it easy to modify and extend the propagation schemes into complex structures, such as the molecular rings which we choose for a short demonstration in this paper.

Deep Learning from Spatial Relations for Soccer Pass Prediction

  • DOI: 10.1007/978-3-030-17274-9_14
  • Link: https://doi.org/10.1007/978-3-030-17274-9_14
  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    We propose a convolutional architecture for learning representations over spatial relations in the game of soccer, with the goal of predicting individual passes between players, as a submission to the prediction challenge organized for the 5th Workshop on Machine Learning and Data Mining for Sports Analytics. The goal of the challenge was to predict the receiver of a pass given the location of the sender and all other players. From each soccer situation, we extract spatial relations between the players and a few key locations on the field, which are then hierarchically aggregated within the neural architecture designed to extract possibly complex gameplay patterns stemming from these simple relations. The use of convolutions then allows us to efficiently capture the various regularities that are inherent to the game. In the experiments, we show very promising performance of the method.

Deep Learning with Relational Logic Representations

  • Authors: Ing. Gustav Šír, Ph.D.
  • Publication: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 2019. p. 6462-6463. ISSN 1045-0823. ISBN 978-0-9992411-4-1.
  • Year: 2019
  • DOI: 10.24963/ijcai.2019/920
  • Link: https://doi.org/10.24963/ijcai.2019/920
  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    Despite their significant success, all the existing deep neural architectures based on static computational graphs processing fixed tensor representations necessarily face fundamental limitations when presented with dynamically sized and structured data. Examples of these are sparse multi-relational structures present everywhere from biological networks and complex knowledge hyper-graphs to logical theories. Likewise, given the cryptic nature of generalization and representation learning in neural networks, potential integration with the sheer amounts of existing symbolic abstractions present in human knowledge remains highly problematic. Here, we argue that these abilities, naturally present in symbolic approaches based on the expressive power of relational logic, need to be adopted for further progress of neural networks, and present a well-founded learning framework for integration of deep and symbolic approaches based on the lifted modelling paradigm.

Efficient Extraction of Network Event Types from NetFlows

  • DOI: 10.1155/2019/8954914
  • Link: https://doi.org/10.1155/2019/8954914
  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    To perform sophisticated traffic analysis, such as intrusion detection, network monitoring tools first need to extract higher-level information from lower-level data by reconstructing events and activities from information as primitive as individual network packets or traffic flows. Aggregating communication data into meaningful entities is an open problem, and existing, typically clustering-based, solutions are often highly suboptimal, producing results that may misinterpret the extracted information and consequently miss many network events. We propose a novel method for the extraction of various predefined types of network events from raw network flow data. The new method is based on analysis of the computational properties of the event types as prescribed by their attributes in a given descriptive language. The corresponding events are then extracted with superior recall compared to the corresponding event-extraction part of the in-production intrusion detection system Camnep.

Exploiting sports-betting market using machine learning

  • DOI: 10.1016/j.ijforecast.2019.01.001
  • Link: https://doi.org/10.1016/j.ijforecast.2019.01.001
  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    We introduce a forecasting system designed to profit from the sports-betting market using machine learning. We contribute three main novel ingredients. First, previous attempts to learn models for match-outcome prediction maximized the model's predictive accuracy as the single criterion. Unlike these approaches, we also reduce the model's correlation with the bookmaker's predictions available through the published odds. We show that such an optimized model allows for better profit generation, and the approach is thus a way to ‘exploit’ the bookmaker. The second novelty is in the application of convolutional neural networks for match outcome prediction. The convolution layer makes it possible to leverage a vast number of player-related statistics on its input. Thirdly, we adopt elements of modern portfolio theory to design a strategy for bet distribution according to the odds and model predictions, trading off profit expectation and variance optimally. These three ingredients combine into a betting method that systematically yields positive cumulative profits in experiments with NBA data from seasons 2007–2014, as opposed to the alternative methods tested.

Learning to predict soccer results from relational data with gradient boosted trees

  • DOI: 10.1007/s10994-018-5704-6
  • Link: https://doi.org/10.1007/s10994-018-5704-6
  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    We describe our winning solution to the 2017 Soccer Prediction Challenge organized in conjunction with the MLJ’s special issue on Machine Learning for Soccer. The goal of the challenge was to predict outcomes of future matches within a selected time-frame from different leagues over the world. A dataset of over 200,000 past match outcomes was provided to the contestants. We experimented with both relational and feature-based methods to learn predictive models from the provided data. We employed relevant latent variables computable from the data, namely so-called pi-ratings and also a rating based on the PageRank method. A method based on manually constructed features and the gradient boosted tree algorithm performed best on both the validation set and the challenge test set. We also discuss the validity of the assumption that probability predictions on the three ordinal match outcomes should be monotone, underlying the RPS measure of prediction quality.
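
The RPS (ranked probability score) referred to above compares cumulative predicted and cumulative observed probabilities over ordered outcomes. A minimal sketch under the usual definition follows; the outcome ordering (e.g. home win, draw, away win) and the function name are assumptions for illustration, not taken from the challenge code.

```python
from itertools import accumulate


def rps(probs, outcome: int) -> float:
    """Ranked probability score for one match; lower is better.

    probs is a sequence of probabilities over ordered outcomes,
    outcome is the index of the outcome that actually occurred.
    """
    r = len(probs)
    observed = [1.0 if i == outcome else 0.0 for i in range(r)]
    cum_p = list(accumulate(probs))
    cum_o = list(accumulate(observed))
    # The last cumulative term is always 1 - 1 = 0, so it is skipped.
    squared = sum((cp - co) ** 2 for cp, co in zip(cum_p[:-1], cum_o[:-1]))
    return squared / (r - 1)
```

Because the score works on cumulative sums, putting all probability on the draw when the home team wins is penalised less than putting it all on the away win, which is the ordinal behaviour the abstract alludes to.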

Score-based Soccer Match Outcome Modeling - an Experimental Review

  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    In this experimental work, we propose to investigate the state-of-the-art in score-based soccer match outcome prediction modeling to identify the top-performing methods across the diverse classes of existing approaches to the problem. Namely, we bring together statistical methods based on Poisson distribution, a general ranking algorithm (Elo), domain-specific rating system (pi-ratings) and a graph-based approach to the problem (PageRank). We experimentally compare these diverse competitors altogether on a large database of soccer results to identify the true leaders in the domain.

Sports betting strategies: an experimental review

  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    We investigate the problem of optimal wealth allocation over predictive sports market’s opportunities. We analyze the problem across diverse settings, utility targets, and the notion of optimality itself. We review existing literature to identify the most prominent approaches coming from the diverse sport and economic views on the problem, and provide some practical perspectives. Namely, we focus on the provably optimal geometric mean policy, typically referred to as the Kelly criterion, and Modern Portfolio Theory based approaches leveraging utility theory. From the joint perspective of decision theory, we discuss their unique properties, assumptions and, importantly, investigate effective heuristics and practical techniques to tackle their key common challenges, particularly the problem of uncertainty in the outcome probability estimates. Finally, we verify our findings on a large dataset of soccer records.

Lifted Relational Neural Networks: Efficient Learning of Latent Relational Structures

  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    We propose a method to combine the interpretability and expressive power of first-order logic with the effectiveness of neural network learning. In particular, we introduce a lifted framework in which first-order rules are used to describe the structure of a given problem setting. These rules are then used as a template for constructing a number of neural networks, one for each training and testing example. As the different networks corresponding to different examples share their weights, these weights can be efficiently learned using stochastic gradient descent. Our framework provides a flexible way for implementing and combining a wide variety of modelling constructs. In particular, the use of first-order logic allows for a declarative specification of latent relational structures, which can then be efficiently discovered in a given data set using neural network learning. Experiments on 78 relational learning benchmarks clearly demonstrate the effectiveness of the framework.

Lifted Relational Team Embeddings for Predictive Sports Analytics

  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    We investigate the use of relational learning in the domain of predictive sports analytics, for which we propose a team embedding concept expressed in the language of Lifted Relational Neural Networks, a framework for learning latent relational structures. On a large dataset of soccer results, we compare different relational learners against strong current methods from the domain to show some very promising results of the relational approach when combined with embedding learning.

Pruning Hypothesis Spaces Using Learned Domain Theories

  • DOI: 10.1007/978-3-319-78090-0_11
  • Link: https://doi.org/10.1007/978-3-319-78090-0_11
  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    We present a method to prune hypothesis spaces in the context of inductive logic programming. The main strategy of our method consists in removing hypotheses that are equivalent to already considered hypotheses. The distinguishing feature of our method is that we use learned domain theories to check for equivalence, in contrast to existing approaches which only prune isomorphic hypotheses. Specifically, we use such learned domain theories to saturate hypotheses and then check if these saturations are isomorphic. While conceptually simple, we experimentally show that the resulting pruning strategy can be surprisingly effective in reducing both computation time and memory consumption when searching for long clauses, compared to approaches that only consider isomorphism.

Stacked Structure Learning for Lifted Relational Neural Networks

  • DOI: 10.1007/978-3-319-78090-0_10
  • Link: https://doi.org/10.1007/978-3-319-78090-0_10
  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    Lifted Relational Neural Networks (LRNNs) describe relational domains using weighted first-order rules which act as templates for constructing feed-forward neural networks. While previous work has shown that using LRNNs can lead to state-of-the-art results in various ILP tasks, these results depended on hand-crafted rules. In this paper, we extend the framework of LRNNs with structure learning, thus enabling a fully automated learning process. Similarly to many ILP methods, our structure learning algorithm proceeds in an iterative fashion by top-down searching through the hypothesis space of all possible Horn clauses, considering the predicates that occur in the training examples as well as invented soft concepts entailed by the best weighted rules found so far. In the experiments, we demonstrate the ability to automatically induce useful hierarchical soft concepts leading to deep LRNNs with a competitive predictive power.

Learning Predictive Categories Using Lifted Relational Neural Networks

  • DOI: 10.1007/978-3-319-63342-8_9
  • Link: https://doi.org/10.1007/978-3-319-63342-8_9
  • Affiliation: Department of Computer Science, Intelligent Data Analysis
  • Abstract:
    Lifted relational neural networks (LRNNs) are a flexible neural-symbolic framework based on the idea of lifted modelling. In this paper we show how LRNNs can be easily used to declaratively specify and solve a learning problem in which latent categories of entities and properties need to be jointly induced.

Lifted Relational Neural Networks

  • Affiliation: Department of Computer Science
  • Abstract:
    We propose a method combining relational-logic representations with neural network learning. A general lifted architecture, possibly reflecting some background domain knowledge, is described through relational rules which may be handcrafted or learned. The relational rule-set serves as a template for unfolding possibly deep neural networks whose structures also reflect the structures of given training or testing relational examples. Different networks corresponding to different examples share their weights, which co-evolve during training by the stochastic gradient descent algorithm. The framework allows for hierarchical relational modeling constructs and learning of latent relational concepts through shared hidden-layer weights corresponding to the rules. Discovery of notable relational concepts and experiments on 78 relational learning benchmarks demonstrate favorable performance of the method.

Dynamic System Modeling of Evolutionary Algorithms

  • DOI: 10.1145/2811411.2811517
  • Link: https://doi.org/10.1145/2811411.2811517
  • Affiliation: Department of Cybernetics, Department of Computer Science
  • Abstract:
    Evolutionary algorithms are population-based, metaheuristic, black-box optimization techniques from the wider family of evolutionary computation. Optimization algorithms within this family are often based on similar principles and routines inspired by biological evolution. Due to their robustness, the scope of their application is broad and varies from physical engineering to software design problems. Despite sharing similar principles based on common biological inspiration, these algorithms themselves are typically viewed as black-box program routines by the end user, without a deeper insight into the underlying optimization process. We believe that shedding some light on the underlying routines of evolutionary computation algorithms can make them more accessible to the wider engineering public. In this paper, we formulate the evolutionary optimization process as a dynamic system simulation, and provide means to prototype evolutionary optimization routines in a visually comprehensible framework. The framework enables engineers to follow the same dynamic system modeling paradigm they typically use for representation of their optimization problems to also create the desired evolutionary optimizers themselves. Instantiation of the framework in a Matlab Simulink library practically results in graphical programming of evolutionary optimizers based on data-flow principles used for dynamic system modeling within the Simulink environment. We illustrate the efficiency of visual representation in clarifying the underlying concepts on executable flow-charts of respective evolutionary optimizers and demonstrate features and potential of the framework on selected engineering benchmark applications.

Learning to detect network intrusion from a few labeled events and background traffic

  • DOI: 10.1007/978-3-319-20034-7_9
  • Link: https://doi.org/10.1007/978-3-319-20034-7_9
  • Affiliation: Department of Computer Science
  • Abstract:
    Intrusion detection systems (IDS) analyse network traffic data with the goal of revealing malicious activities and incidents. A general problem with learning within this domain is a lack of relevant ground truth data, i.e. real attacks, capturing malicious behaviors in their full variety. Most existing solutions thus rely, to a certain extent, on rules designed by network domain experts. Although there are advantages to the use of rules, they lack the basic ability to adapt to traffic data. We therefore propose an ensemble tree bagging classifier capable of learning from an extremely small number of true attack representatives, and demonstrate that, by incorporating general background traffic, we are able to generalize from those few representatives to achieve results competitive with the expert-designed rules used in the existing IDS Camnep.

Visual Data-Flow Framework of Evolutionary Computation

  • DOI: 10.1145/2811411.2811517
  • Link: https://doi.org/10.1145/2811411.2811517
  • Affiliation: Department of Cybernetics, Department of Computer Science
  • Abstract:
    Visual representation of information, which makes it possible to quickly communicate and share ideas, forms an important part of scientific and engineering progress, with applications varying from physics to software design. Engineers naturally utilize graphs and flowcharts to clarify concepts and prototype their applications. Traditionally, a wide variety of engineering applications, from civil to control engineering, can be formulated in the form of an optimization problem. For some of the most challenging optimization problems, evolutionary algorithms and other population-based iterative optimizers have proven useful in finding high-quality solutions. In this paper we present a new data-flow framework to integrate these two worlds of visual representation and engineering optimization: VisualEA, a Matlab Simulink library for visual programming of evolutionary optimizers under the paradigm of dynamic systems. We demonstrate its potential on selected engineering applications.

VisualEA - Visual design of Evolutionary Optimizers for Engineering Applications

  • Authors: Ing. Gustav Šír, Ph.D.
  • Publication: Proceedings of the 19th International Scientific Student Conference POSTER 2015. Praha: Czech Technical University in Prague, 2015, pp. 1-7. ISBN 978-80-01-05499-4.
  • Year: 2015
  • Affiliation: Department of Computer Science
  • Abstract:
    Visual representation of information, which makes it possible to quickly communicate and share ideas, forms an important part of scientific and engineering progress, with applications varying from physics to software design. Engineers naturally utilize graphs and flowcharts to clarify concepts and prototype their applications. Traditionally, a wide variety of engineering applications, from civil to control engineering, can be formulated in the form of an optimization problem. For some of the most challenging optimization problems, evolutionary algorithms and other population-based iterative optimizers have proven useful in finding high-quality solutions. In this paper we present a new framework to integrate these two worlds of visual representation and engineering optimization: VisualEA, a Matlab Simulink library for visual prototyping of evolutionary optimizers under the paradigm of dynamic systems. We demonstrate its potential on selected engineering applications.

Predicting Top-k Trends on Twitter using Graphlets and Time Features

  • Affiliation: Department of Computer Science
  • Abstract:
    We introduce a novel method for predicting trending keywords on Twitter. This new method exploits the topology of the studied parts of the social network. It is based on a combination of graphlet spectra and so-called time features. We show experimentally that using graphlets and time features is beneficial for the accuracy of prediction.

Responsible for this page: Ing. Mgr. Radovan Suk