People

doc. Ing. Václav Šmídl, Ph.D.

All publications

Sum-Product-Set Networks: Deep Tractable Models for Tree-Structured Graphs

  • Department: Artificial Intelligence Center
  • Abstract:
    Daily internet communication relies heavily on tree-structured graphs, embodied by popular data formats such as XML and JSON. However, many recent generative (probabilistic) models utilize neural networks to learn a probability distribution over undirected cyclic graphs. This assumption of a generic graph structure brings various computational challenges, and, more importantly, the presence of non-linearities in neural networks does not permit tractable probabilistic inference. We address these problems by proposing sum-product-set networks, an extension of probabilistic circuits from unstructured tensor data to tree-structured graph data. To this end, we use random finite sets to reflect a variable number of nodes and edges in the graph and to allow for exact and efficient inference. We demonstrate that our tractable model performs comparably to various intractable models based on neural networks.

Batch Active Learning for Text Classification and Sentiment Analysis

  • Authors: Sahan, M., doc. Ing. Václav Šmídl, Ph.D., Ing. Radek Mařík, CSc.
  • Publication: CCRIS '22: Proceedings of the 2022 3rd International Conference on Control, Robotics and Intelligent System. New York: Association for Computing Machinery, 2022. p. 111-116. ISBN 978-1-4503-9685-1.
  • Year: 2022
  • DOI: 10.1145/3562007.3562028
  • Link: https://doi.org/10.1145/3562007.3562028
  • Department: Department of Telecommunication Engineering, Artificial Intelligence Center
  • Abstract:
    Supervised learning of classifiers for text classification and sentiment analysis relies on the availability of labels that may be difficult or expensive to obtain. A standard procedure is to add labels to the training dataset sequentially by querying an annotator until the model reaches satisfactory performance. Active learning optimizes this procedure by selecting the unlabeled data records whose labels would bring the highest discriminability of the dataset. Batch active learning generalizes single-instance active learning by selecting a whole batch of documents for labeling. This task is much more demanding because many additional factors come into consideration (e.g., batch size, batch evaluation, etc.). In this paper, we provide a large-scale study by decomposing the existing algorithms into building blocks and systematically comparing meaningful combinations of these blocks, with a subsequent evaluation on different text datasets. While each block is known (warm-start weight initialization, Dropout MC, entropy sampling, etc.), many of their combinations, such as Bayesian strategies with agglomerative clustering, are first proposed in our paper and perform excellently. In particular, our extension of the warm-start method to batch active learning is among the top-performing strategies on all datasets. We studied the effect of this proposal by comparing outcomes while varying distinct factors of an active learning algorithm, including the initialization of the algorithm, the uncertainty representation, the acquisition function, and the batch selection strategy. Various combinations of these factors are tested on selected NLP problems with documents encoded using RoBERTa embeddings. The datasets cover context integrity (Gibberish Wackerow), fake news detection (Kaggle Fake News Detection), categorization of short texts by emotional context (Twitter Sentiment140), and sentiment classification (Amazon Reviews). Ultimately, we show that each of the active learning factors has advantages for certain datasets or experimental settings.
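
Two of the building blocks named in the abstract above, Dropout MC uncertainty and entropy sampling, can be illustrated with a minimal sketch. The function names `predictive_entropy` and `select_batch` are hypothetical, the stochastic forward passes are assumed to be already available as lists of class probabilities, and the plain top-k batch selection is only the simplest possible strategy, not the clustering-based variants studied in the paper.

```python
import math

def predictive_entropy(mc_probs):
    """Entropy of the mean prediction over T stochastic forward passes.

    mc_probs: list of T class-probability vectors for one document
    (assumed to come from T passes with dropout kept active).
    """
    n_classes = len(mc_probs[0])
    mean = [sum(p[c] for p in mc_probs) / len(mc_probs) for c in range(n_classes)]
    return -sum(m * math.log(m) for m in mean if m > 0.0)

def select_batch(pool_mc_probs, batch_size):
    """Return indices of the batch_size pool items with the highest entropy."""
    scores = [predictive_entropy(p) for p in pool_mc_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]

# Three unlabeled documents, two stochastic passes each, two classes.
pool = [
    [[0.90, 0.10], [0.95, 0.05]],  # confident prediction
    [[0.50, 0.50], [0.60, 0.40]],  # uncertain in every pass
    [[0.20, 0.80], [0.80, 0.20]],  # passes disagree, so the mean is uniform
]
query_indices = select_batch(pool, batch_size=2)
```

Top-k selection tends to pick near-duplicate documents, which is one reason a batch selection step is treated as a separate factor.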

Comparison of Anomaly Detectors: Context Matters

  • DOI: 10.1109/TNNLS.2021.3116269
  • Link: https://doi.org/10.1109/TNNLS.2021.3116269
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Deep generative models are now challenging the classical methods in the field of anomaly detection. Every newly published method provides evidence of outperforming its predecessors, sometimes with contradictory results. The objective of this article is twofold: to compare anomaly detection methods of various paradigms, with a focus on deep generative models, and to identify sources of variability that can yield different results. The methods were compared on popular tabular and image datasets. We identified that the main sources of variability are the experimental conditions: 1) the type of dataset (tabular or image) and the nature of anomalies (statistical or semantic) and 2) the strategy for selecting hyperparameters, especially the number of available anomalies in the validation set. Methods perform differently in different contexts, i.e., under a different combination of experimental conditions together with computational time. This explains the variability of the previous results and highlights the importance of careful specification of the context in the publication of a new method. All our code and results are available for download.

General framework for binary classification on top samples

  • DOI: 10.1080/10556788.2021.1965601
  • Link: https://doi.org/10.1080/10556788.2021.1965601
  • Department: Artificial Intelligence Center
  • Abstract:
    Many binary classification problems minimize misclassification above (or below) a threshold. We show that instances of ranking problems, accuracy at the top, or hypothesis testing may be written in this form. We propose a general framework to handle these classes of problems and show which formulations (both known and newly proposed) fall into this framework. We provide a theoretical analysis of this framework and mention selected possible pitfalls the formulations may encounter. We show the convergence of the stochastic gradient descent for selected formulations even though the gradient estimate is inherently biased. We suggest several numerical improvements, including the implicit derivative and stochastic gradient descent. We provide an extensive numerical study.

Reducing the cost of fitting mixture models via stochastic sampling

  • Department: Artificial Intelligence Center
  • Abstract:
    Traditional methods for unsupervised learning of finite mixture models require evaluating the likelihood of all components of the mixture. This quickly becomes prohibitive when the components are abundant or expensive to compute. Therefore, we propose to apply a combination of the expectation maximization and the Metropolis-Hastings algorithm to evaluate only a small number of stochastically sampled components, thus substantially reducing the computational cost. The Markov chain of component assignments is sequentially generated across the algorithm's iterations, having a non-stationary target distribution whose parameters vary via a gradient-descent scheme. We put emphasis on the generality of our method, equipping it with the ability to train mixture models that involve complex, and possibly nonlinear, transformations. The performance of our method is illustrated on mixtures of normalizing flows.

Semi-supervised deep networks for plasma state identification

  • DOI: 10.1088/1361-6587/ac9926
  • Link: https://doi.org/10.1088/1361-6587/ac9926
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Correct and timely detection of plasma confinement regimes and edge localized modes (ELMs) is important for improving the operation of tokamaks. Existing machine learning approaches detect these regimes as a form of post-processing of experimental data. Moreover, they are typically trained on a large dataset of tens of labeled discharges, which may be costly to build. We investigate the ability of current machine learning approaches to detect the confinement regime and ELMs with the smallest possible delay after the latest measurement. We also demonstrate that including unlabeled data into the training process can improve the results in a situation where only a limited set of reliable labels is available. All training and validation is performed on data from the COMPASS tokamak. The InceptionTime architecture trained using a semi-supervised approach was found to be the most accurate method based on the set of tested variants. It is able to achieve good overall accuracy of the regime classification at the time instant of 100 μs delayed behind the latest data record. We also evaluate the capability of the model to correctly predict class transitions. While ELM occurrence can be detected with a tolerance smaller than 50 μs, detection of the confinement regime transition is more demanding and it was successful with 2 ms tolerance. Sensitivity studies to different values of model parameters are provided. We believe that the achieved accuracy is acceptable in practice and the method could be used in real-time operation.

Active Learning for Text Classification and Fake News Detection

  • DOI: 10.1109/ISCSIC54682.2021.00027
  • Link: https://doi.org/10.1109/ISCSIC54682.2021.00027
  • Department: Department of Telecommunication Engineering, Artificial Intelligence Center
  • Abstract:
    Supervised classification of texts relies on the availability of reliable class labels for the training data. However, the process of collecting data labels can be complex and costly. A standard procedure is to add labels sequentially by querying an annotator until reaching satisfactory performance. Active learning is a process of selecting unlabeled data records for which the knowledge of the label would bring the highest discriminability of the dataset. In this paper, we provide a comparative study of various active learning strategies for different embeddings of the text on various datasets. We focus on Bayesian active learning methods, which are used for their ability to represent the uncertainty of the classification procedure. We compare three types of uncertainty representation: i) SGLD, ii) Dropout, and iii) deep ensembles. The latter two methods are tested in cold- and warm-start versions. The texts were embedded using the FastText, LASER, and RoBERTa encoding techniques. The methods are tested on two types of datasets: text categorization (Kaggle News Category and Twitter Sentiment140 datasets) and fake news detection (Kaggle Fake News and Fake News Detection datasets). We show that the conventional dropout Monte Carlo approach provides good results for the majority of the tasks. The ensemble methods provide a more accurate representation of uncertainty, which makes it possible to keep the pace of learning on a complicated problem as the number of requests grows, outperforming dropout in the long run. However, for the majority of the datasets, the active strategy using Dropout MC and Deep Ensembles achieved almost perfect performance even for a very low number of requests. The best results were obtained for the most recent embedding, RoBERTa.

Detection of Alfven Eigenmodes on COMPASS with Generative Neural Networks

  • DOI: 10.1080/15361055.2020.1820805
  • Link: https://doi.org/10.1080/15361055.2020.1820805
  • Department: Artificial Intelligence Center
  • Abstract:
    Chirping Alfvén eigenmodes (AE) were observed at the COMPASS tokamak. They are believed to be driven by runaway electrons (RE) and as such, they provide a unique opportunity to study physics of non-linear interaction between RE and electromagnetic instabilities, including important topics of RE mitigation and losses. On COMPASS, they can be detected from spectrograms of certain magnetic probes. So far, their detection required a lot of manual effort since they occur rarely. We strive to automate this process using machine learning techniques based on generative neural networks. We present two different models that are trained using a smaller, manually labeled database and a larger unlabeled database from COMPASS experiments. On a number of experiments, we demonstrate that our approach is a viable option for automated detection of rare instabilities in tokamak plasma.

Neural Power Units

  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Conventional Neural Networks can approximate simple arithmetic operations, but fail to generalize beyond the range of numbers that were seen during training. Neural Arithmetic Units aim to overcome this difficulty, but current arithmetic units are either limited to operate on positive numbers or can only represent a subset of arithmetic operations. We introduce the Neural Power Unit (NPU) that operates on the full domain of real numbers and is capable of learning arbitrary power functions in a single layer. The NPU thus fixes the shortcomings of existing arithmetic units and extends their expressivity. We achieve this by using complex arithmetic without requiring a conversion of the network to complex numbers. A simplification of the unit to the RealNPU yields a highly transparent model. We show that the NPUs outperform their competitors in terms of accuracy and sparsity on artificial arithmetic datasets, and that the RealNPU can discover the governing equations of a dynamical system only from data.
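
The remark about using complex arithmetic without complex-valued networks can be made concrete with a small sketch. It relies on the identity log x = log|x| + iπ for negative x, so the real part of exp(w · log x) is |x|^w · cos(πw). The function name `real_npu`, the weight layout, and the `eps` guard are assumptions for illustration; this is one way to realize the idea, not necessarily the authors' exact formulation.

```python
import math

def real_npu(x, W, eps=1e-12):
    """Real part of exp(W @ log(x)) for real inputs, using only real arithmetic.

    For each output row w:
        y = exp(sum_i w_i * log|x_i|) * cos(pi * sum_{i: x_i < 0} w_i)
    eps is an assumed guard against log(0).
    """
    out = []
    for row in W:
        log_magnitude = sum(w * math.log(abs(xi) + eps) for w, xi in zip(row, x))
        phase = sum(w * math.pi for w, xi in zip(row, x) if xi < 0)
        out.append(math.exp(log_magnitude) * math.cos(phase))
    return out

# Weights act as exponents: [1, 1] multiplies, [1, -1] divides, [2, 0] squares.
product = real_npu([2.0, -3.0], [[1.0, 1.0]])[0]
quotient = real_npu([4.0, 2.0], [[1.0, -1.0]])[0]
square = real_npu([-2.0, 5.0], [[2.0, 0.0]])[0]
```

Because each weight is read directly as an exponent, a sparse weight matrix corresponds to a short, readable formula, which is consistent with the transparency claimed for the RealNPU.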

Sum-Product-Transform Networks: Exploiting Symmetries using Invertible Transformations

  • Department: Artificial Intelligence Center
  • Abstract:
    We propose Sum-Product-Transform Networks (SPTN), an extension of sum-product networks that uses invertible transformations as additional internal nodes. The type and placement of transformations determine the properties of the resulting SPTN, with many interesting special cases. Importantly, SPTNs with Gaussian leaves and affine transformations keep tractable the same inference tasks that can be computed efficiently in SPNs. We propose to store and optimize affine transformations in their SVD form, using an efficient parametrization of unitary matrices by a set of Givens rotations. Last but not least, we demonstrate that G-SPTNs push the state of the art in density estimation on the datasets used.
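
The Givens-rotation parametrization mentioned above can be sketched for the real-valued case. The function name `givens_orthogonal` and the ordering of rotation planes are assumptions for illustration, and a product of plane rotations only reaches orthogonal matrices with determinant +1, so this is a simplified instance rather than the paper's implementation.

```python
import math

def givens_orthogonal(n, angles):
    """Compose n*(n-1)/2 Givens rotations into an n x n orthogonal matrix.

    angles: one rotation angle per coordinate plane (i, j) with i < j,
    in the (assumed) order produced by the nested loop below.
    """
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if len(angles) != len(pairs):
        raise ValueError("expected n*(n-1)/2 angles")
    # Start from the identity and right-multiply by each plane rotation.
    Q = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for (i, j), theta in zip(pairs, angles):
        c, s = math.cos(theta), math.sin(theta)
        for row in Q:
            a, b = row[i], row[j]
            row[i] = c * a - s * b
            row[j] = s * a + c * b
    return Q

Q = givens_orthogonal(3, [0.3, -1.2, 2.0])
```

With an affine map stored as U · diag(s) · Vᵀ and both U and V built this way, the absolute determinant is the product of |s_k|, so the log-determinant needed for a change of variables is a sum of log|s_k| and requires no matrix factorization.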

Rodent: Relevance determination in ODE

  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    From a set of observed trajectories of a partially observed system, we aim to learn its underlying (physical) process without having to make too many assumptions about the generating model. We start with a very general, over-parameterized ordinary differential equation (ODE) of order N and learn the minimal complexity of the model, by which we mean both the order of the ODE as well as the minimum number of non-zero parameters that are needed to solve the problem. The minimal complexity is found by combining the Variational Auto-Encoder (VAE) with Automatic Relevance Determination (ARD) to the problem of learning the parameters of an ODE which we call Rodent. We show that it is possible to learn not only one specific model for a single process, but a manifold of models representing harmonic signals in general.

Robust sparse linear regression for tokamak plasma boundary estimation using variational Bayes

  • Authors: Škvára, V., doc. Ing. Václav Šmídl, Ph.D., Urban, J.
  • Publication: Journal of Physics: Conference Series. Bristol: IOP Publishing Ltd, 2018. p. 2-13. vol. 1047. ISSN 1742-6596.
  • Year: 2018
  • DOI: 10.1088/1742-6596/1047/1/012015
  • Link: https://doi.org/10.1088/1742-6596/1047/1/012015
  • Department: Artificial Intelligence Center
  • Abstract:
    Precise control of the shape of plasma in a tokamak requires reliable reconstruction of the plasma boundary. The problem of boundary estimation can be reduced to a simple linear regression with a potentially infinite number of regressors. This regression problem poses some difficulties for classical methods. The selection of regressors significantly influences the reconstructed boundary. Also, the underlying model may not be valid during certain phases of the plasma discharge. A formal model structure estimation technique based on the automatic relevance principle yields a version of the sparse least squares estimator. In this contribution, we extend the previous method by relaxing the assumption of Gaussian noise and using Student's t-distribution instead. Such a model is less sensitive to potential outliers in the measurement. We show on simulations and real data that the proposed modification improves estimation of the plasma boundary in some stages of a plasma discharge. Performance of the resulting algorithm is evaluated with respect to a more detailed and computationally costly model, which is considered to be the "ground truth". The results are also compared to those of Lasso and Tikhonov regularization techniques.

Responsible for this page: Ing. Mgr. Radovan Suk