Publikace

Publikace našeho týmu

Block-Wise Construction of Tree-like Relational Features with Monotone Reducibility and Redundancy

An Experimental Test of Occam's Razor in Classification

  • DOI: 10.1007/s10994-010-5227-2
  • Odkaz: https://doi.org/10.1007/s10994-010-5227-2
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We empirically confirm the hypothesis that the risk of overfitting (poor generalization) follows only from the number of model tests rather than the complexity of the selected model.

Guest editors' introduction: Special issue on Inductive Logic Programming (ILP 2008)

  • DOI: 10.1007/s10994-009-5120-z
  • Odkaz: https://doi.org/10.1007/s10994-009-5120-z
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    The 18th International Conference on Inductive Logic Programming was held in Prague, September 10-12, 2008. Apart from four invited talks and a tutorial, the presented peerreviewed papers consisted of 20 submissions accepted as long papers for the main technical track, and 21 papers accepted for the work-in-progress track. The long papers can be found in Volume 5194 of Springer's Lecture Notes in Artificial Intelligence series. Based on the reviews of the papers as well as their presentations at the conference, the ILP-2008 PC chairs invited the authors of 7 selected papers to submit a revised and significantly extended version for this special issue. As a result of an additional peer-review adopting the journal criteria, 5 of these papers were finally accepted.

A Restarted Strategy for Efficient Subsumption Testing

  • Pracoviště: Katedra kybernetiky
  • Anotace:
    We design a randomized restarted subsumption testing algorithm ReSumEr2 and test it on generated graph data as well as on the predictive toxicology challenge (PTC) data set.

Learning Relational Descriptions of Differentially Expressed Gene Groups

  • Autoři: Trajkovski, I., prof. Ing. Filip Železný, Ph.D., Lavrač, N., Tolar, J.
  • Publikace: IEEE Transactions on Systems, Man, and Cybernetics: Part C. 2008, 38(1), 16-25. ISSN 1094-6977.
  • Rok: 2008

Sequential Data Mining: A Comparative Case Study in Development of Atherosclerosis Risk Factors

  • DOI: 10.1109/TSMCC.2007.906055
  • Odkaz: https://doi.org/10.1109/TSMCC.2007.906055
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Sequential data represent an important source of potentially new medical knowledge. However, this type of data is rarely provided in a format suitable for immediate application of conventional mining algorithms. This paper summarizes and compares three different sequential mining approaches, based respectively on windowing, episode rules and inductive logic programming. Windowing is one of the essential methods of data preprocessing, episode rules represent general sequential mining while inductive logic programming extracts first order features whose structure is determined by background knowledge. The three approaches are demonstrated and evaluated in terms of a case study STULONG. It is a longitudinal preventive study of atherosclerosis where the data consist of series of longterm observations recording the development of risk factors and associated conditions.

Constraint-based knowledge discovery from SAGE data

  • Autoři: doc. Ing. Jiří Kléma, Ph.D., Blachon, S., Soulet, A., Cremilleux, B., Gandrillon, O.
  • Publikace: In Silico Biology - An International Journal on Computational Molecular Biology. 2008, 8(2), 157-175. ISSN 1434-3207.
  • Rok: 2008
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Current analyses of co-expressed genes are often based on global approaches such as clustering or bi-clustering. An alternative way is to employ local methods and search for patterns - sets of genes displaying specific expression properties in a set of situations. The main bottleneck of this type of analysis is twofold - computational costs and an overwhelming number of candidate patterns which can hardly be further exploited. A timely application of background knowledge available in literature databases, biological ontologies and other sources can help to focus on the most plausible patterns only. The paper proposes, implements and tests a flexible constraint-based framework that enables the effective mining and representation of meaningful over-expression patterns representing intrinsic associations among genes and biological situations.

Preventing Premature Convergence in a Simple EDA via Global Step Size Setting

  • Autoři: Ing. Petr Pošík, Ph.D.,
  • Publikace: Parallel Problem Solving from Nature - PPSN X. Heidelberg: Springer, 2008. p. 549-558. ISSN 0302-9743. ISBN 978-3-540-87699-1.
  • Rok: 2008
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    When a simple real valued estimation of distribution algorithm (EDA) with Gaussian model and maximum likelihood estimation of parameters is used, it converges prematurely even on the slope of the fitness function. The simplest way of preventing premature convergence by multiplying the variance estimate by a constant factor k each generation is studied. Recent works have shown that when increasing the dimensionality of the search space, such an algorithm becomes very quickly unable to traverse the slope and focus to the optimum at the same time. In this paper it is shown that when isotropic distributions with Gaussian or Cauchy distributed norms are used, the simple constant setting of $k$ is able to ensure a reasonable behaviour of the EDA on the slope and in the valley of the fitness function at the same time.

Estimation of Fitness Landscape Contours in EAs

  • Autoři: Ing. Petr Pošík, Ph.D., Franc, V.
  • Publikace: Proceedings of Genetic and Evolutionary Computation Conference 2007. New York: ACM, 2007. pp. 562-569. ISBN 978-1-59593-698-1.
  • Rok: 2007
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Evolutionary algorithms applied in real domain should profit from information about the local fitness function curvature. This paper presents an initial study of an evolutionary strategy with a novel approach for learning the covariance matrix of a Gaussian distribution. The learning method is based on estimation of the fitness landscape contour line between the selected and discarded individuals. The distribution learned this way is then used to generate new population members. The algorithm presented here is the first attempt to construct the Gaussian distribution this way and should be considered only a proof of concept; nevertheless, the empirical comparison on low-dimensional quadratic functions shows that our approach is viable and with respect to the number of evaluations needed to find a solution of certain quality, it is comparable to the state-of-the-art CMA-ES in case of sphere function and outperforms the CMA-ES in case of elliptical function.

Propositionalization based relational subgroup discovery with RSD

  • DOI: 10.1007/s10994-006-5834-0
  • Odkaz: https://doi.org/10.1007/s10994-006-5834-0
  • Pracoviště: Katedra kybernetiky
  • Anotace:
    Relational rule learning algorithms are typically designed to construct classification and prediction rules. However, relational rule learning can be adapted also to subgroup discovery. This paper proposes a propositionalization approach to relational subgroup discovery, achieved through appropriately adapting rule learning and first-order feature construction. The proposed approach was successfully applied to standard ILP problems (East-West trains, King-Rook-King chess endgame and mutagenicity prediction) and two real-life problems (analysis of telephone calls and traffic accident analysis).

Za stránku zodpovídá: RNDr. Patrik Mottl, Ph.D.