Persons

doc. Ing. Tomáš Pevný, Ph.D.

Supervisor

Dissertation topics

Bias in AI: definitions, detection, and sample complexity

  • Branch of study: Computer Science – Department of Computer Science
  • Department: Department of Computer Science
    • Description:
      There has been much recent interest in the bias of artificial-intelligence (AI) systems and, consequently, in their regulation. Bias often arises from mental shortcuts taken in model building and from the underrepresentation of sections of the population in the data, rather than from wilful, nefarious operation. Many vendors of AI systems may be willing to test for the presence of bias, should there be a widely accepted definition and a method for estimating it. One of the significant conceptual challenges in mitigating bias in AI is the lack of consensus on the best definition of bias or fairness. For almost any problem, one can have multiple measures of individual fairness and multiple measures of subgroup fairness. Often, there are multiple protected attributes (such as race, sex, or ethnicity) defining a number of subgroups within the population. One thus obtains a multitude of combinations of a subgroup and a fairness measure, all of which should be considered to some extent. A number of definitions of bias, together with numerical methods for quantifying the bias thus defined, have recently been proposed. It is clearly desirable to inform users as comprehensively as possible about the fairness implications of their AI pipeline, rather than to consider a single definition of bias to the exclusion of others.
      Another challenge is the limited understanding of the definitions of bias and of the sample complexity of its detection. In many cases, it seems one may need to study hundreds of thousands of "comparable" interactions with the AI system. A thorough and comprehensive characterisation of the existing definitions, and of the number of samples required to detect bias of various kinds up to a certain error with a certain probability, is clearly desirable. This can take the form of guarantees of probably approximately correct (PAC) learning, or of confidence intervals for estimates produced otherwise.
      Finally, a major challenge lies in the lack of libraries for the detection of bias. The most widely used library, AI Fairness 360, covers only a few types of bias and does not allow for the estimation of confidence intervals. We would like to develop an open-source toolbox for the detection and quantification of multiple types of bias.
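As a concrete illustration of estimating one subgroup fairness measure together with a confidence interval (all names and the synthetic data below are illustrative, not part of the topic assignment), a percentile bootstrap for the demographic-parity gap can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between two subgroups."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def bootstrap_ci(y_pred, group, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for the parity gap."""
    n = len(y_pred)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample interactions with replacement
        stats.append(demographic_parity_gap(y_pred[idx], group[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Synthetic example: a model whose positive rate favours group 0.
group = rng.integers(0, 2, size=5000)
y_pred = (rng.random(5000) < np.where(group == 0, 0.6, 0.5)).astype(int)
gap = demographic_parity_gap(y_pred, group)
lo, hi = bootstrap_ci(y_pred, group)
```

The width of the interval shrinks with the sample size, which is exactly the sample-complexity question the topic raises: how many interactions must be observed before a gap of a given size is distinguishable from noise.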

Big data and network security

  • Branch of study: Computer Science – Department of Computer Science
  • Department: Department of Computer Science
    • Description:
      Many recent methods in computer vision and speech recognition rely on simple models and massive amounts of data, of which only a portion is labeled. The goal of this thesis is to explore how these methods can be used in security domains, where most of the available data are unlabeled. We assume that the methods cannot be used as they are, but will need to be adjusted to suit the particularities of the security domain.
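One family of such methods is self-training: fit a model on the few labeled samples, pseudo-label the rest, and refit. A minimal sketch on synthetic data (the "benign/malicious" clusters and the nearest-mean classifier are illustrative assumptions, not the thesis method):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "security" data: benign (class 0) and malicious (class 1) clusters,
# with only ~2% of samples labeled, as is typical of security telemetry.
n = 2000
y_true = rng.integers(0, 2, n)
X = rng.normal(loc=y_true[:, None] * 2.0, scale=1.0, size=(n, 2))
labeled = rng.random(n) < 0.02

def nearest_mean_predict(X, mu0, mu1):
    """Assign each sample to the closer of the two class means."""
    d0 = ((X - mu0) ** 2).sum(axis=1)
    d1 = ((X - mu1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

# Round 1: class means estimated from the few labeled points only.
mu0 = X[labeled & (y_true == 0)].mean(axis=0)
mu1 = X[labeled & (y_true == 1)].mean(axis=0)
pseudo = nearest_mean_predict(X, mu0, mu1)

# Round 2: refit the means on pseudo-labels over all data (self-training).
mu0 = X[pseudo == 0].mean(axis=0)
mu1 = X[pseudo == 1].mean(axis=0)
acc = (nearest_mean_predict(X, mu0, mu1) == y_true).mean()
```

The second round uses all 2000 samples rather than the ~40 labeled ones, which is the leverage the topic aims to obtain from unlabeled security data.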

Explainable and structured modeling of natural data

  • Branch of study: Computer Science – Department of Computer Science
  • Department: Department of Computer Science
    • Description:
      This topic aims at learning over data from real-world domains, by which it is understood that the data have rich structure: they may be relational graphs describing interactions of entities, they may be organised hierarchically, or they may be collected from real physical experiments (such as tokamak reactors). While learning over these domains is becoming more developed and we (the advisor's team) have made progress in these areas, we believe that the feedback provided back to the users is very limited. For example, decision systems are not able to explain their decisions, to estimate the confidence of a decision, or to learn the underlying physical laws. The student should address some of these problems.
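Hierarchically organised data of this kind are often handled with multiple-instance models, where each sample is a variable-size bag of instances and a permutation-invariant pooling produces a fixed-size embedding. A minimal numpy sketch (the projection and pooling choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Each sample is a bag: a variable-size set of instance feature vectors
# (e.g. the set of network flows observed for one host).
bags = [rng.normal(size=(rng.integers(1, 10), 4)) for _ in range(5)]

W = rng.normal(size=(4, 3))  # shared per-instance projection

def embed(bag, W):
    """Project every instance, then mean-pool: invariant to instance order."""
    h = np.tanh(bag @ W)
    return h.mean(axis=0)

Z = np.stack([embed(b, W) for b in bags])  # one fixed-size vector per bag
```

Because the embedding does not depend on the order or number of instances, downstream classifiers can consume bags of any size; explaining which instances drove a decision is one of the open feedback problems the topic mentions.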

Reinforcement learning in computer security

  • Branch of study: Computer Science – Department of Computer Science
  • Department: Department of Computer Science
    • Description:
      Computer security is an increasingly important topic due to the ubiquitous use of computers in our personal lives and in critical infrastructure. Hardly a day passes without newspapers reporting a new attack on enterprises, governments, or infrastructure networks. New techniques to detect attacks as early as possible are in great need. The state-of-the-art use of machine learning in computer security relies either on supervised classification or on anomaly detection. The former has the advantage of being precise, while the latter can potentially detect new threats. The goal of this work is to approach security using reinforcement learning, as it better reflects the domain: it has the advantage of discovering new threats while learning which anomalies are not interesting. Moreover, it supports continual learning, which removes the need to retrain classifiers.
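The idea of learning which anomalies are not interesting can be sketched as a bandit-style reinforcement-learning loop: the agent decides whether to investigate each alert and learns per-type action values from the rewards of past investigations. The alert types, rewards, and epsilon-greedy policy below are illustrative assumptions, not the thesis design:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy setting: alerts arrive in 3 types; only type 2 is usually a real threat.
# The agent INVESTIGATEs (action 1) or IGNOREs (action 0), earning +1 for
# investigating a true threat and -0.1 for a wasted investigation.
p_threat = np.array([0.01, 0.05, 0.9])
Q = np.zeros((3, 2))       # value estimate for each (alert type, action)
counts = np.zeros((3, 2))
eps = 0.1                  # exploration rate

for t in range(5000):
    a_type = rng.integers(0, 3)
    action = rng.integers(0, 2) if rng.random() < eps else int(Q[a_type].argmax())
    threat = rng.random() < p_threat[a_type]
    reward = (1.0 if threat else -0.1) if action == 1 else 0.0
    counts[a_type, action] += 1
    # Incremental average of observed rewards.
    Q[a_type, action] += (reward - Q[a_type, action]) / counts[a_type, action]

policy = Q.argmax(axis=1)  # learned triage policy per alert type
```

After enough interactions the agent learns to ignore the alert types whose expected reward for investigation is negative, while still occasionally exploring them, which is how new threat types could be discovered without retraining a fixed classifier.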

Responsible person Ing. Mgr. Radovan Suk