People

Ing. Sebastián García, Ph.D.

All publications

A Survey of Privacy Attacks in Machine Learning

  • DOI: 10.1145/3624010
  • Link: https://doi.org/10.1145/3624010
  • Affiliation: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    As machine learning becomes more widely used, the need to study its implications in security and privacy becomes more urgent. Although the body of work in privacy has been steadily growing over the past few years, research on the privacy aspects of machine learning has received less focus than the security aspects. Our contribution in this research is an analysis of more than 45 papers related to privacy attacks against machine learning that have been published during the past seven years. We propose an attack taxonomy, together with a threat model that allows the categorization of different attacks based on the adversarial knowledge, and the assets under attack. An initial exploration of the causes of privacy leaks is presented, as well as a detailed analysis of the different attacks. Finally, we present an overview of the most commonly proposed defenses and a discussion of the open problems and future directions identified during our analysis.

Bridging the Explanation Gap in AI Security: A Task-Driven Approach to XAI Methods Evaluation

  • Authors: Ing. Ondřej Lukáš, Ing. Sebastián García, Ph.D.
  • Publication: Proceedings of the 16th International Conference on Agents and Artificial Intelligence. Setúbal: Science and Technology Publications, Lda, 2024. p. 1370-1377. vol. 3. ISSN 2184-433X. ISBN 978-989-758-680-4.
  • Year: 2024
  • DOI: 10.5220/0012475200003636
  • Link: https://doi.org/10.5220/0012475200003636
  • Affiliation: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Deciding which XAI technique is best depends not only on the domain, but also on the given task, the dataset used, the model being explained, and the target goal of that model. We argue that the evaluation of XAI methods has not been thoroughly analyzed in the network security domain, which presents a unique type of challenge. While there are XAI methods applied in network security, there is still a large gap between the needs of security stakeholders and the selection of the optimal method. We propose to approach the problem by first defining the stakeholders in security and their prototypical tasks. Each task defines inputs and specific needs for explanations. Based on these explanation needs (e.g., understanding the performance, or stealing a model), we created five XAI evaluation techniques that are used to compare and select which XAI method is best for each task (dataset, model, and goal). Our proposed approach was evaluated by running experiments for different security stakeholders, machine learning models, and XAI methods. Results were compared with the AutoXAI technique and random selection. Results show that our proposal to evaluate and select XAI methods for network security is well-grounded and that it can help AI security practitioners find better explanations for their given tasks.

Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments

  • DOI: 10.5220/0012391800003636
  • Link: https://doi.org/10.5220/0012391800003636
  • Affiliation: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Large Language Models (LLMs) have gained widespread popularity across diverse domains involving text generation, summarization, and various natural language processing tasks. Despite their inherent limitations, LLM-based designs have shown promising capabilities in planning and navigating open-world scenarios. This paper introduces a novel application of pre-trained LLMs as agents within cybersecurity network environments, focusing on their utility for sequential decision-making processes. We present an approach wherein pre-trained LLMs are leveraged as attacking agents in two reinforcement learning environments. Our proposed agents demonstrate similar or better performance against state-of-the-art agents trained for thousands of episodes in most scenarios and configurations. In addition, the best LLM agents perform similarly to human testers of the environment without any additional training process. This design highlights the potential of LLMs to address complex decision-making tasks within cybersecurity efficiently. Furthermore, we introduce a new network security environment named NetSecGame. The environment is designed to eventually support complex multi-agent scenarios within the network security domain. The proposed environment mimics real network attacks and is designed to be highly modular and adaptable for various scenarios.

The Power of MEME: Adversarial Malware Creation with Model-Based Reinforcement Learning

  • Authors: Ing. Maria Rigaki, Ing. Sebastián García, Ph.D.
  • Publication: 28th European Symposium on Research in Computer Security, The Hague, The Netherlands, September 25–29, 2023, Proceedings, Part I. Basel: Springer Nature Switzerland AG, 2024. p. 44-64. ISSN 0302-9743. ISBN 978-3-031-50593-5.
  • Year: 2024
  • DOI: 10.1007/978-3-031-51482-1_3
  • Link: https://doi.org/10.1007/978-3-031-51482-1_3
  • Affiliation: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Due to the proliferation of malware, defenders are increasingly turning to automation and machine learning as part of the malware detection toolchain. However, machine learning models are susceptible to adversarial attacks, requiring the testing of model and product robustness. Meanwhile, attackers also seek to automate malware generation and evasion of antivirus systems, and defenders try to gain insight into their methods. This work proposes a new algorithm that combines Malware Evasion and Model Extraction (MEME) attacks. MEME uses model-based reinforcement learning to adversarially modify Windows executable binary samples while simultaneously training a surrogate model with a high agreement with the target model to evade. To evaluate this method, we compare it with two state-of-the-art attacks in adversarial malware creation, using three well-known published models and one antivirus product as targets. Results show that MEME outperforms the state-of-the-art methods in terms of evasion capabilities in almost all cases, producing evasive malware with an evasion rate in the range of 32–73%. It also produces surrogate models with a prediction label agreement with the respective target models between 97–99%. The surrogate could be used to fine-tune and improve the evasion rate in the future.

Hornet 40: Network Dataset of Geographically Placed Honeypots

  • DOI: 10.1016/j.dib.2022.107795
  • Link: https://doi.org/10.1016/j.dib.2022.107795
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    Deception technologies, and honeypots in particular, have been used for decades to understand how cyber attacks and attackers work. A myriad of factors impact the effectiveness of a honeypot. However, very little is known about the impact of the geographical location of honeypots on the amount and type of attacks. Hornet 40 is the first dataset designed to help understand how the geolocation of honeypots may impact the inflow of network attacks. The data consists of network flows in binary and text format, with up to 118 features, including 480 bytes of the content of each flow. They were created using the Argus flow collector. The passive honeypots are IP addresses connected to the Internet and do not have any honeypot software running, so attacks are not interactive. The data was collected from identically configured honeypot servers in eight locations: Amsterdam, Bangalore, Frankfurt, London, New York, San Francisco, Singapore, and Toronto. The dataset contains over 4.7 million network flows collected during forty days throughout April, May, and June 2021.

Large Scale Analysis of DoH Deployment on the Internet

  • Authors: Ing. Sebastián García, Ph.D., Bogado Garcia, J., Hynek, K., Vekshin, D., Čejka, T., Wasicek, A.
  • Publication: Computer Security - ESORICS 2022. Cham: Springer International Publishing, 2022. p. 145-165. ISSN 0302-9743. ISBN 978-3-031-17142-0.
  • Year: 2022
  • DOI: 10.1007/978-3-031-17143-7_8
  • Link: https://doi.org/10.1007/978-3-031-17143-7_8
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    DNS over HTTPS (DoH) is one of the standards to protect the security and privacy of users. The choice of DoH provider has controversial consequences, from monopolisation of surveillance to lost visibility by network administrators and security providers. More importantly, it is a novel security business. Software products and organisations depend on users choosing well-known and trusted DoH resolvers. However, there is no comprehensive study on the number of DoH resolvers on the Internet, their growth, and the trustworthiness of the organisations behind them. This paper studies the deployment of DoH resolvers by (i) scanning the whole Internet for DoH resolvers in 2021 and 2022; (ii) creating lists of DoH resolvers that are well known to the community; (iii) characterising what those resolvers are; and (iv) comparing the growth and differences. Results show that (i) the number of DoH resolvers increased 4.8 times in the period 2021-2022, (ii) the number of organisations providing DoH services has doubled, and (iii) the number of DoH resolvers in 2022 is 28 times larger than the number of DoH resolvers well known to the community. Moreover, 94% of the public DoH resolvers on the Internet are unknown to the community, 77% use certificates from free services, and 57% belong to unknown organisations or personal servers. We conclude that the number of DoH resolvers is growing at a fast rate, that at least 30% of them are not completely trustworthy, and that users should be very careful when choosing a DoH resolver.

Cybercrime Specialization: An Exposé of a Malicious Android Obfuscation-as-a-Service

  • Authors: Šembera, V., Paquet-Clouston, M., Ing. Sebastián García, Ph.D., Erquiaga, M.
  • Publication: 2021 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). Brighton: Institute of Electrical and Electronics Engineers, 2021. p. 213-226. ISBN 978-1-6654-1012-0.
  • Year: 2021
  • DOI: 10.1109/EuroSPW54576.2021.00029
  • Link: https://doi.org/10.1109/EuroSPW54576.2021.00029
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    Malware authors constantly obfuscate their files and defenders regularly develop new techniques to detect them. Given this cat-and-mouse game, specialized obfuscation services have appeared in the cybercrime industry. These services allow malware authors to obfuscate their code for a fee. This study investigates an automated obfuscation-as-a-service platform for Android applications and yields unique insights on the technical difficulties and business reality of those behind such a specialized service. The service investigated was found to be average in quality, mainly using known obfuscation techniques, and generating obfuscated applications that were still detected by anti-viruses. It had a small clientele of large-scale attackers who used the service to decrease anti-virus detections of highly malicious applications, thus increasing their chances of compromising devices. Depending on the price bundles considered, operators offering the service were estimated to have made a minimum revenue ranging from USD 5,100 (conservative) to USD 61,160 (optimistic) for a six-month operation. This study illustrates that even though obfuscation-as-a-service is a market niche, taking advantage of the value added from this specialization is neither effortless nor easily accessible to everyone involved in cybercrime.

Deep generative models to extend active directory graphs with honeypot users

  • Authors: Ing. Ondřej Lukáš, Ing. Sebastián García, Ph.D.
  • Publication: Proceedings of the 2nd International Conference on Deep Learning Theory and Applications - DeLTA. Porto: SciTePress - Science and Technology Publications, 2021. p. 140-147. vol. 1. ISSN 2184-9277. ISBN 978-989-758-526-5.
  • Year: 2021
  • DOI: 10.5220/0010556601400147
  • Link: https://doi.org/10.5220/0010556601400147
  • Affiliation: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Active Directory (AD) is a crucial element of large organizations, given its central role in managing access to resources. Since AD is used by all users in the organization, it is hard to detect attackers. We propose to generate and place fake users (honeyusers) in AD structures to help detect attacks. However, not any honeyuser will attract attackers. Our method generates honeyusers with a Variational Autoencoder that enriches the AD structure with well-positioned honeyusers. It first learns the embeddings of the original nodes and edges in the AD, then it uses a modified Bidirectional DAG-RNN to encode the parameters of the probability distribution of the latent space of node representations. Finally, it samples nodes from this distribution and uses an MLP to decide where the nodes are connected. The model was evaluated by the similarity of the generated AD with the original, by the positions of the new nodes, by the similarity with GraphRNN and finally by making real intruders attack the generated AD structure to see if they select the honeyusers. Results show that our machine learning model is good enough to generate well-placed honeyusers for existing AD structures so that intruders are lured into them.

Growth and Commoditization of Remote Access Trojans

  • DOI: 10.1109/EuroSPW51379.2020.00067
  • Link: https://doi.org/10.1109/EuroSPW51379.2020.00067
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    In the last three decades there have been significant changes in the cybercrime world in terms of organization, type of attacks, and tools. Remote Access Trojans (RATs) are an intrinsic part of traditional cybercriminal activities, but they have also become a standard tool in advanced espionage and scam attacks. The overly specialized research in our community on Remote Access Trojans has resulted in an apparent lack of general perspective and understanding of how RATs have evolved as a phenomenon. This work presents a new generalist perspective on Remote Access Trojans, an analysis of their growth in the last 30 years, and a discussion on how they have become a commodity in the last decade. We found that the number of RATs increased drastically in the last ten years and that nowadays they have become standardized commodity products that are not very different from each other.

A Better Infected Hosts Detection Combining Ensemble Learning and Threat Intelligence

  • Authors: Venosa, P., Ing. Sebastián García, Ph.D., Javier Diaz, F.
  • Publication: CACIC 2019: 25th Argentine Congress of Computer Science. Cham: Springer, 2019. p. 354-365. ISSN 1865-0929. ISBN 978-3-030-48324-1.
  • Year: 2019
  • DOI: 10.1007/978-3-030-48325-8_23
  • Link: https://doi.org/10.1007/978-3-030-48325-8_23
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    Ensemble learning techniques have been successfully proposed and used to improve threat detection in cybersecurity. These techniques usually improve the detection results by combining algorithms that together have fewer errors. However, there has not been any ensemble learning algorithm used to classify network flows when several methods are used to give individual detections for each of the flows. The state of the art in the use of ensemble learning techniques was analyzed to find an alternative for the current intrusion detection mechanisms. This research proposes to incorporate ensemble learning into the Stratosphere Linux IPS (SLIPS), a behavior-based intrusion detection and prevention system that uses machine learning algorithms to detect malicious behaviors. Our ensembling method is used to obtain better results, taking advantage of the benefits of SLIPS' classifiers and modules. A contribution of our method is to extend the ensembling techniques by considering Threat Intelligence blacklist feeds as part of the detections. We present the results of the first stage of this project, i.e., ensemble learning algorithms to classify individual flows when they have multiple labels. We also present the results corresponding to the second stage of our project, i.e., the detection of groups of flows going to the same destination IP.

Deep Convolutional Neural Networks for DGA Detection

  • Authors: Catania, C., Ing. Sebastián García, Ph.D., Torres, P.
  • Publication: Computer Science - CACIC 2018. Düsseldorf: Springer VDI Verlag, 2019. p. 327-340. ISSN 1865-0929. ISBN 978-3-030-20786-1.
  • Year: 2019
  • DOI: 10.1007/978-3-030-20787-8_23
  • Link: https://doi.org/10.1007/978-3-030-20787-8_23
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    A Domain Generation Algorithm (DGA) is an algorithm to generate domain names in a deterministic but seemingly random way. Malware uses DGAs to generate the next domain to access the Command & Control (C&C) communication server. Given the simplicity of the generation process and the speed at which the domains are generated, a fast and accurate detection method is required. Convolutional neural networks (CNNs) are well known for performing real-time detection in fields like image and video recognition. Therefore, they seemed suitable for DGA detection. The present work provides an analysis and comparison of the detection performance of a CNN for DGA detection. A CNN with a minimal architecture complexity was evaluated on a dataset with 51 DGA malware families and normal domains. Despite its simple architecture, the resulting CNN model correctly detected more than 97% of total DGA domains with a false positive rate close to 0.7%. © 2019, Springer Nature Switzerland AG.

Detecting DNS Threats: A Deep Learning Model to Rule Them All

  • Authors: Palau, F., Catania, C., Ing. Sebastián García, Ph.D., Luis Guerra, J.
  • Publication: ASAI. ARGENTINE SYMPOSIUM ON ARTIFICIAL INTELLIGENCE, 2019. p. 90-101. 2019. ISSN 2451-7585.
  • Year: 2019
  • DOI: 10.13140/RG.2.2.14296.03849
  • Link: https://doi.org/10.13140/RG.2.2.14296.03849
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    Domain Name Service is a central part of the regular operation of the Internet. Such importance has made it a common target of different malicious behaviors, such as the application of Domain Generation Algorithms (DGA) for command and control of a group of infected computers, or Tunneling techniques for bypassing system administrator restrictions. A common detection approach is based on training different models for detecting DGA and Tunneling, each capable of performing a lexicographic discrimination of the domain names. However, since both DGA and Tunneling showed domain names with observable lexicographical differences with normal domains, it was reasonable to apply the same detection approach to both threats. In the present work, we propose a multi-class convolutional network architecture (MC-CNN) capable of detecting both DNS threats. The resulting MC-CNN is able to correctly detect 99% of normal domains, 97% of DGAs, and 92% of Tunneling, with a False Positive Rate of 2.8%, 0.7%, and 0.0015% respectively.

Geost Botnet. Operational Security Failures of a New Android Banking Threat

  • Authors: Ing. Sebastián García, Ph.D., Erquiaga, M.J., Shirokova, A., Garino, C.G.
  • Publication: Proceedings of 4th IEEE European Symposium on Security and Privacy. IEEE Xplore, 2019. p. 406-409. ISBN 978-1-7281-3026-2.
  • Year: 2019
  • DOI: 10.1109/EuroSPW.2019.00051
  • Link: https://doi.org/10.1109/EuroSPW.2019.00051
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    Effective operational security is difficult to maintain due to an increase in the costs of work and a decrease in the performance of actions. This is true both for security analysts and malicious attackers. It is tedious, and errors are easy to make. This paper describes the rare discovery of a new Android banking botnet, named Geost, from the operational security failures of its botmaster. They made many mistakes, including using the illegal proxy network of the HtBot malware, not encrypting their Command and Control servers, re-using security services, trusting other attackers with less operational security, and not encrypting chat sessions. The Geost botnet has hundreds of malicious domains, thirteen IP addresses for C&C servers, approximately 800,000 victims in Russia, and potential access to several million Euros in the bank accounts of the victims. More importantly, the operational security mistakes led to the discovery of members of an underground group that develops and maintains the C&C of Geost. It is seldom possible to glimpse into the decisions taken by the attackers due to failures in their operational security. This research presents the finding of a new Android banking botnet from operational security mistakes, creates an overview of the botnet operation, analyses the victims, and studies the relationships with the discovered groups of developers.

Machete: Dissecting the Operations of a Cyber Espionage Group in Latin America

  • DOI: 10.1109/EuroSPW.2019.00058
  • Link: https://doi.org/10.1109/EuroSPW.2019.00058
  • Affiliation: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Reports on cyber espionage operations have been on the rise in the last decade. However, operations in Latin America are heavily under-researched and potentially underestimated. In this paper we analyze and dissect a cyber espionage tool known as Machete. Our research shows that Machete is operated by a highly coordinated and organized group that focuses on Latin American targets. We describe the five phases of the APT operations from delivery to exfiltration of information and we show why Machete is considered a cyber espionage tool. Furthermore, our analysis indicates that the targeted victims belong to military, political, or diplomatic sectors. The review of almost six years of Machete operations shows that it is likely operated by a single group, and their activities are possibly state-sponsored. Machete is still active and operational to this day.

An Analysis of Convolutional Neural Networks for detecting DGA

  • Authors: Catania, C., Ing. Sebastián García, Ph.D., Torres, P.
  • Publication: Computer Science - CACIC 2018. Cham: Springer, 2018. p. 1060-1069. ISBN 978-950-658-472-6.
  • Year: 2018
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    A Domain Generation Algorithm (DGA) is an algorithm to generate domain names in a deterministic but seemingly random way. Malware uses DGAs to generate the next domain to access the Command & Control (C&C) communication channel. Given the simplicity and velocity associated with the domain generation process, machine learning detection methods emerged as a suitable detection solution. However, since periodical retraining becomes mandatory, a fast and accurate detection method is needed. Convolutional neural networks (CNNs) are well known for performing real-time detection in fields like image and video recognition. Therefore, they seem suitable for DGA detection. The present work is a preliminary analysis of the detection performance of CNNs for DGA detection. A CNN with a minimal architecture complexity was evaluated on a dataset with 51 DGA malware families as well as normal domains. Despite its simple architecture, the resulting CNN model correctly detected more than 97% of total DGA domains with a false positive rate close to 0.7%.

Analysis of Botnet Behavior as a Distributed System

  • Authors: Erquiaga, M.J., Ing. Sebastián García, Ph.D., Garino, C.G.
  • Publication: Proceedings of the IV School on Systems and Networks. Aachen: CEUR Workshop Proceedings, 2018. p. 83-85. ISSN 1613-0073.
  • Year: 2018
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    The rapid growth of new technologies brings with it a growth in malicious applications. These applications use the resources of infected devices to carry out illicit activities, send mass e-mail (spam), or mine cryptocurrencies. Mining requires large computing capacity. Botnets can be considered a type of distributed computing application. The word botnet means network of robots. It is a type of malware, installed on a computer that has been infected, with the ability to propagate itself to other machines. All the infected computers together make up the "network of bots", or botnet. This type of malware uses the resources of the infected computer (CPU, RAM, bandwidth) to communicate with its Botnet Master, which is the one that gives it orders. The present work is a state-of-the-art review that analyzes the behavior of botnets as distributed computing applications. It considers the behavior in terms of resource consumption of the devices infected by the malware, in particular miners.

Bringing a GAN to a Knife-Fight: Adapting Malware Communication to Avoid Detection

  • DOI: 10.1109/SPW.2018.00019
  • Link: https://doi.org/10.1109/SPW.2018.00019
  • Affiliation: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Generative Adversarial Networks (GANs) have been successfully used in a large number of domains. This paper proposes the use of GANs for generating network traffic in order to mimic other types of traffic. In particular, our method modifies the network behavior of a real malware in order to mimic the traffic of a legitimate application, and therefore avoid detection. By modifying the source code of a malware to receive parameters from a GAN, it was possible to adapt the behavior of its Command and Control (C2) channel to mimic the behavior of Facebook chat network traffic. In this way, it was possible to avoid the detection of new-generation Intrusion Prevention Systems that use machine learning and behavioral characteristics. A real-life scenario was successfully implemented using the Stratosphere behavioral IPS in a router, while the malware and the GAN were deployed in the local network of our laboratory, and the C2 server was deployed in the cloud. Results show that a GAN can successfully modify the traffic of a malware to make it undetectable. The modified malware also tested if it was being blocked and used this information as a feedback to the GAN. This work envisions the possibility of self-adapting malware and self-adapting IPS.

Observer Effect: How Intercepting HTTPS Traffic Forces Malware to Change Their Behavior

  • Authors: Erquiaga, M.J., Ing. Sebastián García, Ph.D., Garino, C.G.
  • Publication: Computer Science - CACIC 2017. Cham: Springer International Publishing, 2018. p. 272-281. ISSN 1865-0929. ISBN 978-3-319-75213-6.
  • Year: 2018
  • DOI: 10.1007/978-3-319-75214-3_26
  • Link: https://doi.org/10.1007/978-3-319-75214-3_26
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    During the last couple of years there has been an important surge in the use of HTTPS by malware. The reason for this increase is not completely understood yet, but it is hypothesized that it was forced by organizations only allowing web traffic to the Internet. Using HTTPS makes malware behavior similar to normal connections. Therefore, there has been a growing interest in understanding the usage of HTTPS by malware. This paper describes our research to obtain large quantities of real malware traffic using HTTPS, our use of man-in-the-middle HTTPS interceptor proxies to open and study the content, and our analysis of how the behavior of the malware changes after being intercepted. The research goal is to understand how malware uses HTTPS and the impact of intercepting its traffic. We conclude that the use of an interceptor proxy forces the malware to change its behavior and therefore should be carefully considered before being implemented.

Reliable Machine Learning for Networking: Key Issues and Approaches

  • Authors: Hammerschmidt, C.A., Ing. Sebastián García, Ph.D., Verwer, S., State, R.
  • Publication: Proceedings of the 42nd IEEE Conference on Local Computer Networks. USA: IEEE Computer Society, 2017. p. 167-170. ISSN 0742-1303. ISBN 978-1-5090-6523-3.
  • Year: 2017
  • DOI: 10.1109/LCN.2017.74
  • Link: https://doi.org/10.1109/LCN.2017.74
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    Machine learning has become one of the go-to methods for solving problems in the field of networking. This development is driven by data availability in large-scale networks and the commodification of machine learning frameworks. While this makes it easier for researchers to implement and deploy machine learning solutions on networks quickly, there are a number of vital factors to account for when using machine learning as an approach to a problem in networking and when translating testing performance into successful deployments on real networks. This paper, rather than presenting a particular technical result, discusses the necessary considerations to obtain good results when using machine learning to analyze network-related data.

An analysis of Recurrent Neural Networks for Botnet detection behavior

  • Authors: Torres, P., Catania, C., Ing. Sebastián García, Ph.D., Garcia Garino, C.
  • Publication: 2016 IEEE Biennial Congress of Argentina. San Francisco: American Institute of Physics and Magnetic Society of the IEEE, 2016. ISBN 978-1-4673-9764-3.
  • Year: 2016
  • DOI: 10.1109/ARGENCON.2016.7585247
  • Link: https://doi.org/10.1109/ARGENCON.2016.7585247
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    A Botnet can be conceived as a group of compromised computers which can be controlled remotely to execute coordinated attacks or commit fraudulent acts. The fact that Botnets keep continuously evolving means that traditional detection approaches are always one step behind. Recently, the behavior analysis of network traffic has arisen as a way to tackle the Botnet detection problem. The behavioral analysis approach aims to look at the common patterns that Botnets follow across their life cycle, trying to generalize in order to become capable of detecting unseen Botnet traffic. This work provides an analysis of the viability of Recurrent Neural Networks (RNN) to detect the behavior of network traffic by modeling it as a sequence of states that change over time. The recent success in applying RNNs to sequential data problems makes them a viable candidate for the task of sequence behavior analysis. The performance of an RNN is evaluated considering two main issues: the imbalance of network traffic and the optimal length of sequences. Both issues have a great impact on potential real-life implementations. Evaluation is performed using stratified k-fold cross-validation, and an independent test is conducted on previously unseen traffic belonging to a different Botnet. Preliminary results reveal that the RNN is capable of classifying the traffic with a high attack detection rate and a very small false alarm rate, which makes it a potential candidate for implementation and deployment in real-world scenarios. However, experiments exposed the fact that RNN detection models have problems dealing with traffic behaviors that are not easily differentiable, as well as with some particular cases of imbalanced network traffic.

Detecting DGA Malware traffic through Behavioral Models

  • Authors: Ing. Sebastián García, Ph.D., Erquiaga, M.J., Catania, C.
  • Publication: 2016 IEEE Biennial Congress of Argentina. San Francisco: American Institute of Physics and Magnetic Society of the IEEE, 2016. ISBN 978-1-4673-9764-3.
  • Year: 2016
  • DOI: 10.1109/ARGENCON.2016.7585238
  • Link: https://doi.org/10.1109/ARGENCON.2016.7585238
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    Some botnets use special algorithms to generate the domain names they need to connect to their command and control servers. They are referred to as Domain Generation Algorithms. Domain Generation Algorithms generate domain names and try to resolve their IP addresses. If the domain has an IP address, it is used to connect to that command and control server. Otherwise, the DGA generates a new domain and keeps trying to connect. In both cases it is possible to capture and analyze the special behavior shown by those DNS packets in the network. The behavior of Domain Generation Algorithms is difficult to detect automatically because each domain is usually randomly generated and therefore unpredictable. Hence, it is challenging to separate the DNS traffic generated by malware from the DNS traffic generated by normal computers. In this work we analyze the use of behavioral detection approaches based on Markov Models to differentiate Domain Generation Algorithm traffic from normal DNS traffic. The evaluation methodology of our detection models has focused on a real-time approach based on the use of time windows for reporting the alerts. All the detection models have shown a clear differentiation between normal and malicious DNS traffic and most have also shown a good detection rate. We believe this work is a further step in using behavioral models for network detection and we hope to facilitate the development of more general and better behavioral detection methods for malware traffic.

Detecting the Behavioral Relationships of Malware Connections

  • DOI: 10.1145/2970030.2970038
  • Link: https://doi.org/10.1145/2970030.2970038
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    A normal computer infected with malware is difficult to detect. In recent years, several approaches have analyzed the behavior of malware and obtained good results. The malware traffic may be detected, but it is very common to misclassify normal traffic as malicious and generate false positives. This is especially the case when the methods are tested in real, large networks. The detection errors arise because the malware changes and rapidly adapts its domains and patterns to mimic normal connections. To better detect malware infections and separate them from normal traffic, we propose to detect the behavior of the group of connections generated by the malware. It is known that malware usually generates various related connections simultaneously and therefore shows a group pattern. Based on previous experiments, this paper suggests that the behavior of a group of connections can be modelled as a directed cyclic graph with special properties, such as its internal patterns, relationships, frequencies and sequences of connections. By training the group models on known traffic, it may be possible to better distinguish between a malware connection and a normal connection.

Modelling the Network behaviour of Malware to Block Malicious Patterns. The Stratosphere Project: a Behavioural IPS

  • Authors: Ing. Sebastián García, Ph.D.
  • Publication: Proceedings of Virus Bulletin Conference 2015. Abingdon: Virus Bulletin Ltd, 2015. Available from: https://www.virusbtn.com/conference/vb2015/abstracts/Garcia.xml
  • Year: 2015
  • Affiliation: Department of Computer Science
  • Abstract:
    Current malware traffic detection solutions work mostly by using static fingerprints, whitelists and blacklists, and crowd-sourced threat intelligence analytics. These methods are useful for detecting known malware in real time, but are insufficient to detect unknown malicious trends and attacks. Our proposed complementary solution is to analyse the inherent patterns of malware actions in the network by means of machine learning algorithms. In particular, we use Markov Chains-based algorithms to find network patterns that are independent of static features, such as IP addresses or payloads. These patterns are used to build behavioural models of malware actions that are later used to detect similar traffic in the network. All these models and detection algorithms were used to create a free software intrusion prevention system, called Stratosphere IPS, which is thoroughly tested with normal and malicious traffic. The IPS is able to detect new network patterns that are similar to the known malicious behaviours. The Stratosphere IPS tool will be used to show how behavioural models can detect real malware traffic.

An empirical comparison of botnet detection methods

  • Authors: Ing. Sebastián García, Ph.D., Grill, M., Stiborek, J., Zunino, A.
  • Publication: Computers & Security. 2014, 45, p. 100-123. ISSN 0167-4048.
  • Year: 2014
  • DOI: 10.1016/j.cose.2014.05.011
  • Link: https://doi.org/10.1016/j.cose.2014.05.011
  • Affiliation: Department of Computer Science
  • Abstract:
    The results of botnet detection methods are usually presented without any comparison. Although it is generally accepted that more comparisons with third-party methods may help to improve the area, few papers have been able to make them. Among the factors that prevent a comparison are the difficulty of sharing a dataset, the lack of a good dataset, the absence of a proper description of the methods and the lack of a comparison methodology. This paper compares the output of three different botnet detection methods by executing them over a new, real, labeled and large botnet dataset. This dataset includes botnet, normal and background traffic. The results of our two methods (BClus and CAMNEP) and BotHunter were compared using a methodology and a novel error metric designed for botnet detection methods. We conclude that comparing methods indeed helps to better estimate how good the methods are, to improve the algorithms, to build better datasets and to build a comparison methodology.

Detecting Botnet Traffic from a Single Host

  • Authors: Ing. Sebastián García, Ph.D., Zunino, A., Campo, M.
  • Publication: Handbook of Research on Emerging Developments in Data Privacy. Hershey, Pennsylvania: IGI Global, 2014. p. 426-446. ISBN 978-1-4666-7381-6.
  • Year: 2014
  • DOI: 10.4018/978-1-4666-7381-6.ch019
  • Link: https://doi.org/10.4018/978-1-4666-7381-6.ch019
  • Affiliation: Department of Computer Science
  • Abstract:
    The detection of bots and botnets in the network may be improved if the analysis is done on the traffic of one bot alone. While a botnet may be detected by correlating the behavior of several bots in a large amount of traffic, one bot alone can be detected by analyzing its unique trends in less traffic. The algorithms that differentiate the traffic of one bot from the normal traffic of one computer may take advantage of these differences. The authors propose to detect bots in the network by analyzing the relationships between flow features in a time window. The technique is based on the Expectation-Maximization clustering algorithm. To verify the method, they designed test-beds and obtained a dataset of six different captures. The results are encouraging, showing a true positive rate of 99.08% with a false positive rate of 0.7%.

Identifying and Modeling Botnet C&C Behaviors

  • Authors: Ing. Sebastián García, Ph.D., Uhlíř, V., Rehák, M.
  • Publication: Proceedings of the 1st International Workshop on Agents and CyberSecurity. New York: ACM, 2014. ISBN 978-1-4503-2728-2.
  • Year: 2014
  • DOI: 10.1145/2602945.2602949
  • Link: https://doi.org/10.1145/2602945.2602949
  • Affiliation: Department of Computer Science
  • Abstract:
    Through the analysis of a long-term botnet capture, we identified and modeled the behaviors of its C&C channels. They were found and characterized by periodicity analyses and statistical representations. The relationships found between the behaviors of the UDP, TCP and HTTP C&C channels allowed us to unify them in a general model of the botnet behavior. Our behavioral analysis of the C&C channels gives a new perspective on the modeling of malware behavior, helping to better understand botnets.

Botnet Behavior Detection using Network Synchronism

  • Authors: Ing. Sebastián García, Ph.D., Zunino, A., Campo, M.
  • Publication: Privacy, Intrusion Detection and Response: Technologies for Protecting Networks. Hershey, Pennsylvania: IGI Global, 2011. p. 1-23. ISBN 9781609608361.
  • Year: 2011
  • DOI: 10.4018/978-1-60960-836-1.ch005
  • Link: https://doi.org/10.4018/978-1-60960-836-1.ch005
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    Botnets’ diversity and dynamism challenge detection and classification algorithms that depend heavily on static or protocol-dependent features. Several methods using behavior-based approaches have been proposed and show promising results. The authors conducted an analysis of the most inherent characteristics of botnets and bots, such as synchronism and network load within specific time windows, to detect them more efficiently. By not relying on any specific protocol, the proposed approach detects infected computers by clustering bots’ network behavioral characteristics using the Expectation-Maximization algorithm. The approach was evaluated on several bot and non-botnet network captures with a detailed analysis of error rates; an encouraging false positive rate of 0.7% shows that bots’ traffic can be accurately separated.

Revisiting clustering methods to their application on keystroke dynamics for intruder classification

  • Authors: Zamonsky Pedernera, G., Sznur, S., Sorondo Ovando, G., Ing. Sebastián García, Ph.D., Meschino, G.
  • Publication: IEEE Workshop on Biometric Measurements and Systems for Security and Medical Applications (BIOMS). Sydney: IEEE, 2010. ISBN 978-1-4244-6302-2.
  • Year: 2010
  • DOI: 10.1109/BIOMS.2010.5610443
  • Link: https://doi.org/10.1109/BIOMS.2010.5610443
  • Affiliation: Artificial Intelligence Center
  • Abstract:
    Keystroke dynamics is a set of computer techniques that has been used successfully for many years in authentication mechanisms and masquerader detection. Classification algorithms have reportedly performed well, but there is room for improvement. As obtaining real intruders' keystrokes is a very difficult task, it has been common practice in previous work to capture keystroke data from normal users. Our research presents a novel approach to intruder classification that uses real intrusion datasets and focuses on intruders' behavior. We compute six distance measures between sessions and cluster them using both modified K-means and Subtractive Clustering algorithms. Our distance measures use features derived from the relations between intruders' sessions, instead of using features from each user only. The performance evaluation of our experiments showed that the results are promising and that intruders can be successfully classified with acceptable error rates.

Responsible for this page: Ing. Mgr. Radovan Suk