People

Ing. Veronica Valeros

All publications

CTU Hornet 65 Niner: A Network Dataset of Geographically Distributed Low-interaction Honeypots

  • DOI: 10.1016/j.dib.2024.111261
  • Link: https://doi.org/10.1016/j.dib.2024.111261
  • Department: Artificial Intelligence Center
  • Abstract:
    This data article introduces a new network dataset created to help understand how geographical location impacts the quality, type, and amount of incoming network attacks received by honeypots. The dataset consists of 12.4 million network flows collected from nine low-interaction honeypots in nine cities across the world for 65 days, from April 29th to July 1st, 2024. Each low-interaction honeypot was identically configured to capture incoming attacks using a state-of-the-art network flow collector, Zeek. Honeypots were distributed in nine cities: Amsterdam, Bangalore, Frankfurt, London, New York, San Francisco, Singapore, Toronto, and Sydney. The dataset is in JSON format and contains all types of Zeek network flow files, including protocol-specific logs.
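    Since the abstract states that the dataset is in JSON format and holds Zeek flow logs, a first look at such data could be sketched as below. This is only an illustration under stated assumptions: it assumes one JSON object per line (the usual layout of Zeek JSON logs) and the standard Zeek `conn.log` field name `id.resp_p`. The sample records and the helper `count_by_field` are hypothetical and are not taken from the dataset or its tooling.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_field(lines: Iterable[str], field: str) -> Counter:
    """Count records per value of `field` in JSON-lines input.

    Assumes one JSON object per line; blank lines and records
    that lack the field are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if field in record:
            counts[record[field]] += 1
    return counts


# Hypothetical sample records for illustration only (documentation IP ranges).
SAMPLE = [
    '{"ts": 1714348800.0, "id.orig_h": "198.51.100.7", "id.resp_p": 22, "proto": "tcp"}',
    '{"ts": 1714348801.5, "id.orig_h": "203.0.113.9", "id.resp_p": 23, "proto": "tcp"}',
    '{"ts": 1714348802.1, "id.orig_h": "198.51.100.7", "id.resp_p": 22, "proto": "tcp"}',
]

if __name__ == "__main__":
    # Destination ports ordered by how many flows targeted them.
    print(count_by_field(SAMPLE, "id.resp_p").most_common())
```

    With a real log file, the same function could be fed an open file handle instead of `SAMPLE`, since it only needs an iterable of lines.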

DeepRed: A Deep Learning–Powered Command and Control Framework for Multi-Stage Red Teaming Against ML-based Network Intrusion Detection Systems

  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Emerging studies demonstrate that machine learning (ML) has the potential to improve the detection capabilities of network intrusion detection systems (NIDS) against evolving cyber threats. However, recent adversarial ML (AML) studies have revealed critical ML vulnerabilities. This paper presents innovative multistage red-teaming techniques to evaluate the robustness of ML-NIDS in real-world adversarial settings. Although extensive research has been conducted in this area, existing studies have critical shortcomings: (1) relying on unrealistic threat models, (2) focusing on traffic flow perturbation for evasion while neglecting that malicious activity occurs at the packet level, and (3) failing to preserve attack functionality after perturbation.

VelLMes: A High-Interaction AI-Based Deception Framework

  • DOI: 10.1109/EuroSPW67616.2025.00082
  • Link: https://doi.org/10.1109/EuroSPW67616.2025.00082
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    There are very few state-of-the-art deception systems based on Large Language Models (LLMs). The existing ones are limited to simulating a single type of service, mainly SSH shells. These systems, like deception technologies not based on LLMs, lack an extensive evaluation that includes human attackers. Generative AI has recently become a valuable asset for cybersecurity researchers and practitioners, and the field of cyber-deception is no exception. Researchers have demonstrated how LLMs can be leveraged to create realistic-looking honeytokens, fake users, and even simulated systems that can be used as honeypots. This paper presents an AI-based deception framework called VelLMes, which can simulate multiple protocols and services such as SSH Linux shell, MySQL, POP3, and HTTP. All of these can be deployed and used as honeypots, so VelLMes offers a variety of choices for deception design based on the users' needs. VelLMes is designed to be attacked by humans, so interactivity and realism are key for its performance. We evaluate both its generative capabilities and its deception capabilities. Generative capabilities were evaluated using unit tests for LLMs. The results of the unit tests show that, with careful prompting, LLMs can produce realistic-looking responses, with some LLMs having a 100% passing rate. In the case of the SSH Linux shell, we evaluated deception capabilities with 89 human attackers. The attackers interacted with a randomly assigned shell (either honeypot or real) and had to decide if it was a real Ubuntu system or a honeypot. The results showed that about 30% of the attackers thought that they were interacting with a real system when they were assigned an LLM-based honeypot. Lastly, we deployed 10 instances of the SSH Linux shell honeypot on the Internet to capture real-life attacks. Analysis of these attacks showed us that LLM honeypots simulating Linux shells can perform well against unstructured and unexpected attacks on the Internet, responding corr...

LLM in the Shell: Generative Honeypots

  • DOI: 10.1109/EuroSPW61312.2024.00054
  • Link: https://doi.org/10.1109/EuroSPW61312.2024.00054
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Honeypots are essential tools in cybersecurity for early detection, threat intelligence gathering, and analysis of attackers' behavior. However, most of them lack the realism required to engage and fool human attackers long-term. Honeypots that are easy to identify are far less effective. This can happen because they are too deterministic, lack adaptability, or lack depth. This work introduces shelLM, a dynamic and realistic software honeypot based on Large Language Models that generates Linux-like shell output. We designed and implemented shelLM using cloud-based LLMs. We evaluated whether shelLM can generate the output expected from a real Linux shell. The evaluation was done by asking cybersecurity researchers to use the honeypot and report whether each answer from the honeypot was the one expected from a Linux shell. Results indicate that shelLM can create credible and dynamic answers capable of addressing the limitations of current honeypots. shelLM reached a TNR of 0.90, convincing humans it was consistent with a real Linux shell. The source code and prompts for replicating the experiments are publicly available.
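    For readers unfamiliar with the metric, the true negative rate (TNR) is conventionally defined as TN / (TN + FP). The sketch below only illustrates that standard formula. The counts are hypothetical, chosen so the result matches the reported 0.90, and the abstract does not say how the paper maps honeypot answers to "negative" cases, so that mapping is left unspecified here.

```python
def true_negative_rate(true_negatives: int, false_positives: int) -> float:
    """Standard TNR (specificity): TN / (TN + FP)."""
    total_negatives = true_negatives + false_positives
    if total_negatives == 0:
        raise ValueError("TNR is undefined when there are no negative cases")
    return true_negatives / total_negatives


if __name__ == "__main__":
    # Hypothetical counts for illustration; not the paper's actual data.
    print(true_negative_rate(true_negatives=90, false_positives=10))
```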

Towards Better Understanding of Cybercrime: The Role of Fine-Tuned LLMs in Translation

  • Authors: Ing. Veronica Valeros, Široková, A., Catania, C., Ing. Sebastián García, Ph.D.
  • Publication: Proceedings - 9th IEEE European Symposium on Security and Privacy Workshops, Euro S and PW 2024. Cannes: IEEE Computer Society, 2024. p. 91-99. ISSN 2768-0657. ISBN 979-8-3503-6729-4.
  • Year: 2024
  • DOI: 10.1109/EuroSPW61312.2024.00017
  • Link: https://doi.org/10.1109/EuroSPW61312.2024.00017
  • Department: Artificial Intelligence Center
  • Abstract:
    Understanding cybercrime communications is paramount for cybersecurity defence. This often involves translating communications into English for processing, interpreting, and generating timely intelligence. The problem is that translation is hard. Human translation is slow, expensive, and scarce. Machine translation is inaccurate and biased. We propose using fine-tuned Large Language Models (LLMs) to generate translations that can accurately capture the nuances of cybercrime language. We apply our technique to public chats from the NoName057(16) Russian-speaking hacktivist group. Our results show that our fine-tuned LLM is better, faster, more accurate, and able to capture nuances of the language. Our method shows it is possible to achieve high-fidelity translations and reduce costs by a factor ranging from 430 to 23,000 compared to a human translator.

Hornet 40: Network Dataset of Geographically Placed Honeypots

  • DOI: 10.1016/j.dib.2022.107795
  • Link: https://doi.org/10.1016/j.dib.2022.107795
  • Department: Artificial Intelligence Center
  • Abstract:
    Deception technologies, and honeypots in particular, have been used for decades to understand how cyber attacks and attackers work. A myriad of factors impact the effectiveness of a honeypot. However, very little is known about the impact of the geographical location of honeypots on the amount and type of attacks. Hornet 40 is the first dataset designed to help understand how the geolocation of honeypots may impact the inflow of network attacks. The data consists of network flows in binary and text format, with up to 118 features, including 480 bytes of the content of each flow. They were created using the Argus flow collector. The passive honeypots are IP addresses connected to the Internet and do not have any honeypot software running, so attacks are not interactive. The data was collected from identically configured honeypot servers in eight locations: Amsterdam, Bangalore, Frankfurt, London, New York, San Francisco, Singapore, and Toronto. The dataset contains over 4.7 million network flows collected during forty days throughout April, May, and June 2021.

Growth and Commoditization of Remote Access Trojans

  • DOI: 10.1109/EuroSPW51379.2020.00067
  • Link: https://doi.org/10.1109/EuroSPW51379.2020.00067
  • Department: Artificial Intelligence Center
  • Abstract:
    In the last three decades there have been significant changes in the cybercrime world in terms of organization, type of attacks, and tools. Remote Access Trojans (RATs) are an intrinsic part of traditional cybercriminal activities, but they have also become a standard tool in advanced espionage and scam attacks. The overly specialized research in our community on Remote Access Trojans has resulted in an apparent lack of general perspective and understanding of how RATs have evolved as a phenomenon. This work presents a new generalist perspective on Remote Access Trojans, an analysis of their growth in the last 30 years, and a discussion of how they have become a commodity in the last decade. We found that the number of RATs increased drastically in the last ten years and that nowadays they have become standardized commodity products that are not very different from each other.

Machete: Dissecting the Operations of a Cyber Espionage Group in Latin America

  • DOI: 10.1109/EuroSPW.2019.00058
  • Link: https://doi.org/10.1109/EuroSPW.2019.00058
  • Department: Department of Computer Science, Artificial Intelligence Center
  • Abstract:
    Reports on cyber espionage operations have been on the rise in the last decade. However, operations in Latin America are heavily under-researched and potentially underestimated. In this paper we analyze and dissect a cyber espionage tool known as Machete. Our research shows that Machete is operated by a highly coordinated and organized group that focuses on Latin American targets. We describe the five phases of the APT operations, from delivery to exfiltration of information, and we show why Machete is considered a cyber espionage tool. Furthermore, our analysis indicates that the targeted victims belong to military, political, or diplomatic sectors. The review of almost six years of Machete operations shows that it is likely operated by a single group, and that its activities are possibly state-sponsored. Machete is still active and operational to this day.

Responsible for this page: Ing. Mgr. Radovan Suk