Six24 Talk on the Application of RL for Cybersecurity

Six24 Cyber Labs will present its latest research on applying reinforcement learning (RL) to Advanced Persistent Threat (APT) attack path generation at the Applied Machine Learning for Cyber Security (AMLUCS) conference, held 9–10 September 2025 in Bristol, UK. AMLUCS is a two-day event that brings together cybersecurity and AI/ML practitioners to share advancements at the intersection of these fields. The talk draws on work performed under the DARPA CASTLE program and focuses on the challenges of translating offensive cybersecurity problems into machine learning contexts, along with the solutions developed to address them.

At the core of six24’s approach is a proprietary Hierarchical Markov Model (HMM), which ties agent behavior to human behavior through an optimized process workflow. This behavioral state-space model reflects six24’s operational cybersecurity expertise. It serves as the foundation for filtering the agent’s action space and for structuring the agent’s partial observations of the environment. The HMM is central to how SIEGE, six24’s AI agent, generates dynamic and realistic attack paths.
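The HMM itself is proprietary and is not described here, so the following is only a minimal sketch of the general idea of filtering an action space by behavioral state. The phase names, action names, and the `ALLOWED_ACTIONS` mapping are all hypothetical and are not taken from SIEGE.

```python
from enum import Enum, auto


class Phase(Enum):
    """Hypothetical behavioral states; not SIEGE's actual state space."""
    RECON = auto()
    INITIAL_ACCESS = auto()
    LATERAL_MOVEMENT = auto()


# Hypothetical mapping from behavioral state to the actions that make
# sense for a human operator in that state.
ALLOWED_ACTIONS = {
    Phase.RECON: {"scan_subnet", "enumerate_services"},
    Phase.INITIAL_ACCESS: {"exploit_service", "phish_user"},
    Phase.LATERAL_MOVEMENT: {"dump_credentials", "pivot_host", "scan_subnet"},
}

# Fixed, sorted action list so that indices are stable across calls.
ALL_ACTIONS = sorted(set().union(*ALLOWED_ACTIONS.values()))


def action_mask(phase):
    """Return one boolean per action: True if permitted in this phase."""
    allowed = ALLOWED_ACTIONS[phase]
    return [action in allowed for action in ALL_ACTIONS]


def masked_argmax(scores, mask):
    """Return the index of the highest-scoring permitted action."""
    candidates = [i for i, permitted in enumerate(mask) if permitted]
    if not candidates:
        raise ValueError("no action is permitted in this state")
    return max(candidates, key=lambda i: scores[i])
```

In this sketch, a policy's raw scores are never trusted on their own: an action that scores highly but is not permitted in the current phase is simply never selected. That is one common way to keep an RL agent's behavior inside a human-plausible workflow.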

SIEGE also includes a powerful simulation engine for modeling customer networks. It ingests data from a variety of sources, including Nmap, OpenVAS, Nessus, and Nuclei, as well as its own host-based surveys of production environments. This data is enriched with vulnerability intelligence to create a detailed representation of the network’s attack surface. The simulator then runs APT-style scenarios against that representation to assess the potential impact of an attack on business operations.
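As an illustration of the ingestion and enrichment step described above, the sketch below parses a small fragment in the shape of Nmap's XML output (`-oX`) using Python's standard library and attaches entries from a lookup table. The sample scan data, the `VULN_INTEL` table, and its placeholder identifier are invented for illustration; SIEGE's actual data model is not public.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of Nmap XML output; not real scan data.
SAMPLE_NMAP_XML = """<nmaprun>
  <host>
    <address addr="10.0.0.5" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH" version="8.9"/>
      </port>
      <port protocol="tcp" portid="80">
        <state state="closed"/>
        <service name="http"/>
      </port>
    </ports>
  </host>
</nmaprun>"""

# Hypothetical vulnerability-intelligence table keyed by (product, version).
# The identifier is a placeholder, not a real advisory.
VULN_INTEL = {("OpenSSH", "8.9"): ["EXAMPLE-VULN-0001"]}


def parse_open_services(xml_text):
    """Return one dict per open port, enriched from VULN_INTEL."""
    root = ET.fromstring(xml_text)
    services = []
    for host in root.iter("host"):
        address = host.find("address")  # first address element only
        if address is None:
            continue
        for port in host.iter("port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue  # skip closed and filtered ports
            service = port.find("service")
            product = service.get("product") if service is not None else None
            version = service.get("version") if service is not None else None
            services.append({
                "host": address.get("addr"),
                "port": int(port.get("portid")),
                "protocol": port.get("protocol"),
                "product": product,
                "version": version,
                "vulns": VULN_INTEL.get((product, version), []),
            })
    return services
```

A real pipeline would merge records like these from several scanners and from host-based surveys before building the simulated network.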

Figure: A SIEGE-generated dynamic network used for training.

To support robust training, SIEGE can generate diverse enterprise network topologies on demand. The agent trains against these simulated environments much as a red team operator might rehearse in a lab. The result is an RL-driven AI agent that operates like a real attacker: it uses tools, executes commands, and produces detailed activity logs that can be used to evaluate and improve a defender’s security stack.
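On-demand topology generation of this kind can be sketched in a few lines. The example below is an assumption-laden toy, not SIEGE's generator: the function name, node naming scheme, and the router-tree layout are all hypothetical. It builds a random tree of subnet routers, attaches a random number of hosts to each, and takes a seed so that any training environment can be reproduced.

```python
import random


def generate_network(num_subnets, hosts_per_subnet, seed=None):
    """Build a random, connected toy network as an adjacency dict.

    hosts_per_subnet is an inclusive (low, high) range. Every router
    after the first links to a randomly chosen earlier router, so the
    routers form a tree and the whole graph is connected.
    """
    rng = random.Random(seed)  # local RNG keeps runs reproducible
    graph = {}

    def link(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    low, high = hosts_per_subnet
    routers = []
    for s in range(num_subnets):
        router = f"router-{s}"
        graph.setdefault(router, set())
        if routers:
            link(router, rng.choice(routers))
        routers.append(router)
        for h in range(rng.randint(low, high)):
            link(router, f"host-{s}-{h}")
    return graph


# Usage: two different seeds give two different training environments.
env_a = generate_network(num_subnets=4, hosts_per_subnet=(2, 5), seed=1)
env_b = generate_network(num_subnets=4, hosts_per_subnet=(2, 5), seed=2)
```

Varying the seed, subnet count, and host range across episodes is one simple way to expose an agent to many layouts rather than letting it memorize a single network.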