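This is the hardest part. A naive reward (+1 per open port) leads to scanning loops, because the agent can keep collecting reward by rescanning without ever attempting an exploit. A sparse reward (+100 only for root) leads to no learning, because random exploration almost never reaches root and the agent receives no signal to improve on. Effective Autopentest-DRL uses: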

Actions: the penetration testing steps the agent can take, such as scan_network, exploit_vulnerability, or privilege_escalation.
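
As a rough illustration of how the action names above and a less exploitable reward might fit together, here is a minimal Python sketch. It is not taken from the Autopentest-DRL code base: the Action enum, the shaped_reward function, and every constant except the +100 root bonus mentioned earlier are assumptions made for this example.

```python
from enum import Enum


class Action(Enum):
    # Action names come from the text; the enum itself is a hypothetical illustration.
    SCAN_NETWORK = "scan_network"
    EXPLOIT_VULNERABILITY = "exploit_vulnerability"
    PRIVILEGE_ESCALATION = "privilege_escalation"


# Illustrative constants (assumptions, not values used by Autopentest-DRL).
STEP_COST = -1.0        # small penalty on every action
NEW_INFO_BONUS = 5.0    # paid only when a scan reveals something not seen before
FOOTHOLD_BONUS = 20.0   # paid when an exploit yields initial access
ROOT_BONUS = 100.0      # the terminal goal mentioned in the text


def shaped_reward(action, found_new_info, gained_foothold, gained_root):
    """Return the reward for a single step under this illustrative scheme."""
    reward = STEP_COST  # every step costs something, so looping loses reward
    if action is Action.SCAN_NETWORK and found_new_info:
        reward += NEW_INFO_BONUS
    if action is Action.EXPLOIT_VULNERABILITY and gained_foothold:
        reward += FOOTHOLD_BONUS
    if action is Action.PRIVILEGE_ESCALATION and gained_root:
        reward += ROOT_BONUS
    return reward


if __name__ == "__main__":
    # First scan finds new hosts; repeating it finds nothing new.
    print(shaped_reward(Action.SCAN_NETWORK, True, False, False))
    print(shaped_reward(Action.SCAN_NETWORK, False, False, False))
    print(shaped_reward(Action.PRIVILEGE_ESCALATION, False, False, True))
```

Under these assumed values, a repeated scan that finds nothing new has a negative return, which removes the incentive for scanning loops, while the intermediate foothold bonus gives the agent a learning signal well before it reaches root.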

Automated agents can test massive networks much faster than human teams, identifying "hidden" attack paths through sheer processing speed.

For developers and security researchers interested in exploring AI-driven security, the project is available on the crond-jaist GitHub repository. It is primarily intended for educational purposes, providing a hands-on way to study how AI can both threaten and protect digital infrastructure.

Once the DRL engine identifies a path, the framework uses Metasploit (via the pymetasploit3