Shannon Explained: The Autonomous AI Pentester That Breaks Your App Before Hackers Do

A deep dive into Shannon, the fully autonomous AI penetration tester by Keygraph that executes real exploits—not just alerts—using a sophisticated multi-agent architecture.

Welcome to another GitHub deep dive! Today we’re exploring Shannon, an open-source, fully autonomous AI pentester built by the team at Keygraph.

Shannon’s mission is blunt: break your web app before someone else does. It doesn’t generate a list of potential warnings — it executes real-world exploits and only reports what it can actually prove is vulnerable.

With a 96.15% success rate on a hint-free XBOW benchmark out of the box, Shannon is setting a new bar for what automated security tooling can achieve.

Let’s explore it using our Mental Model.


Part 1: Foundations (The Mental Model)

Traditional security tools are like smoke detectors — they alert you when there’s a pattern that looks dangerous. They scan for known signatures, misconfigurations, and fingerprints. They produce long lists of “potential issues.” Then a human has to figure out which ones actually matter.

Shannon is more like an on-demand red team. It works the way a human pen tester does:

  1. Maps your application’s attack surface by reading your source code and live-crawling the app.
  2. Theorizes which code paths could be exploited (injection sinks, auth bypass vectors, SSRF triggers).
  3. Proves it — actually fires the exploit at a real running instance using a built-in browser and CLI tools.
  4. Reports only confirmed findings with copy-paste Proof-of-Concept payloads.

The key mental shift: “No Exploit, No Report.” If Shannon can’t prove a vulnerability is exploitable, it discards it. This eliminates the crushing false-positive noise that plagues traditional scanners.

The Mental Model: Shannon = White-Box Code Analysis + Black-Box Browser Exploitation + “No Exploit, No Report” Policy.
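To make the "No Exploit, No Report" idea concrete, here is a minimal sketch of the policy as a filter. This is not Shannon's actual code: `Hypothesis`, `Finding`, `confirmed_findings`, and `fake_exploit` are hypothetical names invented for illustration, and the "exploit" is a stand-in function.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hypothesis:
    """A suspected vulnerability produced by analysis (hypothetical shape)."""
    category: str
    location: str


@dataclass
class Finding:
    """A hypothesis that an exploit attempt actually confirmed."""
    hypothesis: Hypothesis
    proof_of_concept: str


def confirmed_findings(
    hypotheses: list[Hypothesis],
    attempt_exploit: Callable[[Hypothesis], Optional[str]],
) -> list[Finding]:
    """Keep only hypotheses whose exploit attempt returned a working PoC."""
    findings = []
    for h in hypotheses:
        poc = attempt_exploit(h)  # a PoC string, or None if the attempt failed
        if poc is not None:
            findings.append(Finding(h, poc))
    return findings


def fake_exploit(h: Hypothesis) -> Optional[str]:
    # Stand-in for a real exploit run: only the SQL injection "succeeds".
    if h.category == "sqli":
        return "' OR 1=1--"
    return None


candidates = [Hypothesis("sqli", "/login"), Hypothesis("xss", "/search")]
report = confirmed_findings(candidates, fake_exploit)
print([f.hypothesis.location for f in report])  # ['/login']
```

The unproven XSS hypothesis never reaches the report, which is the whole point of the policy.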


Part 2: The Investigation

Why This Exists

The core problem Shannon solves is what you might call the “vibe-coding security gap.”

Thanks to tools like Claude Code and Cursor, modern dev teams ship features at a blistering pace. But their annual penetration test? That’s a one-time snapshot. For the other 364 days of the year, every new feature, every refactor, every dependency update is an untested attack surface shipping straight to production.

Shannon is designed to be the continuous red team running in parallel with your continuous delivery.

The 4-Phase Multi-Agent Pipeline

Shannon’s architecture emulates what a human pen tester actually does, broken into four distinct phases:

          ┌──────────────────────┐
          │    Reconnaissance    │
          └──────────┬───────────┘
         ┌───────────┼────────────┐
         ▼           ▼            ▼
  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
  │Vuln Analysis│ │Vuln Analysis│ │    ...      │
  │ (Injection) │ │   (XSS)     │ │             │
  └──────┬──────┘ └──────┬──────┘ └──────┬──────┘
         ▼               ▼               ▼
  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
  │Exploitation │ │Exploitation │ │    ...      │
  │ (Injection) │ │   (XSS)     │ │             │
  └──────┬──────┘ └──────┬──────┘ └──────┬──────┘
         └───────────────┴───────────────┘
             ┌──────────────────────┐
             │       Reporting      │
             └──────────────────────┘

Phase 1: Reconnaissance. Shannon reads your source code and combines what it learns with the output of infrastructure tools like Nmap and Subfinder, building a complete attack surface map: every endpoint, API route, authentication flow, and technology fingerprint.

Phase 2 — Vulnerability Analysis (Parallel): Specialized agents for each OWASP category run in parallel. Injection agents perform structured data flow analysis — tracing user input from entry points all the way to dangerous sinks (database queries, shell commands, HTTP redirects). This produces a ranked list of “hypothesized exploitable paths.”
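As a rough picture of what "tracing user input to dangerous sinks" means, here is a toy source-to-sink search over a hand-written data-flow graph. The graph, the node names, and the `paths_to_sinks` helper are all assumptions made up for this sketch; real data flow analysis works on parsed code, not a dictionary.

```python
from collections import deque

# Hypothetical data-flow edges: the value at each key flows into the listed nodes.
FLOWS = {
    "request.query.q": ["search_term"],
    "search_term": ["sql_string", "log_line"],
    "sql_string": ["db.execute"],
    "request.body.name": ["escaped_name"],
}
SOURCES = ["request.query.q", "request.body.name"]
SINKS = {"db.execute", "os.system", "http.redirect"}


def paths_to_sinks(source: str) -> list[list[str]]:
    """Breadth-first search from one entry point to any dangerous sink."""
    found = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in SINKS:
            found.append(path)
            continue
        for nxt in FLOWS.get(node, []):
            if nxt not in path:  # avoid looping on cyclic flows
                queue.append(path + [nxt])
    return found


hypotheses = [p for s in SOURCES for p in paths_to_sinks(s)]
for p in hypotheses:
    print(" -> ".join(p))
# request.query.q -> search_term -> sql_string -> db.execute
```

Only the query parameter reaches a sink here, so it is the only path that would be handed to an exploit agent as a hypothesis.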

Phase 3 — Exploitation (Parallel): Dedicated exploit agents receive the hypotheses and attempt to prove them real. They use browser automation, command-line tools, and custom exploit scripts. Only successful exploits survive.

Phase 4 — Reporting: A final agent compiles only the confirmed, proven findings into a professional, actionable report — cleaned of any hallucinated artifacts — with reproducible PoCs.
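The shape of the four phases can be sketched in a few lines. This is only an illustration of the fan-out/fan-in structure, assuming one worker per vulnerability category; the function names and the placeholder "exploit succeeded" check are invented here and say nothing about how Shannon's agents are implemented.

```python
from concurrent.futures import ThreadPoolExecutor

CATEGORIES = ["injection", "xss", "auth", "ssrf"]


def recon(target: str) -> dict:
    # Phase 1: build a (toy) attack surface map shared by all later agents.
    return {"target": target, "endpoints": ["/login", "/search", "/api/users"]}


def analyze(category: str, surface: dict) -> list[str]:
    # Phase 2: each category agent returns hypothesized exploitable paths.
    return [f"{category}:{ep}" for ep in surface["endpoints"]]


def exploit(hypotheses: list[str]) -> list[str]:
    # Phase 3: keep only hypotheses "proven" by a placeholder check.
    return [h for h in hypotheses if h.endswith("/login")]


def run_category(category: str, surface: dict) -> list[str]:
    # Analysis feeds exploitation within one category, as in the diagram.
    return exploit(analyze(category, surface))


def run_pipeline(target: str) -> list[str]:
    surface = recon(target)
    with ThreadPoolExecutor() as pool:
        # Categories run in parallel; map() yields results in input order.
        per_category = pool.map(lambda c: run_category(c, surface), CATEGORIES)
        confirmed = [finding for group in per_category for finding in group]
    # Phase 4: reporting only ever sees confirmed findings.
    return confirmed


results = run_pipeline("https://staging.example.com")
print(results)
```

Recon runs once, each category runs its own analysis-then-exploitation chain independently, and everything converges on a single reporting step.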

Powered by Real Security Tools

Shannon isn’t reinventing the wheel. It integrates established security tooling into its reconnaissance pipeline:

| Tool | Role |
| --- | --- |
| Nmap | Network port scanning & service fingerprinting |
| Subfinder | Subdomain enumeration |
| WhatWeb | Technology stack identification |
| Schemathesis | API schema-based fuzzing |


Part 3: The Diagnosis

What Shannon Actually Finds

Shannon’s current coverage (v1 Lite) focuses on the highest-impact, most provably exploitable vulnerability classes:

  • Injection (SQL Injection, Command Injection, Server-Side Template Injection)
  • Broken Authentication & Authorization (auth bypass, privilege escalation, IDOR, JWT attacks)
  • Cross-Site Scripting (Reflected, Stored, DOM-based)
  • Server-Side Request Forgery (SSRF)

Against industry-standard intentionally vulnerable apps, the results are striking:

OWASP Juice Shop: 20+ high-impact vulnerabilities found, including:

  • Complete authentication bypass + full user database exfiltration via SQL Injection
  • Privilege escalation to admin via registration workflow bypass
  • IDOR exposing any user’s private data
  • SSRF enabling internal network reconnaissance

c{api}tal API (Checkmarx): ~15 critical/high vulnerabilities, including:

  • Root-level command injection via a hidden debug endpoint
  • Auth bypass targeting a legacy unpatched v1 API
  • Mass Assignment escalating a regular user to admin

OWASP crAPI: 15+ vulnerabilities including advanced JWT attacks (Algorithm Confusion, alg:none, weak kid injection).

Handling Authenticated Testing & 2FA

One of Shannon’s practical strengths is its ability to handle authenticated test runs. You can provide credentials (including TOTP secrets for apps with 2FA) in a YAML config file, and the AI handles the login flow autonomously, including “Sign in with Google” OAuth flows.

# configs/my-app-config.yaml
authentication:
  login_type: form
  login_url: "https://your-app.com/login"
  credentials:
    username: "test@example.com"
    password: "Test1234!"
    totp_secret: "LB2E2RX7XFHSTGCK"   # 2FA? No problem.

  login_flow:
    - "Type $username into the email field"
    - "Type $password into the password field"
    - "Click the 'Sign In' button"

  success_condition:
    type: url_contains
    value: "/dashboard"

rules:
  avoid:
    - description: "Do not test the logout endpoint"
      type: path
      url_path: "/logout"
  focus:
    - description: "Prioritize testing API endpoints"
      type: path
      url_path: "/api"

This is perfect for staging environments where your app sits behind a login gate.
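If you are wondering why a `totp_secret` is enough to get past 2FA: a TOTP code is just a function of the shared secret and the current time. The sketch below computes one with the standard RFC 6238 algorithm using only the Python standard library. It assumes the target app uses the common defaults (HMAC-SHA1, 6 digits, 30-second step); the post does not say how Shannon generates codes internally, so treat this as an explanation of the mechanism, not of Shannon's implementation.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32: str, at=None, digits: int = 6, step: int = 30) -> str:
    """RFC 6238 TOTP with HMAC-SHA1, the usual authenticator-app default."""
    key = base64.b32decode(secret_b32.upper())
    now = time.time() if at is None else at
    counter = int(now // step)  # number of time steps since the Unix epoch
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation from RFC 4226
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# Secret from the sample config; the printed code changes every 30 seconds.
print(totp("LB2E2RX7XFHSTGCK"))
```

Anyone holding the secret can mint valid codes, so keep config files like this out of version control and use throwaway test accounts.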

Workspace Resumability

Shannon supports named workspaces that checkpoint progress via git commits. If a run is interrupted mid-way through exploitation, you can resume from the last successful phase without re-running hours of reconnaissance:

# Start with a named workspace
./shannon start URL=https://staging.myapp.com REPO=myapp WORKSPACE=q2-audit

# Resume: rerun the same command with the same WORKSPACE (completed agents are skipped automatically)
./shannon start URL=https://staging.myapp.com REPO=myapp WORKSPACE=q2-audit
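The skip-what-is-done behavior is easy to picture with a small sketch. Shannon checkpoints via git commits; the version below swaps in a plain JSON file as the checkpoint store to stay self-contained, and the agent names, `completed.json`, and `run_workspace` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

AGENTS = ["recon", "injection-analysis", "xss-analysis", "report"]


def run_workspace(workspace: Path, fail_on=None) -> list[str]:
    """Run agents in order, skipping any already recorded as complete."""
    state_file = workspace / "completed.json"
    done = json.loads(state_file.read_text()) if state_file.exists() else []
    executed = []
    for agent in AGENTS:
        if agent in done:
            continue  # finished in an earlier run
        if agent == fail_on:
            break  # simulate an interruption before this agent completes
        executed.append(agent)
        done.append(agent)
        state_file.write_text(json.dumps(done))  # checkpoint after each agent
    return executed


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    first = run_workspace(ws, fail_on="xss-analysis")
    second = run_workspace(ws)
    print(first)   # ['recon', 'injection-analysis']
    print(second)  # ['xss-analysis', 'report']
```

The second call picks up exactly where the first one stopped, which is why rerunning the same command with the same workspace name is all a resume needs.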

Part 4: The Resolution

Getting Shannon running is straightforward — it’s entirely Docker-based, so there are no system-level dependencies to manage.

Step-by-Step Setup

1. Clone and configure:

git clone https://github.com/KeygraphHQ/shannon.git
cd shannon

# Put your target repo inside ./repos/
git clone https://github.com/your-org/your-app.git ./repos/your-app

2. Set your API key:

export ANTHROPIC_API_KEY="your-api-key"
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000

3. Run the pentest:

./shannon start URL=https://staging.your-app.com REPO=your-app

4. Monitor progress:

# Real-time logs
./shannon logs

# Or open the Temporal workflow UI in a browser (`open` is macOS; use xdg-open on Linux)
open http://localhost:8233

5. Find your report:

Results are saved to ./audit-logs/{hostname}_{sessionId}/deliverables/comprehensive_security_assessment_report.md — a professional, pentest-grade report with verified findings only.
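Since the session ID is part of the directory name, it can be handy to locate the newest report programmatically. Here is a small sketch that does so, assuming the directory layout quoted above and that you run it from the Shannon checkout; `latest_report` is a helper invented for this post, not part of Shannon.

```python
from pathlib import Path
from typing import Optional

REPORT_NAME = "comprehensive_security_assessment_report.md"


def latest_report(audit_root: Path = Path("./audit-logs")) -> Optional[Path]:
    """Return the most recently modified report under audit_root, if any."""
    if not audit_root.is_dir():
        return None  # no runs yet, or called from the wrong directory
    reports = audit_root.glob(f"*/deliverables/{REPORT_NAME}")
    return max(reports, key=lambda p: p.stat().st_mtime, default=None)


print(latest_report())
```

It prints `None` when no run has produced a report yet.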

Important Considerations

  • Time & Cost: A full test run takes approximately 1–1.5 hours and costs roughly $50 USD in Claude API credits using Sonnet.
  • Staging only: Shannon executes actual attacks. It will create users, modify data, and trigger database queries. Never point it at production.
  • White-box only: Shannon Lite requires access to your source code. It uses code analysis to intelligently guide attacks — not blind fuzzing.
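Given the "staging only" warning, a cheap safeguard is a pre-flight check in whatever script launches the run. The sketch below is one hypothetical way to do it: the allowlist of hostname prefixes is an assumption you would adapt to your own environment naming, and nothing like it is described as built into Shannon.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: hostname prefixes you consider safe to attack.
ALLOWED_PREFIXES = ("staging.", "test.", "localhost")


def is_safe_target(url: str) -> bool:
    """True only if the URL's hostname starts with an allowed prefix."""
    host = urlparse(url).hostname or ""
    return host.startswith(ALLOWED_PREFIXES)


for url in ["https://staging.your-app.com", "https://your-app.com"]:
    print(url, "->", "ok" if is_safe_target(url) else "refuse")
```

A prefix check is crude, but it turns a typo in the target URL into a refusal instead of a live attack on production.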

Final Mental Model

| Tool Type | Approach | Characteristics |
| --- | --- | --- |
| Traditional Scanner | Signature-based | Fast and cheap, but lots of false positives |
| PentAGI-style tools | Multi-agent, broad tooling | Comprehensive, model-agnostic |
| Shannon | White-box code analysis + Black-box exploitation | Minimal false positives, proven exploits only |

Shannon represents a new category: the continuous autonomous pen tester. By coupling source code understanding with real exploit execution and enforcing a “no exploit, no report” contract, it brings enterprise-grade security validation to every team that ships code continuously.

If you’re a developer building anything that touches user data, authentication, or external APIs, Shannon deserves a place in your staging pipeline.

Star it on GitHub →
