Welcome to another GitHub deep dive! Today we’re exploring Shannon, an open-source, fully autonomous AI pentester built by the team at Keygraph.
Shannon’s mission is blunt: break your web app before someone else does. It doesn’t generate a list of potential warnings — it executes real-world exploits and only reports what it can actually prove is vulnerable.
With a 96.15% success rate on a hint-free XBOW benchmark out of the box, Shannon is setting a new bar for what automated security tooling can achieve.
Let’s explore it using our Mental Model.
Part 1: Foundations (The Mental Model)
Traditional security tools are like smoke detectors — they alert you when there’s a pattern that looks dangerous. They scan for known signatures, misconfigurations, and fingerprints. They produce long lists of “potential issues.” Then a human has to figure out which ones actually matter.
Shannon is more like an on-demand red team. It thinks like a human pen tester:
- Maps your application’s attack surface by reading your source code and live-crawling the app.
- Theorizes which code paths could be exploited (injection sinks, auth bypass vectors, SSRF triggers).
- Proves it — actually fires the exploit at a real running instance using a built-in browser and CLI tools.
- Reports only confirmed findings with copy-paste Proof-of-Concept payloads.
The key mental shift: “No Exploit, No Report.” If Shannon can’t prove a vulnerability is exploitable, it discards it. This eliminates the crushing false-positive noise that plagues traditional scanners.
The Mental Model: Shannon = White-Box Code Analysis + Black-Box Browser Exploitation + “No Exploit, No Report” Policy.
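Here is a minimal Python sketch of the filtering step behind the “No Exploit, No Report” contract. The `Hypothesis` and `Finding` types and the `try_exploit` callback are hypothetical names for illustration, not Shannon’s actual code. The only point is that a hypothesis with no exploit evidence never reaches the report.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Hypothesis:
    """A suspected vulnerability produced by code analysis (hypothetical type)."""
    vuln_class: str
    endpoint: str
    payload: str


@dataclass(frozen=True)
class Finding:
    """A hypothesis that an exploit attempt actually confirmed."""
    hypothesis: Hypothesis
    evidence: str


def no_exploit_no_report(
    hypotheses: Iterable[Hypothesis],
    try_exploit: Callable[[Hypothesis], Optional[str]],
) -> List[Finding]:
    """Keep only the hypotheses whose exploit attempt returned evidence."""
    findings: List[Finding] = []
    for hyp in hypotheses:
        evidence = try_exploit(hyp)  # None means the attack did not work
        if evidence is not None:
            findings.append(Finding(hyp, evidence))
        # Unproven hypotheses are dropped here and never appear in the report.
    return findings


if __name__ == "__main__":
    candidates = [
        Hypothesis("sqli", "/rest/user/login", "' OR 1=1--"),
        Hypothesis("xss", "/search", "<img src=x onerror=alert(1)>"),
    ]

    # Stub exploit runner: pretend only the SQL injection succeeded.
    def fake_exploit(h: Hypothesis) -> Optional[str]:
        return "logged in as admin" if h.vuln_class == "sqli" else None

    for finding in no_exploit_no_report(candidates, fake_exploit):
        print(finding.hypothesis.endpoint, "->", finding.evidence)
```

A real exploit runner would drive a browser or CLI tools. The filter stays this simple regardless, which is what keeps false positives out of the output.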
Part 2: The Investigation
Why This Exists
The core problem Shannon solves is what you might call the “vibe-coding security gap.”
Thanks to tools like Claude Code and Cursor, modern dev teams ship features at a blistering pace. But their annual penetration test? That’s a one-time snapshot. For the other 364 days of the year, every new feature, every refactor, every dependency update is an untested attack surface shipping straight to production.
Shannon is designed to be the continuous red team running in parallel with your continuous delivery.
The 4-Phase Multi-Agent Pipeline
Shannon’s architecture emulates what a human pen tester actually does, broken into four distinct phases:
Phase 1 — Reconnaissance: Shannon reads your source code and combines it with the output of infrastructure tools such as Nmap and Subfinder to build a complete attack surface map covering every endpoint, API route, authentication flow, and technology fingerprint.
Phase 2 — Vulnerability Analysis (Parallel): Specialized agents for each OWASP category run in parallel. Injection agents perform structured data flow analysis — tracing user input from entry points all the way to dangerous sinks (database queries, shell commands, HTTP redirects). This produces a ranked list of “hypothesized exploitable paths.”
Phase 3 — Exploitation (Parallel): Dedicated exploit agents receive the hypotheses and attempt to prove them real. They use browser automation, command-line tools, and custom exploit scripts. Only successful exploits survive.
Phase 4 — Reporting: A final agent compiles only the confirmed, proven findings into a professional, actionable report — cleaned of any hallucinated artifacts — with reproducible PoCs.
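The four phases map naturally onto a fan-out/fan-in pipeline. The asyncio sketch below is illustrative only. `recon`, `analyze`, `exploit`, and `report` are hypothetical stubs standing in for LLM-driven agents, and the hard-coded endpoints and outcomes are fake. It shows how Phases 2 and 3 can run their agents concurrently while Phase 4 receives only confirmed results.

```python
import asyncio
from typing import List, Optional, Tuple

Hypothesis = Tuple[str, str]  # (vulnerability class, endpoint)

# Hypothetical stubs: in a real system each of these would be an LLM-driven agent.


async def recon(target: str) -> List[str]:
    """Phase 1: map the attack surface (stub returns fixed endpoints)."""
    return [f"{target}/login", f"{target}/api/users", f"{target}/fetch"]


async def analyze(category: str, endpoints: List[str]) -> List[Hypothesis]:
    """Phase 2: one analysis agent per vulnerability class (stub keyword match)."""
    hints = {"injection": "login", "authz": "users", "ssrf": "fetch", "xss": "search"}
    return [(category, ep) for ep in endpoints if hints[category] in ep]


async def exploit(hyp: Hypothesis) -> Optional[Hypothesis]:
    """Phase 3: return the hypothesis only if the attack succeeded."""
    await asyncio.sleep(0)  # placeholder for browser/CLI work
    succeeded = hyp[0] != "authz"  # stub: pretend the authz attempt failed
    return hyp if succeeded else None


def report(confirmed: List[Hypothesis]) -> str:
    """Phase 4: render confirmed findings only."""
    return "\n".join(f"[{cat}] {ep}" for cat, ep in confirmed)


async def run_pipeline(target: str) -> str:
    endpoints = await recon(target)
    categories = ["injection", "authz", "ssrf", "xss"]
    # Phase 2: all analysis agents run concurrently.
    per_category = await asyncio.gather(*(analyze(c, endpoints) for c in categories))
    hypotheses = [h for hs in per_category for h in hs]
    # Phase 3: all exploit attempts run concurrently.
    outcomes = await asyncio.gather(*(exploit(h) for h in hypotheses))
    return report([o for o in outcomes if o is not None])


if __name__ == "__main__":
    print(asyncio.run(run_pipeline("https://staging.example.com")))
```

`asyncio.gather` preserves input order, so the report lists findings in a stable order even though the agents run concurrently.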
Powered by Real Security Tools
Shannon isn’t reinventing the wheel. It integrates established security tooling into its reconnaissance pipeline:
| Tool | Role |
|---|---|
| Nmap | Network port scanning & service fingerprinting |
| Subfinder | Subdomain enumeration |
| WhatWeb | Technology stack identification |
| Schemathesis | API schema-based fuzzing |
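To show what invoking these tools looks like, the sketch below builds (but does not run) one command line per tool, using common documented options: `nmap -sV` for service/version detection, `subfinder -d` with a target domain, `whatweb` with a URL, and `schemathesis run` with an OpenAPI schema location. This article does not say how Shannon itself calls these tools or with which flags, so treat the argument choices and the helper names as assumptions.

```python
import shlex
import shutil
from typing import List


def recon_commands(domain: str, base_url: str, schema_url: str) -> List[List[str]]:
    """Build one argv list per reconnaissance tool (nothing is executed here)."""
    return [
        ["nmap", "-sV", domain],                 # open ports plus service/version detection
        ["subfinder", "-d", domain, "-silent"],  # passive subdomain enumeration
        ["whatweb", base_url],                   # technology stack fingerprinting
        ["schemathesis", "run", schema_url],     # property-based fuzzing from an API schema
    ]


def missing_tools(commands: List[List[str]]) -> List[str]:
    """Report which executables are not on PATH before attempting a run."""
    return [argv[0] for argv in commands if shutil.which(argv[0]) is None]


if __name__ == "__main__":
    cmds = recon_commands(
        "staging.example.com",
        "https://staging.example.com",
        "https://staging.example.com/openapi.json",
    )
    for argv in cmds:
        print(shlex.join(argv))
    print("not installed:", missing_tools(cmds))
```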
Part 3: The Diagnosis
What Shannon Actually Finds
Shannon’s current coverage (v1 Lite) focuses on the highest-impact, most provably exploitable vulnerability classes:
- Injection (SQL Injection, Command Injection, Server-Side Template Injection)
- Broken Authentication & Authorization (auth bypass, privilege escalation, IDOR, JWT attacks)
- Cross-Site Scripting (Reflected, Stored, DOM-based)
- Server-Side Request Forgery (SSRF)
Against industry-standard intentionally vulnerable apps, the results are striking:
OWASP Juice Shop: 20+ high-impact vulnerabilities found, including:
- Complete authentication bypass + full user database exfiltration via SQL Injection
- Privilege escalation to admin via registration workflow bypass
- IDOR exposing any user’s private data
- SSRF enabling internal network reconnaissance
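The Juice Shop authentication bypass belongs to a well-known class: user input concatenated into a SQL string. The self-contained sqlite3 sketch below uses a made-up `users` table and hypothetical login functions, not Juice Shop’s code. It shows why an `'--` payload logs in as admin without the password, and how a parameterized query closes the hole.

```python
import sqlite3
from typing import Optional, Tuple


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for an app's user table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT, password TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [
            ("admin@shop.test", "s3cret!", "admin"),
            ("alice@shop.test", "hunter2", "customer"),
        ],
    )
    return conn


def login_vulnerable(conn: sqlite3.Connection, email: str, password: str) -> Optional[Tuple[str, str]]:
    # BUG: input is pasted into the SQL text, so a quote in `email` changes the query.
    query = (
        "SELECT email, role FROM users "
        f"WHERE email = '{email}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()


def login_safe(conn: sqlite3.Connection, email: str, password: str) -> Optional[Tuple[str, str]]:
    # Placeholders bind the input as data; it is never parsed as SQL.
    query = "SELECT email, role FROM users WHERE email = ? AND password = ?"
    return conn.execute(query, (email, password)).fetchone()


if __name__ == "__main__":
    conn = make_db()
    payload = "admin@shop.test'--"  # closes the string, comments out the password check
    print(login_vulnerable(conn, payload, "wrong"))  # ('admin@shop.test', 'admin')
    print(login_safe(conn, payload, "wrong"))        # None
```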
c{api}tal API (Checkmarx): ~15 critical/high vulnerabilities, including:
- Root-level command injection via a hidden debug endpoint
- Auth bypass targeting a legacy unpatched v1 API
- Mass Assignment escalating a regular user to admin
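Mass assignment happens when an endpoint copies every field in the request body onto a model, so an attacker can set fields the form never exposed. Here is a minimal sketch with a hypothetical `User` model and update handlers, not c{api}tal’s actual code:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class User:
    username: str
    email: str
    role: str = "user"  # should only ever be changed by an administrator


def update_profile_vulnerable(user: User, body: Dict[str, Any]) -> User:
    # BUG: any attribute named in the request body is overwritten, including `role`.
    for key, value in body.items():
        if hasattr(user, key):
            setattr(user, key, value)
    return user


EDITABLE_FIELDS = frozenset({"username", "email"})  # explicit allowlist


def update_profile_safe(user: User, body: Dict[str, Any]) -> User:
    # Only allowlisted fields are copied; everything else in the body is ignored.
    for key, value in body.items():
        if key in EDITABLE_FIELDS:
            setattr(user, key, value)
    return user


if __name__ == "__main__":
    body = {"email": "bob@new.test", "role": "admin"}  # attacker adds "role"
    print(update_profile_vulnerable(User("bob", "bob@old.test"), body).role)  # admin
    print(update_profile_safe(User("bob", "bob@old.test"), body).role)        # user
```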
OWASP crAPI: 15+ vulnerabilities, including advanced JWT attacks (algorithm confusion, `alg:none`, and weak `kid` injection).
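The `alg:none` attack works against verifiers that trust the algorithm named in the token’s own header. The stdlib-only sketch below hand-rolls HS256 for illustration. In practice you would use a maintained JWT library and pin its allowed algorithms. The function names and the key are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Dict


def b64url(data: bytes) -> str:
    """Base64url without padding, as JWT segments use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _hs256(key: bytes, signing_input: str) -> bytes:
    return hmac.new(key, signing_input.encode(), hashlib.sha256).digest()


def _encode(header: Dict[str, Any], payload: Dict[str, Any]) -> str:
    return b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())


def sign_hs256(payload: Dict[str, Any], key: bytes) -> str:
    signing_input = _encode({"alg": "HS256", "typ": "JWT"}, payload)
    return signing_input + "." + b64url(_hs256(key, signing_input))


def forge_alg_none(payload: Dict[str, Any]) -> str:
    # Attacker-built token: "alg": "none" and an empty signature segment.
    return _encode({"alg": "none", "typ": "JWT"}, payload) + "."


def verify_naive(token: str, key: bytes) -> Dict[str, Any]:
    # BUG: lets the token's own header decide how (or whether) it is verified.
    header_b64, body_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "none":
        expected = _hs256(key, f"{header_b64}.{body_b64}")
        if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
            raise ValueError("bad signature")
    return json.loads(b64url_decode(body_b64))


def verify_strict(token: str, key: bytes) -> Dict[str, Any]:
    header_b64, body_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # the server pins the algorithm, not the token
        raise ValueError("unexpected alg")
    expected = _hs256(key, f"{header_b64}.{body_b64}")
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(body_b64))


if __name__ == "__main__":
    key = b"server-side-secret"  # placeholder key
    forged = forge_alg_none({"sub": "alice", "role": "admin"})
    print(verify_naive(forged, key))  # {'sub': 'alice', 'role': 'admin'}
    try:
        verify_strict(forged, key)
    except ValueError as err:
        print("rejected:", err)  # rejected: unexpected alg
```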
Handling Authenticated Testing & 2FA
One of Shannon’s practical strengths is its ability to handle authenticated test runs. You can provide credentials (including TOTP secrets for 2FA-protected apps) in a YAML configuration file, and the AI will handle the login flow autonomously, including “Sign in with Google” OAuth flows.
This is perfect for staging environments where your app sits behind a login gate.
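The TOTP secret is what lets an automated agent produce fresh 2FA codes without a phone. The standard algorithm is RFC 6238: HMAC-SHA1 over a 30-second time counter, followed by dynamic truncation to 6 digits. Below is a minimal Python sketch of that algorithm. It is not Shannon’s code, and the secret in the usage lines is the RFC’s published test key, not a real account.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP with HMAC-SHA1, the default used by most authenticator apps."""
    secret = secret_b32.replace(" ", "").upper()
    secret += "=" * (-len(secret) % 8)  # authenticator secrets usually omit padding
    key = base64.b32decode(secret)
    now = time.time() if at is None else at
    counter = int(now // step)  # number of 30-second steps since the Unix epoch
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # RFC 4226 dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # Base32 of the RFC 6238 SHA-1 test key "12345678901234567890".
    rfc_secret = "GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ"
    print(totp(rfc_secret, at=59, digits=8))  # 94287082, per the RFC test vectors
    print(totp(rfc_secret))                   # current 6-digit code
```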
Workspace Resumability
Shannon supports named workspaces that checkpoint progress via Git commits. If a run is interrupted midway through exploitation, you can resume from the last successful phase without re-running hours of reconnaissance:
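The resume logic can be sketched in a few lines, assuming each completed phase is recorded as a Git commit whose subject starts with a `checkpoint: ` prefix. That message format, the phase names, and the helper functions are hypothetical, not Shannon’s actual workspace format. The `git -C <dir> log --format=%s` and `git commit --allow-empty -m` invocations are standard Git.

```python
import subprocess
from typing import Iterable, List, Optional, Set

PHASES = ["recon", "analysis", "exploitation", "reporting"]  # hypothetical phase names
PREFIX = "checkpoint: "                                        # hypothetical commit format


def completed_phases(commit_subjects: Iterable[str]) -> Set[str]:
    """Collect phase names from checkpoint commit subjects; ignore other commits."""
    return {s[len(PREFIX):] for s in commit_subjects if s.startswith(PREFIX)}


def next_phase(commit_subjects: Iterable[str]) -> Optional[str]:
    """First phase without a checkpoint, or None if the run already finished."""
    done = completed_phases(commit_subjects)
    for phase in PHASES:
        if phase not in done:
            return phase
    return None


def read_subjects(workspace: str) -> List[str]:
    """Commit subjects of the workspace repository, newest first."""
    out = subprocess.run(
        ["git", "-C", workspace, "log", "--format=%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


def record_checkpoint(workspace: str, phase: str) -> None:
    # --allow-empty lets the commit mark progress even if no file changed.
    subprocess.run(
        ["git", "-C", workspace, "commit", "--allow-empty", "-m", PREFIX + phase],
        check=True,
    )
```

With a workspace repository in place, `next_phase(read_subjects(path))` returns the phase to restart from.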
Part 4: The Resolution
Getting Shannon running is straightforward — it’s entirely Docker-based, so there are no system-level dependencies to manage.
Step-by-Step Setup
1. Clone and configure:
2. Set your API key:
3. Run the pentest:
4. Monitor progress:
5. Find your report:
Results are saved to `./audit-logs/{hostname}_{sessionId}/deliverables/comprehensive_security_assessment_report.md`, a professional, pentest-grade report that contains verified findings only.
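If you run Shannon repeatedly, a small helper can locate the newest report. This pathlib sketch relies only on the directory layout described above. `find_reports` is a hypothetical helper name, and sorting by modification time is a design choice, not something Shannon provides.

```python
from pathlib import Path
from typing import List

REPORT_NAME = "comprehensive_security_assessment_report.md"


def find_reports(audit_root: str = "./audit-logs") -> List[Path]:
    """All final reports under {audit_root}/{hostname}_{sessionId}/deliverables/, newest first."""
    reports = Path(audit_root).glob(f"*/deliverables/{REPORT_NAME}")
    return sorted(reports, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    found = find_reports()
    print(found[0] if found else "no reports yet")
```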
Important Considerations
- Time & Cost: A full test run takes approximately 1–1.5 hours and costs roughly US$50 in Claude API credits using Claude Sonnet.
- Staging only: Shannon executes actual attacks. It will create users, modify data, and trigger database queries. Never point it at production.
- White-box only: Shannon Lite requires access to your source code. It uses code analysis to intelligently guide attacks — not blind fuzzing.
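One way to enforce the staging-only rule in your own wrapper scripts is a pre-flight check that refuses any target outside an explicit allowlist. This is a hypothetical guard, not a Shannon feature. The hostnames in `ALLOWED_TARGET_HOSTS` are placeholders for your own staging hosts.

```python
from urllib.parse import urlparse

# Placeholder allowlist: replace with the hosts of your own staging environments.
ALLOWED_TARGET_HOSTS = frozenset({"staging.example.com", "localhost", "127.0.0.1"})


def assert_safe_target(url: str) -> str:
    """Return the target's hostname, or raise if it is not an approved staging host."""
    host = urlparse(url).hostname  # lowercased, with port and credentials stripped
    if host is None:
        raise ValueError(f"not an absolute URL: {url!r}")
    if host not in ALLOWED_TARGET_HOSTS:
        raise PermissionError(f"refusing to attack {host}: not an approved staging target")
    return host


if __name__ == "__main__":
    print(assert_safe_target("https://Staging.Example.com:8443/login"))  # staging.example.com
    try:
        assert_safe_target("https://www.example.com")
    except PermissionError as err:
        print(err)
```

An exact-match allowlist is deliberate here. A suffix check such as `endswith("example.com")` would also accept production hosts and look-alike domains.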
Final Mental Model
| Tool Type | Approach | Strength |
|---|---|---|
| Traditional Scanner | Signature-based | Fast, cheap, lots of false positives |
| PentAGI-style tools | Multi-agent, broad tooling | Comprehensive, model-agnostic |
| Shannon | White-box code analysis + Black-box exploitation | Minimal false positives, proven exploits only |
Shannon represents a new category: the continuous autonomous pen tester. By coupling source code understanding with real exploit execution and enforcing a “no exploit, no report” contract, it brings enterprise-grade security validation to every team that ships code continuously.
If you’re a developer building anything that touches user data, authentication, or external APIs, Shannon deserves a place in your staging pipeline.
