Scanner deep-dive
Claude Fable 5 by Anthropic
General-Purpose LLM · claude-code-v1 · scored on 26/26 repositories. Strict scoring (unfinished repos counted as misses).
Methodology note
Different harness. Every other LLM scanner here runs agentically through the OpenCode CLI (version label agentic-v1). Fable 5 could not be benchmarked that way: on this intentionally vulnerable source code, provider content filtering consistently blocked requests on the OpenCode→Anthropic API path, which returned refusals instead of findings. Instead, each of the 26 repositories was scanned by a dedicated Claude Code subagent (version label claude-code-v1) using the same system prompt as the agentic runner (prompt hash sha256:14ccb06a286c), so findings remain comparable. The same prompt ran cleanly through Claude Code, which indicates that the block was specific to the OpenCode delivery path rather than to the prompt or the model.
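A short prompt hash like the one above lets a reader check that two harnesses received byte-identical instructions. The sketch below assumes the `sha256:` label is the first 12 hex digits of SHA-256 over the prompt's UTF-8 bytes; the benchmark's actual hashing scheme is not documented here, and `prompt_fingerprint` and the example prompts are hypothetical.

```python
import hashlib


def prompt_fingerprint(system_prompt: str, length: int = 12) -> str:
    """Return a short SHA-256 fingerprint of a system prompt.

    Assumption: the label is the first `length` hex digits of SHA-256
    over the prompt's UTF-8 bytes (the benchmark's scheme is not shown).
    """
    digest = hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()
    return f"sha256:{digest[:length]}"


# Hypothetical prompts: identical text yields an identical fingerprint,
# while any byte-level change (even trailing whitespace) changes it.
agentic_prompt = "You are a security scanner. Report each vulnerability as JSON."
subagent_prompt = "You are a security scanner. Report each vulnerability as JSON."

assert prompt_fingerprint(agentic_prompt) == prompt_fingerprint(subagent_prompt)
assert prompt_fingerprint(agentic_prompt) != prompt_fingerprint(agentic_prompt + " ")
print(prompt_fingerprint(agentic_prompt))
```

Comparing fingerprints rather than full prompts keeps the check compact, at the cost of trusting that a 12-digit prefix is long enough to avoid accidental collisions among a handful of prompt versions.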
Caveats. These runs were interactive rather than metered, so token and latency figures were not recorded. The cost shown is therefore an estimate: Fable 5's API price is exactly 2× that of Claude Opus 4.8 ($10/$50 vs $5/$25 per 1M input/output tokens), so we project its cost as 2× Opus 4.8's measured cost on the same benchmark. The projection assumes Fable 5 consumed roughly as many tokens as Opus 4.8 did. One repository (python-app) nests its source under a target/ directory; the agent reported paths without that prefix, so those paths were normalized to match the ground truth before scoring.
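The python-app path fix amounts to a small normalization step applied to each reported path before it is matched against ground truth. This is a hypothetical sketch: the helper name, the handling of a leading `./`, and the example file paths are assumptions, not the benchmark's code.

```python
from pathlib import PurePosixPath


def normalize_reported_path(reported: str, prefix: str = "target") -> str:
    """Add the repo's source prefix to a reported path if the agent left it off.

    Hypothetical helper: the benchmark's real normalization code is not shown.
    """
    # Treat "./views.py" and "views.py" the same way.
    cleaned = reported[2:] if reported.startswith("./") else reported
    path = PurePosixPath(cleaned)
    # Leave paths that already carry the prefix untouched.
    if path.parts and path.parts[0] == prefix:
        return str(path)
    return str(PurePosixPath(prefix) / path)


print(normalize_reported_path("app/views.py"))         # target/app/views.py
print(normalize_reported_path("target/app/views.py"))  # target/app/views.py
```

Checking the first path component, rather than doing a plain string prefix test, avoids mistaking a directory such as `targets/` for the real prefix.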
| Metric | Value |
|---|---|
| F3 (strict) | 50.5 |
| F2 (strict) | 52.5 |
| Recall (strict) | 48.6% |
| Precision | 76.5% |
| Repos scored | 26/26 |
| Model | — |
| Total cost | ~$71 est. |
| Avg latency | not recorded |
Per-repository breakdown
Each row shows true positives (TP), false positives (FP), and misses (FN) on one repository, with recall and F2 computed from those counts; a repository's labeled vulnerability count is TP + FN. Rows are ranked by F2.
| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| vfapi | 8 | 4 | 1 | 88.9 | 83.3 |
| insecure-web | 7 | 0 | 2 | 77.8 | 81.4 |
| dsvw | 20 | 2 | 7 | 74.1 | 76.9 |
| intentionally-vulnerable-python-application | 5 | 1 | 2 | 71.4 | 73.5 |
| dvblab | 15 | 2 | 7 | 68.2 | 71.4 |
| vulnerable-python-apps | 16 | 9 | 6 | 72.7 | 70.8 |
| vulnerable-tornado-app | 9 | 1 | 5 | 64.3 | 68.2 |
| python-app | 13 | 5 | 7 | 65.0 | 66.3 |
| vulnerable-api | 9 | 3 | 5 | 64.3 | 66.2 |
| vulnpy | 47 | 0 | 31 | 60.3 | 65.5 |
| vampi | 9 | 8 | 6 | 60.0 | 58.4 |
| dsvpwa | 17 | 2 | 15 | 53.1 | 57.8 |
| damn-vulnerable-flask-application | 8 | 4 | 7 | 53.3 | 55.6 |
| python-insecure-app | 4 | 1 | 4 | 50.0 | 54.1 |
| lets-be-bad-guys | 12 | 4 | 12 | 50.0 | 53.6 |
| owasp-web-playground | 14 | 10 | 14 | 50.0 | 51.5 |
| pythonssti | 1 | 1 | 1 | 50.0 | 50.0 |
| vulnerable-flask-app | 10 | 7 | 11 | 47.6 | 49.5 |
| pygoat | 34 | 10 | 43 | 44.2 | 48.3 |
| threatbyte | 11 | 5 | 15 | 42.3 | 45.8 |
| damn-vulnerable-graphql-application | 13 | 5 | 23 | 36.1 | 40.1 |
| extremely-vulnerable-flask-app | 11 | 3 | 21 | 34.4 | 38.7 |
| dvpwa | 7 | 2 | 15 | 31.8 | 36.1 |
| flask-xss | 9 | 5 | 21 | 30.0 | 33.6 |
| vulpy | 16 | 4 | 41 | 28.1 | 32.3 |
| djangoat | 14 | 6 | 36 | 28.0 | 31.8 |
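The headline recall of 48.6% is consistent with pooling counts across repositories (339 TP out of 697 labeled vulnerabilities) rather than averaging per-repository recall. The sketch below computes both from the table; the unweighted mean comes out near 53.7%, because large repositories with low recall (pygoat, vulpy, djangoat) carry more weight in the pooled figure. The helper names are illustrative.

```python
# (repository, TP, FN) copied from the per-repository table; FP is not needed for recall.
ROWS = [
    ("vfapi", 8, 1), ("insecure-web", 7, 2), ("dsvw", 20, 7),
    ("intentionally-vulnerable-python-application", 5, 2), ("dvblab", 15, 7),
    ("vulnerable-python-apps", 16, 6), ("vulnerable-tornado-app", 9, 5),
    ("python-app", 13, 7), ("vulnerable-api", 9, 5), ("vulnpy", 47, 31),
    ("vampi", 9, 6), ("dsvpwa", 17, 15), ("damn-vulnerable-flask-application", 8, 7),
    ("python-insecure-app", 4, 4), ("lets-be-bad-guys", 12, 12),
    ("owasp-web-playground", 14, 14), ("pythonssti", 1, 1),
    ("vulnerable-flask-app", 10, 11), ("pygoat", 34, 43), ("threatbyte", 11, 15),
    ("damn-vulnerable-graphql-application", 13, 23),
    ("extremely-vulnerable-flask-app", 11, 21), ("dvpwa", 7, 15),
    ("flask-xss", 9, 21), ("vulpy", 16, 41), ("djangoat", 14, 36),
]


def pooled_recall(rows):
    """Micro-average: sum TP and FN over all repos, then divide once."""
    tp = sum(tp for _, tp, _ in rows)
    fn = sum(fn for _, _, fn in rows)
    return tp / (tp + fn)


def mean_recall(rows):
    """Macro-average: compute recall per repo, then take the unweighted mean."""
    return sum(tp / (tp + fn) for _, tp, fn in rows) / len(rows)


print(f"pooled {pooled_recall(ROWS):.1%}  mean per-repo {mean_recall(ROWS):.1%}")
# pooled 48.6%  mean per-repo 53.7%
```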
Detection by severity
| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 74 | 3 | 12 | 86.0 |
| High | 142 | 5 | 122 | 53.8 |
| Medium | 105 | 1 | 174 | 37.6 |
| Low | 18 | 0 | 50 | 26.5 |
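Recall in each tier is TP / (TP + FN), and dividing each tier's FN by the 358 total misses shows where misses concentrate. By this arithmetic, High and Medium together account for 296 of the 358 misses (about 83%), while Critical contributes only 12. The helper below is illustrative.

```python
# (severity, TP, FN) copied from the severity table.
SEVERITY = [("Critical", 74, 12), ("High", 142, 122), ("Medium", 105, 174), ("Low", 18, 50)]


def severity_breakdown(rows):
    """Return {severity: (recall, share_of_all_misses)}."""
    total_fn = sum(fn for _, _, fn in rows)
    return {name: (tp / (tp + fn), fn / total_fn) for name, tp, fn in rows}


for name, (recall, miss_share) in severity_breakdown(SEVERITY).items():
    print(f"{name:<8} recall {recall:6.1%}  share of misses {miss_share:6.1%}")
```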
Detection by vulnerability class
| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Code Injection / RFI | 14 | 0 | 0 | 100.0 |
| XML External Entities | 8 | 1 | 0 | 100.0 |
| Insecure Deserialization | 19 | 0 | 0 | 100.0 |
| HTTP Header Injection | 2 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| SQL Injection | 43 | 2 | 4 | 91.5 |
| Path Traversal | 22 | 1 | 4 | 84.6 |
| Command / OS Injection | 14 | 1 | 3 | 82.4 |
| Open Redirect | 4 | 0 | 2 | 66.7 |
| Hardcoded Credentials | 38 | 0 | 23 | 62.3 |
| Broken Access Control / IDOR | 13 | 0 | 11 | 54.2 |
| Security Misconfiguration | 17 | 0 | 16 | 51.5 |
| Server-Side Request Forgery | 12 | 0 | 12 | 50.0 |
| Other | 75 | 3 | 131 | 36.4 |
| Missing Authentication / Authorization | 16 | 0 | 31 | 34.0 |
| Cross-Site Scripting | 25 | 1 | 57 | 30.5 |
| Sensitive Data Exposure | 10 | 0 | 47 | 17.5 |
| Denial of Service | 3 | 0 | 17 | 15.0 |
Cost
| Metric | Value |
|---|---|
| Total cost | ~$71 est. |
| Cost / run | $0.93 |
| Cost / 100 LOC | $0.355 |
| Python LOC scanned | 20,062 |
| Successful runs | 77 |