realvuln v1.0
Scanner deep-dive

Claude Fable 5 by Anthropic

General-Purpose LLM · claude-code-v1 · scored on 26/26 repositories. Strict scoring (unfinished repos counted as misses).

Methodology note
Different harness. Every other LLM scanner here runs agentically through the OpenCode CLI (version label agentic-v1). Fable 5 could not be benchmarked that way: the OpenCode→Anthropic API path was consistently blocked by provider content filtering on the intentionally-vulnerable source, returning refusals instead of findings.

Instead, each of the 26 repositories was scanned by a dedicated Claude Code subagent (version label claude-code-v1) using the same system prompt as the agentic runner (prompt hash sha256:14ccb06a286c), so findings remain comparable. The same prompt ran cleanly through Claude Code, which indicates the block was specific to the OpenCode delivery path rather than to the prompt or the model.

Caveats. These runs were interactive rather than metered, so token and latency figures were not recorded. The cost shown is an estimate: Fable 5's API price is exactly 2× that of Claude Opus 4.8 ($10/$50 vs $5/$25 per 1M input/output tokens), so we project its cost as 2× Opus 4.8's measured cost on the same benchmark, which assumes similar token usage. One repository (python-app) nests its source under a target/ directory; the agent reported paths without that prefix, so they were normalized to match the ground truth before scoring.
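The path normalization described in the caveats could look roughly like the sketch below. The page says only that python-app paths were missing the target/ prefix and were normalized; the helper name, the mapping, and the example file names are hypothetical and are not taken from the benchmark's code.

```python
from pathlib import PurePosixPath

# Hypothetical mapping: only python-app and its target/ prefix come from the page.
NESTED_PREFIX = {"python-app": "target"}


def normalize_path(repo: str, reported: str) -> str:
    """Return a repo-relative POSIX path with the repo's nesting prefix restored."""
    # Use forward slashes and drop any leading "/" so the path is repo-relative.
    path = PurePosixPath(reported.replace("\\", "/").lstrip("/"))
    prefix = NESTED_PREFIX.get(repo)
    # Prepend the prefix only when the agent left it out.
    if prefix and path.parts[:1] != (prefix,):
        path = PurePosixPath(prefix) / path
    return path.as_posix()


# Illustrative file names, not real findings.
print(normalize_path("python-app", "app/views.py"))
print(normalize_path("python-app", "target/app/views.py"))
print(normalize_path("vampi", "./api_views/users.py"))
```

Making the helper idempotent (a path that already carries the prefix is left alone) keeps it safe to apply to every finding, whichever form the agent reported.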
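| Metric | Value |
|---|---|
| F3 (strict) | 50.5 |
| F2 (strict) | 52.5 |
| Recall (strict) | 48.6% |
| Precision | 76.5% |
| Repos scored | 26/26 |
| Model | (no value shown) |
| Total cost | ~$71 est. |
| Avg latency | (not recorded) |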
§

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Bar chart: one bar per repository, each annotated with its F2 score and recall. Legend: True positive, False positive, Missed (FN).]
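| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| vfapi | 8 | 4 | 1 | 88.9 | 83.3 |
| insecure-web | 7 | 0 | 2 | 77.8 | 81.4 |
| dsvw | 20 | 2 | 7 | 74.1 | 76.9 |
| intentionally-vulnerable-python-application | 5 | 1 | 2 | 71.4 | 73.5 |
| dvblab | 15 | 2 | 7 | 68.2 | 71.4 |
| vulnerable-python-apps | 16 | 9 | 6 | 72.7 | 70.8 |
| vulnerable-tornado-app | 9 | 1 | 5 | 64.3 | 68.2 |
| python-app | 13 | 5 | 7 | 65.0 | 66.3 |
| vulnerable-api | 9 | 3 | 5 | 64.3 | 66.2 |
| vulnpy | 47 | 0 | 31 | 60.3 | 65.5 |
| vampi | 9 | 8 | 6 | 60.0 | 58.4 |
| dsvpwa | 17 | 2 | 15 | 53.1 | 57.8 |
| damn-vulnerable-flask-application | 8 | 4 | 7 | 53.3 | 55.6 |
| python-insecure-app | 4 | 1 | 4 | 50.0 | 54.1 |
| lets-be-bad-guys | 12 | 4 | 12 | 50.0 | 53.6 |
| owasp-web-playground | 14 | 10 | 14 | 50.0 | 51.5 |
| pythonssti | 1 | 1 | 1 | 50.0 | 50.0 |
| vulnerable-flask-app | 10 | 7 | 11 | 47.6 | 49.5 |
| pygoat | 34 | 10 | 43 | 44.2 | 48.3 |
| threatbyte | 11 | 5 | 15 | 42.3 | 45.8 |
| damn-vulnerable-graphql-application | 13 | 5 | 23 | 36.1 | 40.1 |
| extremely-vulnerable-flask-app | 11 | 3 | 21 | 34.4 | 38.7 |
| dvpwa | 7 | 2 | 15 | 31.8 | 36.1 |
| flask-xss | 9 | 5 | 21 | 30.0 | 33.6 |
| vulpy | 16 | 4 | 41 | 28.1 | 32.3 |
| djangoat | 14 | 6 | 36 | 28.0 | 31.8 |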
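The F2 column, and the headline F2 and F3 scores, can be reproduced from the TP/FP/FN counts with the standard F-beta formula. The sketch below assumes the headline figures are computed on counts pooled across all 26 repositories, which the page does not state. The pooled totals (339 TP, 104 FP, 358 FN) are the column sums of the per-repository table, and the function name is illustrative.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float) -> float:
    """F-beta from raw counts, as a percentage; beta > 1 weights recall over precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return 100.0 * (1 + b2) * tp / denom if denom else 0.0


# Three rows copied from the per-repository table: (TP, FP, FN).
rows = {"vfapi": (8, 4, 1), "vampi": (9, 8, 6), "djangoat": (14, 6, 36)}
for name, (tp, fp, fn) in rows.items():
    print(f"{name}: F2 = {f_beta(tp, fp, fn, 2):.1f}")

# Column sums over all 26 rows (assumed pooling, see the note above).
TP, FP, FN = 339, 104, 358
print(f"pooled F2 = {f_beta(TP, FP, FN, 2):.1f}, F3 = {f_beta(TP, FP, FN, 3):.1f}")
print(f"pooled precision = {100 * TP / (TP + FP):.1f}%, recall = {100 * TP / (TP + FN):.1f}%")
```

Under that pooling assumption the results agree with the page: 83.3, 58.4, and 31.8 for the three rows, and 52.5 (F2), 50.5 (F3), 76.5% precision, and 48.6% recall overall.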
§

Detection by severity

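| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 74 | 3 | 12 | 86.0 |
| High | 142 | 5 | 122 | 53.8 |
| Medium | 105 | 1 | 174 | 37.6 |
| Low | 18 | 0 | 50 | 26.5 |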
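Recall in each severity band is TP / (TP + FN), and pooling the four bands gives back the headline strict recall. The short sketch below checks this using the counts from the severity table; the helper name is illustrative.

```python
# (TP, FP, FN) per severity band, copied from the severity table.
severity = {
    "Critical": (74, 3, 12),
    "High": (142, 5, 122),
    "Medium": (105, 1, 174),
    "Low": (18, 0, 50),
}


def recall_pct(tp: int, fn: int) -> float:
    """Share of labeled vulnerabilities that were found, as a percentage."""
    labeled = tp + fn
    return 100.0 * tp / labeled if labeled else 0.0


for band, (tp, _fp, fn) in severity.items():
    print(f"{band}: {recall_pct(tp, fn):.1f}% of {tp + fn} labeled")

total_tp = sum(tp for tp, _fp, _fn in severity.values())
total_fn = sum(fn for _tp, _fp, fn in severity.values())
print(f"overall: {recall_pct(total_tp, total_fn):.1f}% of {total_tp + total_fn} labeled")
```

The bands sum to 339 true positives out of 697 labeled vulnerabilities, or 48.6%, matching the strict recall reported at the top of the page.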
§

Detection by vulnerability class

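| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Code Injection / RFI | 14 | 0 | 0 | 100.0 |
| XML External Entities | 8 | 1 | 0 | 100.0 |
| Insecure Deserialization | 19 | 0 | 0 | 100.0 |
| HTTP Header Injection | 2 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| SQL Injection | 43 | 2 | 4 | 91.5 |
| Path Traversal | 22 | 1 | 4 | 84.6 |
| Command / OS Injection | 14 | 1 | 3 | 82.4 |
| Open Redirect | 4 | 0 | 2 | 66.7 |
| Hardcoded Credentials | 38 | 0 | 23 | 62.3 |
| Broken Access Control / IDOR | 13 | 0 | 11 | 54.2 |
| Security Misconfiguration | 17 | 0 | 16 | 51.5 |
| Server-Side Request Forgery | 12 | 0 | 12 | 50.0 |
| Other | 75 | 3 | 131 | 36.4 |
| Missing Authentication / Authorization | 16 | 0 | 31 | 34.0 |
| Cross-Site Scripting | 25 | 1 | 57 | 30.5 |
| Sensitive Data Exposure | 10 | 0 | 47 | 17.5 |
| Denial of Service | 3 | 0 | 17 | 15.0 |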
§

Cost

| Metric | Value |
|---|---|
| Total cost | ~$71 est. |
| Cost / run | $0.93 |
| Cost / 100 LOC | $0.355 |
| Python LOC scanned | 20,062 |
| Successful runs | 77 |
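The derived cost figures follow from simple division, sketched below on the assumption that cost per run is total cost over successful runs and cost per 100 LOC is total cost over Python LOC, scaled by 100. The page shows the total only as a rounded "~$71", so the results land within about a cent of the displayed values rather than matching them exactly. The implied Opus 4.8 cost is back-calculated from the 2× projection and is not a figure from the page.

```python
# Figures copied from the cost section; the total is displayed rounded.
TOTAL_COST_USD = 71.0
SUCCESSFUL_RUNS = 77
PYTHON_LOC = 20_062

cost_per_run = TOTAL_COST_USD / SUCCESSFUL_RUNS
cost_per_100_loc = TOTAL_COST_USD / PYTHON_LOC * 100

# Prices per 1M (input, output) tokens, from the methodology note.
FABLE_PRICE = (10.0, 50.0)
OPUS_PRICE = (5.0, 25.0)
ratio_in = FABLE_PRICE[0] / OPUS_PRICE[0]
ratio_out = FABLE_PRICE[1] / OPUS_PRICE[1]

# Both ratios are equal, so the multiplier holds for any input/output token mix,
# provided token usage is similar between the two models.
assert ratio_in == ratio_out == 2.0

# Back-calculated from the projection, not a measured value reported on the page.
implied_opus_cost = TOTAL_COST_USD / ratio_in

print(f"cost / run      ~ ${cost_per_run:.2f}")
print(f"cost / 100 LOC  ~ ${cost_per_100_loc:.3f}")
print(f"implied Opus 4.8 cost ~ ${implied_opus_cost:.1f}")
```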
