Scanner deep-dive

Claude Fable 5 by Anthropic

General-Purpose LLM · claude-code-v1 · scored on 26/26 repositories. Strict scoring (unfinished repos counted as misses).
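A minimal sketch of how we read the strict rule: a repository whose scan never finishes still contributes its labeled vulnerabilities to the denominator, all counted as misses. The RepoResult record, its field names, and the unfinished repository are hypothetical. The vfapi counts come from the per-repository table (8 TP, 4 FP, 1 FN, so 9 labeled).

```python
from dataclasses import dataclass


@dataclass
class RepoResult:
    """Hypothetical per-repository record; field names are illustrative only."""
    name: str
    labeled: int     # ground-truth vulnerabilities labeled in the repo
    finished: bool   # whether the scan completed
    tp: int = 0      # findings matched to a label
    fp: int = 0      # findings with no matching label


def strict_counts(results: list[RepoResult]) -> tuple[int, int, int]:
    """Pool TP/FP/FN across repos, counting an unfinished repo as all misses."""
    tp = fp = fn = 0
    for r in results:
        if r.finished:
            tp += r.tp
            fp += r.fp
            fn += r.labeled - r.tp
        else:
            # Strict mode: no credit for a repo that did not finish,
            # so every labeled vulnerability becomes a false negative.
            fn += r.labeled
    return tp, fp, fn


example = [
    RepoResult("vfapi", labeled=9, finished=True, tp=8, fp=4),
    RepoResult("hypothetical-unfinished-repo", labeled=5, finished=False),
]
print(strict_counts(example))  # (8, 4, 6)
```

Because all 26 repositories were scored in this run, the unfinished branch would not apply here.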

Methodology note

Different harness. Every other LLM scanner here runs agentically through the OpenCode CLI (version label agentic-v1). Fable 5 could not be benchmarked that way: the OpenCode→Anthropic API path was consistently blocked by provider content filtering on the intentionally vulnerable source code, which returned refusals instead of findings.

Instead, each of the 26 repositories was scanned by a dedicated Claude Code subagent (version label claude-code-v1) using the identical system prompt as the agentic runner (prompt hash sha256:14ccb06a286c), so findings remain comparable. The same prompt ran cleanly through Claude Code, which indicates that the block was specific to the OpenCode delivery path rather than to the prompt or the model.
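The prompt hash is what ties the two harnesses together: if both runners report the same label, they used byte-identical prompt text, up to the small chance of a collision in a 12-hex-digit (48-bit) prefix. The sketch below assumes the label is the first 12 hex digits of a SHA-256 digest over the UTF-8 encoded prompt; the benchmark's exact convention is not stated here, and prompt_hash is a hypothetical helper.

```python
import hashlib


def prompt_hash(prompt: str, length: int = 12) -> str:
    """Short, prefixed SHA-256 label for a system prompt (assumed convention)."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    # Keep only the first `length` hex digits, as in "sha256:14ccb06a286c".
    return f"sha256:{digest[:length]}"


print(prompt_hash("abc"))  # sha256:ba7816bf8f01
```

Any edit to the prompt, even whitespace, changes the digest, so a matching label is a cheap check that two harnesses did not drift apart.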

Caveats. These runs were interactive rather than metered, so token and latency figures were not recorded. The cost shown is an estimate: Fable 5's API price is exactly 2× that of Claude Opus 4.8 ($10/$50 vs $5/$25 per 1M input/output tokens), so we project its cost as 2× Opus 4.8's measured cost on the same benchmark. One repository (python-app) nests its source under a target/ directory; the agent reported paths without that prefix, so they were normalized to match the ground truth before scoring.
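A sketch of the kind of path normalization described for python-app, assuming the ground truth stores paths with the target/ prefix and the agent reports repo-relative paths. normalize_path is a hypothetical helper, not the benchmark's code, and the file names in the example are illustrative.

```python
from pathlib import PurePosixPath


def normalize_path(reported: str, prefix: str = "target") -> str:
    """Prepend `prefix` to a reported path unless it is already there."""
    # PurePosixPath collapses "./" and duplicate slashes but keeps "..".
    parts = PurePosixPath(reported).parts
    if parts and parts[0] == "/":
        parts = parts[1:]  # treat "/app.py" as repo-relative
    if not parts or parts[0] != prefix:
        parts = (prefix, *parts)
    return str(PurePosixPath(*parts))


print(normalize_path("app/views.py"))         # target/app/views.py
print(normalize_path("target/app/views.py"))  # target/app/views.py
```

Making the function idempotent means already-prefixed findings pass through unchanged, so it can be applied to every reported path without special-casing.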

F3 (strict): 50.5
F2 (strict): 52.5
Recall (strict): 48.6%
Precision: 76.5%
Repos scored: 26/26
Model: n/a
Total cost: ~$71 est.
Avg latency: not recorded
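The headline figures can be reproduced by pooling counts across all repositories (micro-averaging). Summing the per-repository table gives TP = 339, FP = 104, FN = 358. The page does not state its averaging method, so treat this as a consistency check rather than the scorer's code. In count form, F-beta weights each false negative by β² (4 for F2, 9 for F3) and each false positive by 1, which is why these scores track recall more closely than precision.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float) -> float:
    """F-beta from raw counts: (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Totals obtained by summing the per-repository table (micro-averaging).
TP, FP, FN = 339, 104, 358

recall = TP / (TP + FN)
precision = TP / (TP + FP)
print(f"F3 {100 * f_beta(TP, FP, FN, 3):.1f}")  # F3 50.5
print(f"F2 {100 * f_beta(TP, FP, FN, 2):.1f}")  # F2 52.5
print(f"Recall {100 * recall:.1f}%")            # Recall 48.6%
print(f"Precision {100 * precision:.1f}%")      # Precision 76.5%
```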

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Figure: per-repository stacked bars; legend: true positive, false positive, missed (FN). The same counts appear in the table below.]

Repository	TP	FP	FN	Recall %	F2
vfapi	8	4	1	88.9	83.3
insecure-web	7	0	2	77.8	81.4
dsvw	20	2	7	74.1	76.9
intentionally-vulnerable-python-application	5	1	2	71.4	73.5
dvblab	15	2	7	68.2	71.4
vulnerable-python-apps	16	9	6	72.7	70.8
vulnerable-tornado-app	9	1	5	64.3	68.2
python-app	13	5	7	65.0	66.3
vulnerable-api	9	3	5	64.3	66.2
vulnpy	47	0	31	60.3	65.5
vampi	9	8	6	60.0	58.4
dsvpwa	17	2	15	53.1	57.8
damn-vulnerable-flask-application	8	4	7	53.3	55.6
python-insecure-app	4	1	4	50.0	54.1
lets-be-bad-guys	12	4	12	50.0	53.6
owasp-web-playground	14	10	14	50.0	51.5
pythonssti	1	1	1	50.0	50.0
vulnerable-flask-app	10	7	11	47.6	49.5
pygoat	34	10	43	44.2	48.3
threatbyte	11	5	15	42.3	45.8
damn-vulnerable-graphql-application	13	5	23	36.1	40.1
extremely-vulnerable-flask-app	11	3	21	34.4	38.7
dvpwa	7	2	15	31.8	36.1
flask-xss	9	5	21	30.0	33.6
vulpy	16	4	41	28.1	32.3
djangoat	14	6	36	28.0	31.8
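The per-repository columns follow from the raw counts: recall is TP / (TP + FN) and F2 = 5 × TP / (5 × TP + 4 × FN + FP). The sketch below recomputes a handful of rows from the table and sorts them by F2. It also shows that recall alone does not set the order: vulnerable-python-apps has higher recall than dvblab (72.7% vs 68.2%) yet ranks lower, because its 9 false positives pull F2 down to 70.8 versus 71.4. The five-row subset and the rank_by_f2 helper are illustrative.

```python
# Name, TP, FP, FN for a subset of rows from the per-repository table.
ROWS = """\
vfapi\t8\t4\t1
dvblab\t15\t2\t7
vulnerable-python-apps\t16\t9\t6
vampi\t9\t8\t6
djangoat\t14\t6\t36"""


def f2(tp: int, fp: int, fn: int) -> float:
    """F2 from counts: misses weigh 4x as much as false positives."""
    denom = 5 * tp + 4 * fn + fp
    return 5 * tp / denom if denom else 0.0


def rank_by_f2(rows: str) -> list[tuple[str, float, float]]:
    """Return (name, recall %, F2) tuples, highest F2 first."""
    scored = []
    for line in rows.splitlines():
        name, tp, fp, fn = line.split("\t")
        tp, fp, fn = int(tp), int(fp), int(fn)
        rec = tp / (tp + fn) if tp + fn else 0.0
        scored.append((name, round(100 * rec, 1), round(100 * f2(tp, fp, fn), 1)))
    return sorted(scored, key=lambda row: row[2], reverse=True)


for name, rec, score in rank_by_f2(ROWS):
    print(f"{name:<24} recall {rec:5.1f}%  F2 {score:5.1f}")
```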

Detection by severity

Severity	TP	FP	FN	Recall %
Critical	74	3	12	86.0
High	142	5	122	53.8
Medium	105	1	174	37.6
Low	18	0	50	26.5
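Recall by severity applies the same TP / (TP + FN) formula within each tier. Summing the TP and FN columns gives 339 and 358, the same totals as the per-repository table, which is consistent with each labeled vulnerability carrying exactly one severity. The sketch also shows why the overall 48.6% recall is not the plain average of the four severity recalls (about 51.0%): pooling weights each tier by its number of labeled vulnerabilities, so the large Medium tier (279 labels at 37.6%) pulls the pooled figure down, while the Critical tier (86 labels at 86.0%) accounts for less than a quarter of the labels.

```python
# (TP, FN) per severity, copied from the severity table.
SEVERITY = {
    "Critical": (74, 12),
    "High": (142, 122),
    "Medium": (105, 174),
    "Low": (18, 50),
}


def recall(tp: int, fn: int) -> float:
    """Share of labeled vulnerabilities that were found."""
    return tp / (tp + fn) if tp + fn else 0.0


per_level = {level: recall(tp, fn) for level, (tp, fn) in SEVERITY.items()}
for level, r in per_level.items():
    print(f"{level:<8} recall {100 * r:.1f}%")

total_tp = sum(tp for tp, _ in SEVERITY.values())
total_fn = sum(fn for _, fn in SEVERITY.values())
pooled = recall(total_tp, total_fn)               # micro average
macro = sum(per_level.values()) / len(per_level)  # unweighted mean of tiers
print(total_tp, total_fn)                         # 339 358
print(f"pooled {100 * pooled:.1f}%, macro {100 * macro:.1f}%")  # pooled 48.6%, macro 51.0%
```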

Detection by vulnerability class

CWE family	TP	FP	FN	Recall %
Code Injection / RFI	14	0	0	100.0
XML External Entities	8	1	0	100.0
Insecure Deserialization	19	0	0	100.0
HTTP Header Injection	2	0	0	100.0
XPath Injection	4	0	0	100.0
SQL Injection	43	2	4	91.5
Path Traversal	22	1	4	84.6
Command / OS Injection	14	1	3	82.4
Open Redirect	4	0	2	66.7
Hardcoded Credentials	38	0	23	62.3
Broken Access Control / IDOR	13	0	11	54.2
Security Misconfiguration	17	0	16	51.5
Server-Side Request Forgery	12	0	12	50.0
Other	75	3	131	36.4
Missing Authentication / Authorization	16	0	31	34.0
Cross-Site Scripting	25	1	57	30.5
Sensitive Data Exposure	10	0	47	17.5
Denial of Service	3	0	17	15.0

Cost

Total cost: ~$71 est.
Cost / run: $0.93
Cost / 100 LOC: $0.355
Python LOC scanned: 20,062
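Two pieces of arithmetic sit behind these cost figures. Because both of Fable 5's per-token rates are exactly twice Opus 4.8's, the projected cost is 2× Opus 4.8's measured cost whatever the input/output split, provided Fable 5 would consume the same number of tokens; that is an assumption the estimate rests on. Separately, the per-100-LOC figure times the scanned LOC divided by 100 recovers the total: 0.355 × 200.62 ≈ $71.22, consistent with the ~$71 estimate. The token mixes in the loop are hypothetical.

```python
# Per-million-token prices in USD as (input, output), from the methodology note.
FABLE_PRICE = (10.0, 50.0)
OPUS_PRICE = (5.0, 25.0)


def api_cost(input_tokens: int, output_tokens: int, price: tuple[float, float]) -> float:
    """Dollar cost of one run at per-million-token prices."""
    return (input_tokens * price[0] + output_tokens * price[1]) / 1_000_000


# Both rates are exactly 2x, so the ratio is 2 for any token mix, assuming
# Fable 5 would use the same token counts as Opus 4.8.
for tokens in [(1_000_000, 0), (0, 1_000_000), (350_000, 12_000)]:  # hypothetical mixes
    ratio = api_cost(*tokens, FABLE_PRICE) / api_cost(*tokens, OPUS_PRICE)
    assert abs(ratio - 2.0) < 1e-12

# Cost per 100 LOC times (LOC / 100) recovers the total.
total_usd = 0.355 * 20_062 / 100
print(f"${total_usd:.2f}")  # $71.22
```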

Successful runs
