Scanner deep-dive

Claude Opus 4.7 by Anthropic

General-Purpose LLM · agentic-v1 · scored on 25/26 repositories. Strict scoring (unfinished repos counted as misses).

Metric	Value
F3 (strict)	47.5
F2 (strict)	49.4
Recall (strict)	45.8%
Precision	71.5%
Repos scored	25/26
Model	claude-opus-4-7
Total cost	$32
Avg latency	76s
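The page does not say how the F2 and F3 scores are derived. As a minimal sketch, assuming they follow the standard F-beta definition applied to the headline precision and strict recall, the reported values can be reproduced to within rounding. The function name `f_beta` is illustrative and not part of the benchmark.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; recall is weighted beta times as heavily as precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline figures from the summary above (already rounded on the page).
precision = 0.715  # Precision
recall = 0.458     # Recall (strict)

# Assumption: the page's F2/F3 are standard F-beta scores on a 0-100 scale.
f2 = 100 * f_beta(precision, recall, beta=2)
f3 = 100 * f_beta(precision, recall, beta=3)

print(f"F2 = {f2:.2f}, F3 = {f3:.2f}")
```

Because the inputs are rounded to one decimal place, the recomputed scores can differ from the published 49.4 and 47.5 by about 0.1. Both F2 and F3 favor recall over precision, which is why they sit closer to the 45.8% recall than to the 71.5% precision.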

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

Legend: True positive · False positive · Missed (FN)

[Bar chart: one bar per repository. The F2 and recall values shown on the bars are listed in the table below.]

Repository	TP	FP	FN	Recall %	F2
pythonssti	2	0	0	100.0	100.0
vfapi	8	4	1	88.9	83.9
intentionally-vulnerable-python-application	6	1	1	81.0	81.6
vulnerable-tornado-app	10	2	4	73.8	75.7
dsvw	19	3	8	70.4	73.3
insecure-web	7	3	2	74.1	73.0
python-app	14	6	6	72.5	72.5
vulnerable-api	10	2	4	69.0	71.9
vulnerable-python-apps	15	5	7	69.7	70.2
vampi	10	2	5	66.7	69.1
dvblab	14	3	8	62.1	65.0
dsvpwa	20	8	12	62.5	63.9
damn-vulnerable-flask-application	9	5	6	62.2	62.5
python-insecure-app	4	1	4	54.2	58.5
owasp-web-playground	16	10	12	55.4	56.2
threatbyte	13	3	13	51.3	55.2
pygoat	40	23	36	52.6	54.3
vulnerable-flask-app	11	5	10	50.8	53.5
lets-be-bad-guys	12	4	12	48.6	52.1
extremely-vulnerable-flask-app	14	4	18	43.8	47.9
djangoat	20	13	30	39.3	42.3
damn-vulnerable-graphql-application	13	6	23	36.1	39.8
dvpwa	8	2	14	34.8	39.4
vulpy	17	9	40	29.8	33.5
flask-xss	7	3	23	23.3	25.4

Detection by severity

Severity	TP	FP	FN	Recall %
Critical	69	0	7	90.8
High	134	1	103	56.5
Medium	109	0	139	44.0
Low	17	0	41	29.3
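The recall column in the severity table is consistent with the usual definition, recall = TP / (TP + FN). The sketch below recomputes it from the counts above and also pools the counts across severities; the variable names are illustrative.

```python
# (TP, FN) per severity, copied from the table above.
severity_counts = {
    "Critical": (69, 7),
    "High": (134, 103),
    "Medium": (109, 139),
    "Low": (17, 41),
}


def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage; 0.0 when there are no labeled findings."""
    total = tp + fn
    return 100.0 * tp / total if total else 0.0


recalls = {sev: recall_pct(tp, fn) for sev, (tp, fn) in severity_counts.items()}

# Pooled recall over all severities (micro-average of the rows above).
total_tp = sum(tp for tp, _ in severity_counts.values())
total_fn = sum(fn for _, fn in severity_counts.values())
pooled = recall_pct(total_tp, total_fn)

for sev, value in recalls.items():
    print(f"{sev:<8} {value:5.1f}%")
print(f"Pooled   {pooled:5.1f}%  ({total_tp} TP, {total_fn} FN)")
```

The pooled figure need not match the strict headline recall, which also counts the unfinished repository as misses.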

Detection by vulnerability class

CWE family	TP	FP	FN	Recall %
Code Injection / RFI	11	0	0	100.0
HTTP Header Injection	2	0	0	100.0
XPath Injection	1	0	0	100.0
SQL Injection	43	0	1	97.7
Insecure Deserialization	14	0	1	93.3
Command / OS Injection	13	0	1	92.9
Path Traversal	15	0	3	83.3
Open Redirect	5	0	1	83.3
Server-Side Request Forgery	9	0	2	81.8
XML External Entities	4	1	1	80.0
Broken Access Control / IDOR	18	0	6	75.0
Hardcoded Credentials	43	0	18	70.5
Security Misconfiguration	18	0	13	58.1
Missing Authentication / Authorization	21	0	26	44.7
Cross-Site Scripting	26	0	44	37.1
Other	69	0	129	34.8
Sensitive Data Exposure	16	0	41	28.1
Denial of Service	1	0	3	25.0

LLM operational metrics

Metric	Value
Avg input tokens	(not shown)
Avg output tokens	5,440
Avg total tokens	287,918
Avg latency / repo	76s
JSON repair rate	0.0%
Total runs	(not shown)
F2 run-to-run σ	±17.2

Cost

Metric	Value
Total cost	$32
Cost / run	$0.49
Cost / 100 LOC	$0.184
Python LOC scanned	17,572
Successful runs	(not shown)
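As a rough check, and assuming "Cost / 100 LOC" means total cost divided by Python lines scanned, scaled to 100 lines (the page does not define it), the figure can be approximated from the other two values:

```python
# Values as displayed on the page; total cost is rounded to whole dollars.
total_cost_usd = 32.0
python_loc_scanned = 17_572

# Assumption: cost per 100 LOC = total cost / LOC scanned * 100.
cost_per_100_loc = total_cost_usd / python_loc_scanned * 100

print(f"Cost / 100 LOC is roughly ${cost_per_100_loc:.3f}")
```

The result lands slightly below the published $0.184, which is what one would expect if the unrounded total cost is a little above $32.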
