realvuln v1.0
Scanner deep-dive

Claude Opus 4.6 by Anthropic

General-Purpose LLM · agentic-v1 · scored on 19/26 repositories. Strict scoring: the 7 unfinished repositories are counted as misses.
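One way to read strict scoring is that every labeled vulnerability in an unfinished repository is added to the miss count before recall is computed. The sketch below follows that reading; the function name and all counts are hypothetical and are not taken from the dataset.

```python
def recall(scored, unfinished_labeled=()):
    """Pooled recall over (tp, fn) pairs.

    scored: (true positives, false negatives) for each finished repo.
    unfinished_labeled: labeled-vulnerability counts of unfinished repos;
        under the strict reading, each one is counted as a miss.
    """
    tp = sum(t for t, _ in scored)
    fn = sum(f for _, f in scored) + sum(unfinished_labeled)
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical counts, for illustration only.
finished = [(8, 1), (6, 2)]
print(round(recall(finished), 3))       # finished repos only
print(round(recall(finished, [5]), 3))  # strict: 5 extra misses
```

Under this reading, strict recall can never exceed recall over the finished repositories alone.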

F3 (strict): 47.2
F2 (strict): 49.4
Recall (strict): 45.1%
Precision: 79.9%
Repos scored: 19/26
Model: claude-opus-4-6
Total cost: $22
Avg latency: 763s
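The strict F2 and F3 figures above are consistent with the standard F-beta formula applied to the reported precision (79.9%) and strict recall (45.1%). The sketch below assumes the dashboard uses that standard definition, which the page does not state; `f_beta` is an illustrative name.

```python
def f_beta(precision, recall, beta):
    """Standard F-beta: weights recall beta times as heavily as precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Headline precision and strict recall, as displayed on the page.
p, r = 0.799, 0.451
print(round(100 * f_beta(p, r, 2), 1))  # 49.4
print(round(100 * f_beta(p, r, 3), 1))  # 47.2
```

Because beta is greater than 1 in both cases, the low strict recall pulls both scores well below the precision.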
§

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Bar chart: one bar per repository. Legend: true positive, false positive, missed (FN). Each bar is labeled with the repository's F2 and recall.]

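| Repository | TP | FP | FN | Recall % | F2 |
| --- | --- | --- | --- | --- | --- |
| vfapi | 8 | 3 | 1 | 88.9 | 85.1 |
| python-app | 17 | 3 | 3 | 83.3 | 83.6 |
| python-insecure-app | 6 | 0 | 2 | 75.0 | 78.9 |
| lets-be-bad-guys | 18 | 0 | 6 | 75.0 | 78.7 |
| damn-vulnerable-flask-application | 12 | 4 | 3 | 77.8 | 77.0 |
| insecure-web | 7 | 4 | 2 | 77.8 | 74.5 |
| intentionally-vulnerable-python-application | 5 | 1 | 2 | 71.4 | 73.7 |
| vulnpy | 56 | 10 | 22 | 71.4 | 73.6 |
| vulnerable-api | 10 | 2 | 4 | 71.4 | 73.2 |
| vampi | 11 | 4 | 4 | 71.1 | 71.7 |
| vulnerable-flask-app | 14 | 5 | 7 | 65.1 | 66.7 |
| vulnerable-tornado-app | 9 | 4 | 5 | 64.3 | 65.7 |
| threatbyte | 16 | 6 | 10 | 60.3 | 62.5 |
| extremely-vulnerable-flask-app | 16 | 3 | 16 | 50.0 | 54.4 |
| vulpy | 26 | 2 | 31 | 45.6 | 50.7 |
| pygoat | 34 | 12 | 42 | 44.8 | 48.6 |
| djangoat | 21 | 4 | 29 | 42.0 | 46.7 |
| damn-vulnerable-graphql-application | 16 | 10 | 20 | 43.0 | 45.6 |
| flask-xss | 12 | 2 | 18 | 40.0 | 44.8 |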
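Recall and F2 for a single repository can be recomputed directly from its counts. The sketch below assumes the standard definitions and reads the vfapi row as TP 8, FP 3, FN 1 and the insecure-web row as TP 7, FP 4, FN 2; `recall_and_f2` is an illustrative name.

```python
def recall_and_f2(tp, fp, fn):
    """Return (recall %, F2 %) from raw counts, rounded to one decimal.

    F2 in count form: 5*TP / (5*TP + 4*FN + FP).
    """
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = 5 * tp + 4 * fn + fp
    f2 = 5 * tp / denom if denom else 0.0
    return round(100 * recall, 1), round(100 * f2, 1)


print(recall_and_f2(8, 3, 1))  # vfapi
print(recall_and_f2(7, 4, 2))  # insecure-web
```

Both rows reproduce the listed values (88.9 / 85.1 and 77.8 / 74.5). Not every row reproduces exactly this way, which may reflect averaging across runs; that is a guess, not something the page states.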
§

Detection by severity

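| Severity | TP | FP | FN | Recall % |
| --- | --- | --- | --- | --- |
| Critical | 57 | 0 | 8 | 87.7 |
| High | 131 | 1 | 81 | 61.8 |
| Medium | 111 | 0 | 106 | 51.2 |
| Low | 20 | 0 | 28 | 41.7 |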
§

Detection by vulnerability class

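| CWE family | TP | FP | FN | Recall % |
| --- | --- | --- | --- | --- |
| Code Injection / RFI | 13 | 0 | 0 | 100.0 |
| SQL Injection | 29 | 0 | 0 | 100.0 |
| Insecure Deserialization | 13 | 0 | 0 | 100.0 |
| Open Redirect | 3 | 0 | 0 | 100.0 |
| HTTP Header Injection | 1 | 0 | 0 | 100.0 |
| XPath Injection | 3 | 0 | 0 | 100.0 |
| Server-Side Request Forgery | 19 | 0 | 1 | 95.0 |
| Command / OS Injection | 13 | 0 | 1 | 92.9 |
| Hardcoded Credentials | 40 | 0 | 6 | 87.0 |
| Path Traversal | 18 | 1 | 4 | 81.8 |
| Broken Access Control / IDOR | 16 | 0 | 4 | 80.0 |
| XML External Entities | 5 | 0 | 2 | 71.4 |
| Cross-Site Scripting | 42 | 0 | 24 | 63.6 |
| Missing Authentication / Authorization | 18 | 0 | 20 | 47.4 |
| Security Misconfiguration | 9 | 0 | 14 | 39.1 |
| Sensitive Data Exposure | 17 | 0 | 28 | 37.8 |
| Other | 58 | 0 | 102 | 36.2 |
| Denial of Service | 2 | 0 | 17 | 10.5 |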
§

LLM operational metrics

Avg input tokens: 7
Avg output tokens: 4,608
Avg total tokens: 176,970
Avg latency / repo: 763s
JSON repair rate: 0.0%
Total runs: 72
F2 run-to-run σ: ±13.6
§

Cost

Total cost: $22
Cost / run: $0.49
Cost / 100 LOC: $0.123
Python LOC scanned: 18,251
Successful runs: 46
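The per-run and per-100-LOC costs can be roughly sanity-checked from the other figures in this section. The sketch below assumes cost per run is divided over the 46 successful runs (not all 72 runs) and that the displayed $22 total is rounded; neither is stated on the page.

```python
# Figures as displayed; the total is likely rounded to the nearest dollar.
total_cost = 22.0       # USD
successful_runs = 46
python_loc = 18_251

cost_per_run = total_cost / successful_runs
cost_per_100_loc = total_cost / python_loc * 100

print(f"cost / run:     ${cost_per_run:.2f}")
print(f"cost / 100 LOC: ${cost_per_100_loc:.3f}")
```

Both results land slightly below the displayed $0.49 and $0.123, which is what one would expect if the unrounded total is somewhat above $22.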
