realvuln v1.0
Scanner deep-dive

Claude Sonnet 4.6 by Anthropic

General-Purpose LLM · agentic-v1 · scored on 23 of 26 repositories. Strict scoring: the 3 unfinished repositories are counted as misses.

F3 (strict): 50.9
F2 (strict): 53.0
Recall (strict): 48.9%
Precision: 79.7%
Repos scored: 23/26
Model: claude-sonnet-4-6
Total cost: $17
Avg latency: 367s
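
The F2 and F3 headline scores can be cross-checked against the precision and strict recall shown above. The sketch below assumes the standard F-beta definition; the function name `f_beta` is illustrative, and the page does not state exactly how the site aggregates its scores.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score: recall is weighted beta times as heavily as precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline precision and strict recall from this page, as fractions.
precision, recall = 0.797, 0.489

f2 = f_beta(precision, recall, beta=2)
f3 = f_beta(precision, recall, beta=3)
print(f"F2 = {100 * f2:.1f}, F3 = {100 * f3:.1f}")
```

Under that assumption the rounded results agree with the 53.0 and 50.9 reported above. F3 comes out lower than F2 here because recall is the weaker of the two inputs and F3 weights it more heavily.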

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Bar chart: true positives, false positives, and misses (FN) per repository, ranked by F2.]
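| Repository | TP | FP | FN | Recall % | F2 |
| --- | --- | --- | --- | --- | --- |
| vampi | 12 | 3 | 3 | 80.0 | 80.0 |
| vfapi | 8 | 4 | 1 | 85.2 | 79.8 |
| dsvw | 20 | 1 | 7 | 74.1 | 77.7 |
| python-app | 14 | 3 | 6 | 71.7 | 73.9 |
| vulnpy | 53 | 5 | 25 | 67.5 | 71.3 |
| insecure-web | 6 | 2 | 3 | 70.4 | 70.9 |
| damn-vulnerable-flask-application | 10 | 3 | 5 | 68.9 | 70.3 |
| dvblab | 15 | 4 | 7 | 68.2 | 70.1 |
| vulnerable-api | 10 | 4 | 4 | 69.0 | 69.7 |
| lets-be-bad-guys | 15 | 1 | 9 | 62.5 | 67.0 |
| vulnerable-flask-app | 12 | 3 | 9 | 58.7 | 61.8 |
| threatbyte | 15 | 2 | 11 | 56.4 | 60.9 |
| intentionally-vulnerable-python-application | 4 | 1 | 3 | 57.1 | 60.6 |
| dsvpwa | 18 | 5 | 14 | 55.2 | 58.5 |
| vulnerable-tornado-app | 8 | 5 | 6 | 57.1 | 58.0 |
| pythonssti | 1 | 0 | 1 | 50.0 | 55.6 |
| extremely-vulnerable-flask-app | 16 | 0 | 16 | 50.0 | 55.4 |
| dvpwa | 11 | 4 | 11 | 50.0 | 53.4 |
| pygoat | 33 | 11 | 44 | 42.4 | 46.5 |
| damn-vulnerable-graphql-application | 14 | 6 | 22 | 38.9 | 42.6 |
| flask-xss | 11 | 2 | 19 | 36.7 | 41.4 |
| djangoat | 17 | 9 | 33 | 34.0 | 37.7 |
| vulpy | 18 | 9 | 39 | 32.2 | 35.9 |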
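
The Recall % and F2 columns can be recomputed from a row's TP, FP, and FN counts, and the same F2 value gives the ranking. This is a minimal sketch assuming the standard count form of F2; the `Row` type and helper names are hypothetical and not taken from the benchmark's code.

```python
from typing import NamedTuple


class Row(NamedTuple):
    repo: str
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # labeled vulnerabilities that were missed


def recall(row: Row) -> float:
    labeled = row.tp + row.fn
    return row.tp / labeled if labeled else 0.0


def f2(row: Row) -> float:
    # F-beta in count form with beta = 2: 5*TP / (5*TP + 4*FN + FP).
    denom = 5 * row.tp + 4 * row.fn + row.fp
    return 5 * row.tp / denom if denom else 0.0


# Four rows copied from the per-repository table, deliberately out of order.
rows = [
    Row("pythonssti", 1, 0, 1),
    Row("vampi", 12, 3, 3),
    Row("lets-be-bad-guys", 15, 1, 9),
    Row("dvblab", 15, 4, 7),
]

ranked = sorted(rows, key=f2, reverse=True)
for row in ranked:
    print(f"{row.repo:<18} recall {100 * recall(row):5.1f}  F2 {100 * f2(row):5.1f}")
```

These four rows reproduce the table's values after rounding. Several other rows differ by a point or two from a direct recomputation, which would be consistent with the page averaging over repeated runs; that is an inference, not something the page states.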

Detection by severity

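| Severity | TP | FP | FN | Recall % |
| --- | --- | --- | --- | --- |
| Critical | 75 | 0 | 6 | 92.6 |
| High | 141 | 0 | 100 | 58.5 |
| Medium | 111 | 1 | 146 | 43.2 |
| Low | 22 | 0 | 38 | 36.7 |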
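
Recall per severity is TP / (TP + FN), and pooling the four rows gives one overall recall for the findings that were scored. The sketch below uses the counts from the severity table; the variable names are illustrative.

```python
# (TP, FN) per severity, copied from the severity table.
severity_counts = {
    "Critical": (75, 6),
    "High": (141, 100),
    "Medium": (111, 146),
    "Low": (22, 38),
}

# Recall per severity, as a percentage.
per_severity = {
    name: 100 * tp / (tp + fn) for name, (tp, fn) in severity_counts.items()
}
for name, pct in per_severity.items():
    print(f"{name:<8} {pct:5.1f}%")

# Pooled (micro-averaged) recall across all severities.
total_tp = sum(tp for tp, _ in severity_counts.values())
total_fn = sum(fn for _, fn in severity_counts.values())
pooled_recall = 100 * total_tp / (total_tp + total_fn)
print(f"Pooled   {pooled_recall:5.1f}%  ({total_tp} of {total_tp + total_fn})")
```

The pooled figure is higher than the 48.9% strict recall in the headline. One plausible reading is that strict scoring also counts labels from the unfinished repositories as misses, but the page does not spell this out.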

Detection by vulnerability class

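| CWE family | TP | FP | FN | Recall % |
| --- | --- | --- | --- | --- |
| SQL Injection | 39 | 0 | 0 | 100.0 |
| Insecure Deserialization | 16 | 0 | 0 | 100.0 |
| Open Redirect | 6 | 0 | 0 | 100.0 |
| HTTP Header Injection | 2 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| Code Injection / RFI | 13 | 0 | 1 | 92.9 |
| Command / OS Injection | 15 | 0 | 2 | 88.2 |
| XML External Entities | 7 | 0 | 1 | 87.5 |
| Path Traversal | 20 | 0 | 5 | 80.0 |
| Hardcoded Credentials | 37 | 1 | 14 | 72.5 |
| Broken Access Control / IDOR | 15 | 0 | 7 | 68.2 |
| Server-Side Request Forgery | 14 | 0 | 7 | 66.7 |
| Cross-Site Scripting | 44 | 0 | 33 | 57.1 |
| Missing Authentication / Authorization | 21 | 0 | 22 | 48.8 |
| Security Misconfiguration | 12 | 0 | 19 | 38.7 |
| Other | 67 | 0 | 125 | 34.9 |
| Sensitive Data Exposure | 14 | 0 | 37 | 27.5 |
| Denial of Service | 3 | 0 | 17 | 15.0 |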

LLM operational metrics

Avg input tokens: 10
Avg output tokens: 5,709
Avg total tokens: 232,970
Avg latency / repo: 367s
JSON repair rate: 0.0%
Total runs: 72
F2 run-to-run σ: ±13.3

Cost

Total cost: $17
Cost / run: $0.29
Cost / 100 LOC: $0.083
Python LOC scanned: 19,983
Successful runs: 58
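
The per-run and per-LOC figures follow from the totals by simple division. The sketch below assumes cost per run is taken over the 58 successful runs, since that matches the reported $0.29 while dividing by all 72 runs does not, and it starts from the rounded $17 total, so the per-100-LOC result only approximates the reported $0.083.

```python
# Totals as shown on the page; the dollar total is already rounded.
total_cost_usd = 17.0
successful_runs = 58
total_runs = 72
python_loc = 19_983

# Assumption: cost per run is averaged over successful runs only.
cost_per_run = total_cost_usd / successful_runs
cost_per_100_loc = total_cost_usd / python_loc * 100
success_rate = 100 * successful_runs / total_runs

print(f"Cost / run:     ${cost_per_run:.2f}")
print(f"Cost / 100 LOC: ${cost_per_100_loc:.3f}")
print(f"Run success:    {success_rate:.1f}%")
```

The small gap between the computed per-100-LOC value and the page's figure is what one would expect if the unrounded total is slightly below $17.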
