realvuln v1.0
Scanner deep-dive

Grok 4.20 Reasoning by xAI

General-Purpose LLM · agentic-v1 · scored on 24/26 repositories. Strict scoring (unfinished repos counted as misses).

F3 (strict): 27.7
F2 (strict): 30.0
Recall (strict): 25.7%
Precision: 93.2%
Repos scored: 24/26
Model: xai/grok-4.20-reasoning-latest
Total cost: $17
Avg latency: 34s
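
The precision figure can be cross-checked against the per-repository TP and FP counts listed further down this page. The sketch below pools those counts, assuming precision is computed over all findings rather than averaged per repository. The variable names are illustrative and not part of realvuln.

```python
# (TP, FP) per repository, copied in order from the per-repository breakdown.
tp_fp = [
    (21, 3), (5, 0), (5, 0), (4, 1), (1, 0), (3, 0), (11, 0), (9, 0),
    (6, 0), (5, 0), (5, 0), (7, 0), (5, 0), (11, 1), (27, 5), (6, 0),
    (5, 1), (5, 0), (6, 0), (7, 1), (5, 0), (8, 0), (7, 0), (5, 1),
]

total_tp = sum(tp for tp, _ in tp_fp)
total_fp = sum(fp for _, fp in tp_fp)

# Pooled precision: share of reported findings that were true positives.
pooled_precision = 100 * total_tp / (total_tp + total_fp)

print(f"TP={total_tp} FP={total_fp} precision={pooled_precision:.1f}%")
# prints: TP=179 FP=13 precision=93.2%
```

Under that pooling assumption, the counts reproduce the reported 93.2%.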
§

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Bar chart, one bar per repository: true positives, false positives, and misses (FN), each labeled with its rounded F2 and recall. The same values appear in the table below.]
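| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| dsvpwa | 21 | 3 | 11 | 65.6 | 69.1 |
| insecure-web | 5 | 0 | 4 | 55.6 | 61.0 |
| vfapi | 5 | 0 | 4 | 55.6 | 60.5 |
| intentionally-vulnerable-python-application | 4 | 1 | 3 | 52.4 | 56.0 |
| pythonssti | 1 | 0 | 1 | 50.0 | 55.6 |
| python-insecure-app | 3 | 0 | 5 | 41.7 | 46.6 |
| dsvw | 11 | 0 | 16 | 40.7 | 46.2 |
| dvblab | 9 | 0 | 13 | 40.9 | 46.1 |
| vulnerable-api | 6 | 0 | 8 | 40.5 | 45.9 |
| vulnerable-tornado-app | 5 | 0 | 9 | 38.1 | 43.5 |
| damn-vulnerable-flask-application | 5 | 0 | 10 | 35.6 | 40.6 |
| python-app | 7 | 0 | 13 | 35.0 | 40.0 |
| vampi | 5 | 0 | 10 | 33.3 | 38.2 |
| extremely-vulnerable-flask-app | 11 | 1 | 21 | 33.3 | 38.0 |
| vulnpy | 27 | 5 | 51 | 34.2 | 35.2 |
| lets-be-bad-guys | 6 | 0 | 18 | 23.6 | 27.9 |
| vulnerable-flask-app | 5 | 1 | 16 | 23.8 | 27.9 |
| dvpwa | 5 | 0 | 17 | 22.7 | 26.9 |
| threatbyte | 6 | 0 | 20 | 21.8 | 25.8 |
| damn-vulnerable-graphql-application | 7 | 1 | 29 | 18.5 | 22.0 |
| flask-xss | 5 | 0 | 25 | 16.7 | 20.0 |
| djangoat | 8 | 0 | 42 | 16.0 | 19.2 |
| pygoat | 7 | 0 | 70 | 8.7 | 10.5 |
| vulpy | 5 | 1 | 52 | 8.2 | 10.0 |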
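
The F2 column weights recall more heavily than precision. As a minimal sketch, assuming the dashboard uses the standard F-beta definition, the function below recomputes F2 from a row's TP, FP, and FN counts. The name `f_beta` is illustrative, not a realvuln API.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Return F-beta as a percentage, or 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return 100 * (1 + b2) * precision * recall / (b2 * precision + recall)


# (TP, FP, FN) for a few rows of the per-repository table.
rows = {
    "dsvpwa": (21, 3, 11),
    "insecure-web": (5, 0, 4),
    "flask-xss": (5, 0, 25),
    "djangoat": (8, 0, 42),
}

for name, (tp, fp, fn) in rows.items():
    print(f"{name}: F2 = {f_beta(tp, fp, fn):.1f}")
# prints:
# dsvpwa: F2 = 69.1
# insecure-web: F2 = 61.0
# flask-xss: F2 = 20.0
# djangoat: F2 = 19.2
```

These four rows match the table. Some other rows differ slightly from what their counts imply (vfapi has the same counts as insecure-web but lists 60.5), which may mean the listed scores are averaged over several runs. That is an assumption, not something this page states.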
§

Detection by severity

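| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 46 | 0 | 36 | 56.1 |
| High | 68 | 0 | 175 | 28.0 |
| Medium | 35 | 0 | 225 | 13.5 |
| Low | 2 | 0 | 60 | 3.2 |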
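
The recall column here follows directly from the counts as TP / (TP + FN). The short sketch below recomputes it per severity; the dictionary layout is illustrative only.

```python
# (TP, FN) per severity, copied from the table above.
severity_counts = {
    "Critical": (46, 36),
    "High": (68, 175),
    "Medium": (35, 225),
    "Low": (2, 60),
}

# Recall in percent: detected vulnerabilities over all labeled ones.
severity_recall = {
    name: 100 * tp / (tp + fn) for name, (tp, fn) in severity_counts.items()
}

for name, recall in severity_recall.items():
    print(f"{name}: {recall:.1f}%")
# prints:
# Critical: 56.1%
# High: 28.0%
# Medium: 13.5%
# Low: 3.2%
```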
§

Detection by vulnerability class

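| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| SQL Injection | 33 | 0 | 6 | 84.6 |
| Open Redirect | 5 | 0 | 1 | 83.3 |
| Command / OS Injection | 11 | 0 | 6 | 64.7 |
| Insecure Deserialization | 10 | 0 | 6 | 62.5 |
| HTTP Header Injection | 1 | 0 | 1 | 50.0 |
| Path Traversal | 11 | 0 | 14 | 44.0 |
| XML External Entities | 3 | 0 | 5 | 37.5 |
| Hardcoded Credentials | 15 | 0 | 38 | 28.3 |
| XPath Injection | 1 | 0 | 3 | 25.0 |
| Server-Side Request Forgery | 5 | 0 | 17 | 22.7 |
| Code Injection / RFI | 3 | 0 | 11 | 21.4 |
| Security Misconfiguration | 6 | 0 | 26 | 18.8 |
| Broken Access Control / IDOR | 4 | 0 | 18 | 18.2 |
| Cross-Site Scripting | 11 | 0 | 68 | 13.9 |
| Other | 26 | 0 | 167 | 13.5 |
| Missing Authentication / Authorization | 5 | 0 | 38 | 11.6 |
| Sensitive Data Exposure | 1 | 0 | 51 | 1.9 |
| Denial of Service | 0 | 0 | 20 | 0.0 |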
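
Recall percentages alone hide how many findings each class contributes to the total miss count. As an illustrative sketch (not a realvuln feature), the snippet below ranks CWE families by their share of all false negatives, using the TP and FN counts from the table above.

```python
# (TP, FN) per CWE family, copied from the table above.
by_class = {
    "SQL Injection": (33, 6),
    "Open Redirect": (5, 1),
    "Command / OS Injection": (11, 6),
    "Insecure Deserialization": (10, 6),
    "HTTP Header Injection": (1, 1),
    "Path Traversal": (11, 14),
    "XML External Entities": (3, 5),
    "Hardcoded Credentials": (15, 38),
    "XPath Injection": (1, 3),
    "Server-Side Request Forgery": (5, 17),
    "Code Injection / RFI": (3, 11),
    "Security Misconfiguration": (6, 26),
    "Broken Access Control / IDOR": (4, 18),
    "Cross-Site Scripting": (11, 68),
    "Other": (26, 167),
    "Missing Authentication / Authorization": (5, 38),
    "Sensitive Data Exposure": (1, 51),
    "Denial of Service": (0, 20),
}

total_fn = sum(fn for _, fn in by_class.values())

# Three families with the most missed vulnerabilities.
top_misses = sorted(by_class.items(), key=lambda kv: kv[1][1], reverse=True)[:3]

for name, (_, fn) in top_misses:
    print(f"{name}: {fn} misses ({100 * fn / total_fn:.1f}% of all misses)")
# prints:
# Other: 167 misses (33.7% of all misses)
# Cross-Site Scripting: 68 misses (13.7% of all misses)
# Sensitive Data Exposure: 51 misses (10.3% of all misses)
```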
§

LLM operational metrics

Avg input tokens: 110,646
Avg output tokens: 2,042
Avg total tokens: 112,688
Avg latency / repo: 34s
JSON repair rate: 0.0%
Total runs: 72
F2 run-to-run σ: ±16.0
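
The token figures can be checked with simple arithmetic. The sketch below assumes the averages are per run, which the page does not state explicitly, so the aggregate is only an estimate.

```python
avg_input_tokens = 110_646
avg_output_tokens = 2_042
total_runs = 72

# Average total tokens should equal input plus output.
avg_total_tokens = avg_input_tokens + avg_output_tokens

# Share of each run's tokens that the model generated rather than read.
output_share = 100 * avg_output_tokens / avg_total_tokens

# Estimated tokens across all runs (assumes the averages are per run).
tokens_all_runs = avg_total_tokens * total_runs

print(avg_total_tokens)          # 112688
print(f"{output_share:.1f}%")    # 1.8%
print(tokens_all_runs)           # 8113536
```

The workload is dominated by input: under this assumption, output tokens make up under 2% of the total.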
§

Cost

Total cost: $17
Cost / run: $0.23
Cost / 100 LOC: $0.084
Python LOC scanned: 20,062
Successful runs: 72
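
The unit costs relate to the total roughly as sketched below. This assumes cost per run is total cost divided by successful runs, and cost per 100 LOC is total cost divided by hundreds of Python lines scanned. Because the total is shown rounded to whole dollars, the recomputed values only approximate the listed ones.

```python
total_cost_usd = 17.0   # rounded total as displayed on the page
successful_runs = 72
python_loc = 20_062

# Assumed definitions; the page does not spell out how these are derived.
cost_per_run = total_cost_usd / successful_runs
cost_per_100_loc = total_cost_usd / (python_loc / 100)

print(f"cost per run: ${cost_per_run:.3f}")          # close to the listed $0.23
print(f"cost per 100 LOC: ${cost_per_100_loc:.4f}")  # close to the listed $0.084
```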
