realvuln v1.0
Scanner deep-dive

Grok 3 by xAI

General-Purpose LLM · agentic-v1 · scored on 21 of 26 repositories. Strict scoring: repositories the scanner did not finish are counted as misses.

F3 (strict): 21.0
F2 (strict): 22.9
Recall (strict): 19.3%
Precision: 84.4%
Repos scored: 21/26
Model: xai/grok-3
Total cost: $5
Avg latency: 34s
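
The F2 and F3 figures read like F-beta scores, which weight recall β times as heavily as precision. The page does not state its formula, so the sketch below assumes the standard definition and plugs in the headline precision and strict recall as a rough check. The function name `f_beta` is illustrative, not taken from the benchmark's code.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta > 1 favors recall over precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline values from the summary above (already rounded on the page).
precision, strict_recall = 0.844, 0.193

for beta in (2, 3):
    score = 100 * f_beta(precision, strict_recall, beta)
    print(f"F{beta} = {score:.1f}")
```

With these rounded inputs the sketch gives values within about 0.1 of the reported 22.9 and 21.0; the small gap is presumably rounding in the published precision and recall.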
§

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Bar chart. Legend: true positive, false positive, missed (FN). One bar per repository, each labeled with its F2 score and recall; the same figures are listed in the table that follows.]
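| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| vfapi | 6 | 0 | 3 | 66.7 | 71.3 |
| insecure-web | 6 | 0 | 3 | 66.7 | 70.9 |
| dsvpwa | 21 | 3 | 11 | 65.6 | 69.1 |
| pythonssti | 1 | 0 | 1 | 50.0 | 55.6 |
| vampi | 8 | 1 | 8 | 50.0 | 54.8 |
| dsvw | 13 | 0 | 14 | 46.9 | 52.5 |
| dvblab | 10 | 1 | 12 | 43.9 | 49.0 |
| vulnerable-api | 6 | 1 | 8 | 42.9 | 47.9 |
| python-insecure-app | 3 | 0 | 5 | 41.7 | 47.1 |
| lets-be-bad-guys | 10 | 3 | 14 | 39.6 | 43.8 |
| python-app | 6 | 2 | 14 | 31.7 | 35.7 |
| damn-vulnerable-flask-application | 5 | 1 | 10 | 31.1 | 35.5 |
| vulnerable-tornado-app | 4 | 0 | 10 | 28.6 | 33.3 |
| threatbyte | 6 | 0 | 20 | 21.8 | 25.8 |
| vulnerable-flask-app | 5 | 3 | 16 | 22.2 | 25.5 |
| flask-xss | 4 | 1 | 26 | 14.4 | 17.3 |
| dvpwa | 2 | 0 | 20 | 10.6 | 12.9 |
| djangoat | 5 | 2 | 45 | 9.3 | 11.3 |
| vulpy | 5 | 1 | 52 | 8.8 | 10.6 |
| pygoat | 5 | 4 | 72 | 6.9 | 8.4 |
| vulnpy | 4 | 2 | 74 | 5.1 | 6.2 |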
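
To show how a row's recall and F2 relate to its counts, here is a minimal sketch that assumes the standard definitions, with F2 written directly in terms of TP, FP, and FN. The helper names `recall_pct` and `f2_pct` are hypothetical, and the two rows are taken from the table above.

```python
def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage of labeled vulnerabilities found."""
    total = tp + fn
    return 100.0 * tp / total if total else 0.0


def f2_pct(tp: int, fp: int, fn: int) -> float:
    """F2 in count form: 5*TP / (5*TP + 4*FN + FP), as a percentage."""
    denom = 5 * tp + 4 * fn + fp
    return 100.0 * 5 * tp / denom if denom else 0.0


# (TP, FP, FN) for two repositories from the breakdown table.
rows = {
    "vampi": (8, 1, 8),
    "vulnerable-tornado-app": (4, 0, 10),
}

for repo, (tp, fp, fn) in rows.items():
    print(repo, round(recall_pct(tp, fn), 1), round(f2_pct(tp, fp, fn), 1))
```

These two rows reproduce the listed values (50.0 / 54.8 and 28.6 / 33.3). Not every row does: dsvw's counts give 48.1% recall against the listed 46.9%. One possible explanation is that the published percentages are averaged over multiple runs, but the page does not say.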
§

Detection by severity

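| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 36 | 0 | 37 | 49.3 |
| High | 56 | 0 | 156 | 26.4 |
| Medium | 38 | 0 | 194 | 16.4 |
| Low | 3 | 0 | 52 | 5.5 |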
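
Pooling the severity rows gives an overall recall for the repositories that were actually scored. The sketch below does that arithmetic with the TP and FN counts from the table; the variable names are illustrative.

```python
# (TP, FN) per severity level, copied from the table above.
severity_rows = {
    "Critical": (36, 37),
    "High": (56, 156),
    "Medium": (38, 194),
    "Low": (3, 52),
}

tp_total = sum(tp for tp, _ in severity_rows.values())
fn_total = sum(fn for _, fn in severity_rows.values())
pooled_recall = 100.0 * tp_total / (tp_total + fn_total)

print(tp_total, fn_total, round(pooled_recall, 1))
```

The pooled figure comes out above the 19.3% strict recall in the summary. That is consistent with strict scoring also counting the unfinished repositories as misses, though the exact strict denominator is not shown on this page.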
§

Detection by vulnerability class

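| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| SQL Injection | 25 | 0 | 10 | 71.4 |
| Command / OS Injection | 9 | 0 | 4 | 69.2 |
| Open Redirect | 4 | 0 | 2 | 66.7 |
| Insecure Deserialization | 9 | 0 | 5 | 64.3 |
| HTTP Header Injection | 1 | 0 | 1 | 50.0 |
| Code Injection / RFI | 6 | 0 | 8 | 42.9 |
| XML External Entities | 3 | 0 | 5 | 37.5 |
| Path Traversal | 8 | 0 | 16 | 33.3 |
| Hardcoded Credentials | 13 | 0 | 35 | 27.1 |
| XPath Injection | 1 | 0 | 3 | 25.0 |
| Cross-Site Scripting | 14 | 0 | 57 | 19.7 |
| Missing Authentication / Authorization | 6 | 0 | 30 | 16.7 |
| Broken Access Control / IDOR | 3 | 0 | 15 | 16.7 |
| Server-Side Request Forgery | 3 | 0 | 17 | 15.0 |
| Security Misconfiguration | 4 | 0 | 23 | 14.8 |
| Other | 21 | 0 | 149 | 12.4 |
| Denial of Service | 1 | 0 | 17 | 5.6 |
| Sensitive Data Exposure | 2 | 0 | 42 | 4.5 |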
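
Recall percentages alone hide how much each class contributes to the total number of misses. As a small illustration, the sketch below ranks classes by their share of all false negatives, using the FN column from the table; the dictionary and variable names are illustrative.

```python
# FN counts per CWE family, copied from the table above.
fn_by_class = {
    "SQL Injection": 10,
    "Command / OS Injection": 4,
    "Open Redirect": 2,
    "Insecure Deserialization": 5,
    "HTTP Header Injection": 1,
    "Code Injection / RFI": 8,
    "XML External Entities": 5,
    "Path Traversal": 16,
    "Hardcoded Credentials": 35,
    "XPath Injection": 3,
    "Cross-Site Scripting": 57,
    "Missing Authentication / Authorization": 30,
    "Broken Access Control / IDOR": 15,
    "Server-Side Request Forgery": 17,
    "Security Misconfiguration": 23,
    "Other": 149,
    "Denial of Service": 17,
    "Sensitive Data Exposure": 42,
}

total_fn = sum(fn_by_class.values())
# Three classes with the most missed findings.
top_misses = sorted(fn_by_class.items(), key=lambda kv: kv[1], reverse=True)[:3]

for name, fn in top_misses:
    print(f"{name}: {fn} misses ({100 * fn / total_fn:.1f}% of all FN)")
```

On these counts, "Other", Cross-Site Scripting, and Sensitive Data Exposure together account for more than half of the missed findings.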
§

LLM operational metrics

Avg input tokens: 15,856
Avg output tokens: 1,369
Avg total tokens: 17,535
Avg latency / repo: 34s
JSON repair rate: 0.0%
Total runs: 72
F2 run-to-run σ: ±21.3
§

Cost

Total cost: $5
Cost / run: $0.08
Cost / 100 LOC: $0.028
Python LOC scanned: 17,556
Successful runs: 58
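
The cost per 100 LOC can be approximated from the other figures on the page. The sketch below assumes it is simply total cost divided by hundreds of lines scanned, and also derives the share of runs that succeeded from the 72 total runs reported under operational metrics. Because the total cost is shown rounded to $5, results are approximate.

```python
total_cost_usd = 5.0      # rounded total shown on the page
loc_scanned = 17_556      # Python LOC scanned
total_runs = 72           # from the operational metrics
successful_runs = 58

# Assumed definition: dollars per 100 lines of scanned code.
cost_per_100_loc = total_cost_usd / (loc_scanned / 100)
success_rate = 100.0 * successful_runs / total_runs

print(round(cost_per_100_loc, 3), round(success_rate, 1))
```

The per-100-LOC result matches the reported $0.028. The reported cost per run ($0.08) cannot be pinned down the same way from a rounded total, since dividing $5 by 72 runs and by 58 successful runs gives values on either side of it.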
