realvuln v1.0
Scanner deep-dive

GLM-5.1 by Z.ai

General-Purpose LLM · agentic-v1 · scored on 25/26 repositories. Strict scoring (unfinished repos counted as misses).

F3 (strict): 57.1
F2 (strict): 58.6
Recall (strict): 55.7%
Precision: 74.1%
Repos scored: 25/26
Model: glm-5.1
Total cost: $10
Avg latency: 438s
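The headline F2 and F3 values are consistent with the standard F-beta formula applied to the reported precision and strict recall. The following is a minimal sketch, assuming the dashboard uses the usual definition F_β = (1 + β²)·P·R / (β²·P + R); the function name `f_beta` is illustrative and not taken from the benchmark's code.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline values from this page: precision 74.1%, strict recall 55.7%.
precision, recall = 0.741, 0.557

f2 = round(100 * f_beta(precision, recall, beta=2), 1)
f3 = round(100 * f_beta(precision, recall, beta=3), 1)
print(f2)  # 58.6
print(f3)  # 57.1
```

Because recall is lower than precision here, raising β from 2 to 3 pulls the score further toward recall, which is why F3 comes out slightly below F2.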
§

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

Legend: True positive · False positive · Missed (FN)

[Bar chart omitted; the per-repository counts, recall, and F2 values are listed in the table below.]

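| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| pythonssti | 2 | 0 | 0 | 100.0 | 100.0 |
| vfapi | 9 | 4 | 0 | 96.3 | 90.0 |
| damn-vulnerable-flask-application | 14 | 4 | 2 | 90.0 | 87.7 |
| python-insecure-app | 6 | 0 | 2 | 81.2 | 83.2 |
| dsvpwa | 25 | 6 | 7 | 77.1 | 77.7 |
| vulnerable-api | 11 | 4 | 3 | 78.6 | 76.9 |
| insecure-web | 7 | 3 | 2 | 77.8 | 76.7 |
| vulnerable-tornado-app | 10 | 2 | 4 | 73.8 | 75.6 |
| dsvw | 20 | 5 | 8 | 72.2 | 73.6 |
| python-app | 14 | 6 | 6 | 70.0 | 70.0 |
| dvblab | 15 | 7 | 7 | 69.7 | 69.6 |
| intentionally-vulnerable-python-application | 5 | 4 | 2 | 71.4 | 67.8 |
| vulnpy | 48 | 11 | 30 | 61.5 | 64.7 |
| vulnerable-flask-app | 13 | 4 | 8 | 61.9 | 64.1 |
| lets-be-bad-guys | 14 | 2 | 10 | 59.7 | 64.0 |
| threatbyte | 15 | 4 | 11 | 59.0 | 62.4 |
| dvpwa | 13 | 6 | 9 | 57.6 | 59.6 |
| owasp-web-playground | 16 | 8 | 12 | 57.1 | 59.0 |
| vulnerable-python-apps | 12 | 5 | 10 | 54.5 | 57.0 |
| pygoat | 40 | 14 | 36 | 52.6 | 55.8 |
| extremely-vulnerable-flask-app | 14 | 4 | 18 | 45.3 | 49.3 |
| damn-vulnerable-graphql-application | 17 | 15 | 19 | 47.2 | 48.1 |
| flask-xss | 12 | 1 | 18 | 40.0 | 45.0 |
| djangoat | 18 | 10 | 32 | 37.0 | 40.5 |
| vulpy | 19 | 7 | 38 | 33.9 | 38.0 |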
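To relate the count columns to the F2 column, F-beta can also be written directly in terms of TP, FP, and FN. The sketch below uses the vulnpy row (TP 48, FP 11, FN 30) and assumes the standard count-based form of the formula; `f_beta_from_counts` is a hypothetical helper name, not part of the benchmark.

```python
def f_beta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from raw counts: (1+b^2)*TP / ((1+b^2)*TP + b^2*FN + FP)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# vulnpy row from the per-repository table.
tp, fp, fn = 48, 11, 30

vulnpy_recall = round(100 * tp / (tp + fn), 1)
vulnpy_f2 = round(100 * f_beta_from_counts(tp, fp, fn), 1)
print(vulnpy_recall)  # 61.5
print(vulnpy_f2)      # 64.7
```

Not every row reproduces this exactly. For dvblab, for example, 15 / (15 + 7) gives 68.2% rather than the 69.7% shown. One plausible explanation, which is an assumption rather than something the page states, is that the metrics are averaged over several runs while the counts are displayed rounded.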
§

Detection by severity

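| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 76 | 1 | 9 | 89.4 |
| High | 155 | 2 | 103 | 60.1 |
| Medium | 133 | 0 | 140 | 48.7 |
| Low | 24 | 0 | 42 | 36.4 |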
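The recall column in the severity table follows from the TP and FN counts as TP / (TP + FN). A short sketch that recomputes it, with the counts copied from the table; the variable and function names are illustrative only.

```python
# (TP, FN) per severity, copied from the table above.
severity_counts = {
    "Critical": (76, 9),
    "High": (155, 103),
    "Medium": (133, 140),
    "Low": (24, 42),
}


def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage, rounded to one decimal place."""
    return round(100 * tp / (tp + fn), 1)


severity_recall = {name: recall_pct(tp, fn) for name, (tp, fn) in severity_counts.items()}
for name, value in severity_recall.items():
    print(f"{name}: {value}%")
# Critical: 89.4%
# High: 60.1%
# Medium: 48.7%
# Low: 36.4%
```

Recall falls steadily from Critical to Low, so most of the missed findings sit in the High and Medium bands, which also hold the most labeled vulnerabilities.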
§

Detection by vulnerability class

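| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Code Injection / RFI | 14 | 0 | 0 | 100.0 |
| XML External Entities | 8 | 1 | 0 | 100.0 |
| HTTP Header Injection | 2 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| SQL Injection | 44 | 0 | 2 | 95.7 |
| Insecure Deserialization | 18 | 0 | 1 | 94.7 |
| Path Traversal | 23 | 0 | 3 | 88.5 |
| Command / OS Injection | 15 | 0 | 2 | 88.2 |
| Open Redirect | 5 | 0 | 1 | 83.3 |
| Cross-Site Scripting | 56 | 0 | 26 | 68.3 |
| Broken Access Control / IDOR | 15 | 0 | 7 | 68.2 |
| Hardcoded Credentials | 38 | 0 | 22 | 63.3 |
| Server-Side Request Forgery | 13 | 1 | 11 | 54.2 |
| Missing Authentication / Authorization | 22 | 0 | 23 | 48.9 |
| Security Misconfiguration | 16 | 0 | 17 | 48.5 |
| Other | 75 | 1 | 125 | 37.5 |
| Sensitive Data Exposure | 19 | 0 | 36 | 34.5 |
| Denial of Service | 1 | 0 | 18 | 5.3 |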
§

LLM operational metrics

Avg input tokens: 32,263
Avg output tokens: 9,004
Avg total tokens: 128,705
Avg latency / repo: 438s
JSON repair rate: 1.3%
Total runs: 78
F2 run-to-run σ: ±15.6
§

Cost

Total cost: $10
Cost / run: $0.16
Cost / 100 LOC: $0.053
Python LOC scanned: 19,610
Successful runs: 63
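The per-run cost appears to be the total cost divided by the number of successful runs rather than by all 78 runs. The sketch below checks that reading; it is an inference from the displayed numbers, not a documented formula, and the variable names are illustrative.

```python
total_cost_usd = 10.0    # displayed total; presumably rounded to whole dollars
successful_runs = 63
total_runs = 78
python_loc = 19_610

cost_per_successful_run = total_cost_usd / successful_runs
cost_per_any_run = total_cost_usd / total_runs
cost_per_100_loc = total_cost_usd / python_loc * 100

print(f"{cost_per_successful_run:.2f}")  # 0.16, matches the reported cost / run
print(f"{cost_per_any_run:.2f}")         # 0.13, does not match
print(f"{cost_per_100_loc:.3f}")         # 0.051, close to the reported 0.053
```

The small gap on cost per 100 LOC would be explained if the true total is slightly above the rounded $10 figure, but the page does not give the unrounded amount.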
