realvuln v1.0
Scanner deep-dive

GLM-5 by Z.ai

General-Purpose LLM · agentic-v1 · scored on 22/26 repositories. Strict scoring (unfinished repos counted as misses).

F3 (strict): 45.1
F2 (strict): 47.2
Recall (strict): 43.1%
Precision: 76.7%
Repos scored: 22/26
Model: glm-5
Total cost: $7
Avg latency: 409s
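The page does not spell out how F2 and F3 are computed. As a minimal sketch, assuming the standard F-beta definition (a weighted harmonic mean that favors recall when beta is above 1), the headline precision and strict recall reproduce both scores. The function name `f_beta` is illustrative and not taken from the benchmark's code.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline figures from this page, as fractions (rounded, so results are approximate).
precision = 0.767
strict_recall = 0.431

f2 = f_beta(precision, strict_recall, beta=2)
f3 = f_beta(precision, strict_recall, beta=3)
print(f"F2 = {100 * f2:.1f}, F3 = {100 * f3:.1f}")  # F2 = 47.2, F3 = 45.1
```

Because beta weights recall more heavily as it grows, F3 sits closer to the 43.1% recall than F2 does.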
§

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Bar chart: true positives, false positives, and misses (FN) per repository, each bar labeled with its F2 and recall. The same values are tabulated below.]
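| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| vfapi | 8 | 4 | 1 | 88.9 | 83.1 |
| vulnpy | 55 | 4 | 23 | 70.1 | 73.6 |
| insecure-web | 7 | 4 | 2 | 74.1 | 70.9 |
| dsvpwa | 21 | 3 | 11 | 65.6 | 69.2 |
| vulnerable-flask-app | 13 | 1 | 8 | 63.5 | 67.8 |
| python-app | 13 | 3 | 7 | 65.0 | 67.4 |
| dvblab | 14 | 2 | 8 | 61.4 | 65.2 |
| vulnerable-tornado-app | 9 | 3 | 5 | 61.9 | 64.1 |
| dsvw | 16 | 1 | 11 | 58.0 | 62.8 |
| intentionally-vulnerable-python-application | 4 | 4 | 2 | 64.3 | 61.6 |
| vulnerable-api | 8 | 2 | 6 | 57.1 | 60.3 |
| damn-vulnerable-flask-application | 8 | 2 | 7 | 53.3 | 57.0 |
| lets-be-bad-guys | 12 | 3 | 12 | 51.4 | 55.6 |
| python-insecure-app | 4 | 0 | 4 | 50.0 | 55.6 |
| pythonssti | 1 | 0 | 1 | 50.0 | 55.6 |
| flask-xss | 14 | 3 | 16 | 45.6 | 49.8 |
| threatbyte | 12 | 3 | 14 | 44.9 | 49.0 |
| damn-vulnerable-graphql-application | 16 | 22 | 20 | 44.4 | 44.0 |
| pygoat | 27 | 11 | 50 | 35.1 | 38.5 |
| dvpwa | 8 | 6 | 14 | 34.1 | 37.2 |
| djangoat | 16 | 7 | 34 | 31.3 | 35.2 |
| vulpy | 14 | 3 | 43 | 25.1 | 29.2 |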
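To make the table's columns concrete, the sketch below recomputes F2 directly from TP/FP/FN counts and pools the TP and FP columns across repositories. It assumes the counts-based form F2 = 5·TP / (5·TP + 4·FN + FP), which is algebraically the standard F-beta with beta = 2. The `(tp, fp)` pairs are read off the per-repository table; the helper name is illustrative. The reported per-repo F2 values may be averaged across runs, so a counts-based value can differ from the table by a few points on some rows.

```python
def f2_from_counts(tp: int, fp: int, fn: int) -> float:
    """F2 from raw counts: 5*TP / (5*TP + 4*FN + FP)."""
    denom = 5 * tp + 4 * fn + fp
    return 5 * tp / denom if denom else 0.0


# Two rows whose counts reproduce the reported F2 exactly after rounding.
print(round(100 * f2_from_counts(4, 0, 4), 1))     # python-insecure-app -> 55.6
print(round(100 * f2_from_counts(16, 22, 20), 1))  # damn-vulnerable-graphql-application -> 44.0

# (TP, FP) per repository, in table order.
tp_fp = [
    (8, 4), (55, 4), (7, 4), (21, 3), (13, 1), (13, 3), (14, 2), (9, 3),
    (16, 1), (4, 4), (8, 2), (8, 2), (12, 3), (4, 0), (1, 0), (14, 3),
    (12, 3), (16, 22), (27, 11), (8, 6), (16, 7), (14, 3),
]
total_tp = sum(tp for tp, _ in tp_fp)
total_fp = sum(fp for _, fp in tp_fp)
pooled_precision = total_tp / (total_tp + total_fp)
print(total_tp, total_fp, round(100 * pooled_precision, 1))  # 300 91 76.7
```

The pooled precision of 76.7% agrees with the headline precision figure, which suggests that figure is computed over all findings rather than averaged per repository.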
§

Detection by severity

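| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 69 | 0 | 8 | 89.6 |
| High | 120 | 0 | 104 | 53.6 |
| Medium | 97 | 0 | 148 | 39.6 |
| Low | 14 | 0 | 40 | 25.9 |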
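The recall column follows from the counts as TP / (TP + FN). The sketch below recomputes it per severity and also pools all four rows. The pooled figure covers only the findings counted in this table; it comes out above the 43.1% strict recall in the header, which is consistent with strict scoring adding misses for unfinished repositories, though the page does not state that breakdown explicitly.

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); 0.0 when there is nothing to find."""
    total = tp + fn
    return tp / total if total else 0.0


# (TP, FN) per severity, taken from the table above.
severity_counts = {
    "Critical": (69, 8),
    "High": (120, 104),
    "Medium": (97, 148),
    "Low": (14, 40),
}

per_severity = {
    name: round(100 * recall(tp, fn), 1)
    for name, (tp, fn) in severity_counts.items()
}
print(per_severity)
# {'Critical': 89.6, 'High': 53.6, 'Medium': 39.6, 'Low': 25.9}

pooled = recall(
    sum(tp for tp, _ in severity_counts.values()),
    sum(fn for _, fn in severity_counts.values()),
)
print(round(100 * pooled, 1))  # 50.0
```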
§

Detection by vulnerability class

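| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Denial of Service | 18 | 0 | 0 | 100.0 |
| Insecure Deserialization | 15 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| SQL Injection | 35 | 0 | 1 | 97.2 |
| Code Injection / RFI | 13 | 0 | 1 | 92.9 |
| Command / OS Injection | 15 | 0 | 2 | 88.2 |
| XML External Entities | 7 | 0 | 1 | 87.5 |
| Open Redirect | 5 | 0 | 1 | 83.3 |
| Path Traversal | 18 | 0 | 7 | 72.0 |
| Server-Side Request Forgery | 15 | 0 | 6 | 71.4 |
| Hardcoded Credentials | 29 | 0 | 20 | 59.2 |
| HTTP Header Injection | 1 | 0 | 1 | 50.0 |
| Broken Access Control / IDOR | 8 | 0 | 10 | 44.4 |
| Missing Authentication / Authorization | 17 | 0 | 23 | 42.5 |
| Cross-Site Scripting | 29 | 0 | 45 | 39.2 |
| Security Misconfiguration | 10 | 0 | 18 | 35.7 |
| Other | 52 | 0 | 125 | 29.4 |
| Sensitive Data Exposure | 9 | 0 | 39 | 18.8 |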
§

LLM operational metrics

Avg input tokens: 57,126
Avg output tokens: 4,789
Avg total tokens: 123,606
Avg latency / repo: 409s
JSON repair rate: 1.4%
Total runs: 72
F2 run-to-run σ: ±13.8
§

Cost

Total cost: $7
Cost / run: $0.11
Cost / 100 LOC: $0.034
Python LOC scanned: 19,157
Successful runs: 58
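The page does not say how the per-run figure is derived. As a rough consistency check, and under the assumption that cost per 100 LOC is total cost divided by the scanned LOC, the sketch below backs out an implied total and divides it by the run counts. This is one reading that fits the rounded figures, not a statement of the benchmark's actual accounting; all variable names are illustrative.

```python
cost_per_100_loc = 0.034   # dollars, as shown on the page (rounded)
python_loc = 19_157
successful_runs = 58
total_runs = 72            # from the operational metrics

# Implied total spend if "Cost / 100 LOC" = total cost / (LOC / 100).
implied_total = cost_per_100_loc * python_loc / 100
print(round(implied_total, 2))  # 6.51, which rounds to the displayed $7

# Dividing by successful runs matches the displayed $0.11 per run;
# dividing by all runs does not.
print(round(implied_total / successful_runs, 2))  # 0.11
print(round(implied_total / total_runs, 2))       # 0.09
```

Under this reading, "Cost / run" appears to be averaged over the 58 successful runs rather than all 72 attempts.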
