Scanner deep-dive
GLM-5 by Z.ai
General-Purpose LLM · agentic-v1 · scored on 22/26 repositories. Strict scoring (unfinished repos counted as misses).
| Metric | Value |
|---|---|
| F3 (strict) | 45.1 |
| F2 (strict) | 47.2 |
| Recall (strict) | 43.1% |
| Precision | 76.7% |
| Repos scored | 22/26 |
| Model | glm-5 |
| Total cost | $7 |
| Avg latency | 409s |
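
The F2 and F3 figures are consistent with the standard F-beta formula applied to the reported precision (76.7%) and strict recall (43.1%). The sketch below recomputes them. The helper name `f_beta` is illustrative, and the assumption that the page uses this exact formula is inferred from the numbers rather than stated in the source.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline precision and strict recall, as fractions.
precision, recall = 0.767, 0.431

f2 = f_beta(precision, recall, beta=2)
f3 = f_beta(precision, recall, beta=3)
print(f"F2 = {100 * f2:.1f}, F3 = {100 * f3:.1f}")  # F2 = 47.2, F3 = 45.1
```

Because beta weights recall more heavily as it grows, F3 sits closer to the 43.1% recall than F2 does.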
§
Per-repository breakdown
Each row lists true positives (TP), false positives (FP), and misses (FN) for one repository. Rows are ranked by F2.
| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| vfapi | 8 | 4 | 1 | 88.9 | 83.1 |
| vulnpy | 55 | 4 | 23 | 70.1 | 73.6 |
| insecure-web | 7 | 4 | 2 | 74.1 | 70.9 |
| dsvpwa | 21 | 3 | 11 | 65.6 | 69.2 |
| vulnerable-flask-app | 13 | 1 | 8 | 63.5 | 67.8 |
| python-app | 13 | 3 | 7 | 65.0 | 67.4 |
| dvblab | 14 | 2 | 8 | 61.4 | 65.2 |
| vulnerable-tornado-app | 9 | 3 | 5 | 61.9 | 64.1 |
| dsvw | 16 | 1 | 11 | 58.0 | 62.8 |
| intentionally-vulnerable-python-application | 4 | 4 | 2 | 64.3 | 61.6 |
| vulnerable-api | 8 | 2 | 6 | 57.1 | 60.3 |
| damn-vulnerable-flask-application | 8 | 2 | 7 | 53.3 | 57.0 |
| lets-be-bad-guys | 12 | 3 | 12 | 51.4 | 55.6 |
| python-insecure-app | 4 | 0 | 4 | 50.0 | 55.6 |
| pythonssti | 1 | 0 | 1 | 50.0 | 55.6 |
| flask-xss | 14 | 3 | 16 | 45.6 | 49.8 |
| threatbyte | 12 | 3 | 14 | 44.9 | 49.0 |
| damn-vulnerable-graphql-application | 16 | 22 | 20 | 44.4 | 44.0 |
| pygoat | 27 | 11 | 50 | 35.1 | 38.5 |
| dvpwa | 8 | 6 | 14 | 34.1 | 37.2 |
| djangoat | 16 | 7 | 34 | 31.3 | 35.2 |
| vulpy | 14 | 3 | 43 | 25.1 | 29.2 |
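
Summing the TP and FP columns of the table above appears to reproduce the headline precision of 76.7%. The sketch below does that pooling. Reading the headline figure as a pooled (micro-averaged) ratio is an inference from the numbers, not something the page states, and the name `rows` is illustrative.

```python
# (TP, FP, FN) per repository, copied from the per-repository table.
rows = {
    "vfapi": (8, 4, 1),
    "vulnpy": (55, 4, 23),
    "insecure-web": (7, 4, 2),
    "dsvpwa": (21, 3, 11),
    "vulnerable-flask-app": (13, 1, 8),
    "python-app": (13, 3, 7),
    "dvblab": (14, 2, 8),
    "vulnerable-tornado-app": (9, 3, 5),
    "dsvw": (16, 1, 11),
    "intentionally-vulnerable-python-application": (4, 4, 2),
    "vulnerable-api": (8, 2, 6),
    "damn-vulnerable-flask-application": (8, 2, 7),
    "lets-be-bad-guys": (12, 3, 12),
    "python-insecure-app": (4, 0, 4),
    "pythonssti": (1, 0, 1),
    "flask-xss": (14, 3, 16),
    "threatbyte": (12, 3, 14),
    "damn-vulnerable-graphql-application": (16, 22, 20),
    "pygoat": (27, 11, 50),
    "dvpwa": (8, 6, 14),
    "djangoat": (16, 7, 34),
    "vulpy": (14, 3, 43),
}

# Pool the counts across all scored repositories.
tp = sum(r[0] for r in rows.values())
fp = sum(r[1] for r in rows.values())
fn = sum(r[2] for r in rows.values())

pooled_precision = tp / (tp + fp)
pooled_recall = tp / (tp + fn)  # scored repositories only, not the strict figure

print(f"TP={tp} FP={fp} FN={fn}")
print(f"pooled precision = {100 * pooled_precision:.1f}%")
print(f"pooled recall (scored repos only) = {100 * pooled_recall:.1f}%")
```

The pooled recall over the 22 scored repositories comes out higher than the strict 43.1%, which would be consistent with strict scoring counting the unfinished repositories as misses. Per-row Recall % values do not always equal TP / (TP + FN) exactly, so the per-repository percentages may be computed differently, for example averaged across runs.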
§
Detection by severity
| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 69 | 0 | 8 | 89.6 |
| High | 120 | 0 | 104 | 53.6 |
| Medium | 97 | 0 | 148 | 39.6 |
| Low | 14 | 0 | 40 | 25.9 |
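
The Recall % column in the severity table matches TP / (TP + FN) computed on each row. A minimal sketch of that check follows; the variable names are illustrative.

```python
# (TP, FN) per severity level, copied from the severity table.
severity_counts = {
    "Critical": (69, 8),
    "High": (120, 104),
    "Medium": (97, 148),
    "Low": (14, 40),
}

# Recall is the share of labeled vulnerabilities that were found.
recalls = {
    level: round(100 * tp / (tp + fn), 1)
    for level, (tp, fn) in severity_counts.items()
}

for level, pct in recalls.items():
    print(f"{level}: {pct}%")
```

Recall falls steadily from Critical to Low, so most of the misses sit in the Medium and High rows.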
§
Detection by vulnerability class
| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Denial of Service | 18 | 0 | 0 | 100.0 |
| Insecure Deserialization | 15 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| SQL Injection | 35 | 0 | 1 | 97.2 |
| Code Injection / RFI | 13 | 0 | 1 | 92.9 |
| Command / OS Injection | 15 | 0 | 2 | 88.2 |
| XML External Entities | 7 | 0 | 1 | 87.5 |
| Open Redirect | 5 | 0 | 1 | 83.3 |
| Path Traversal | 18 | 0 | 7 | 72.0 |
| Server-Side Request Forgery | 15 | 0 | 6 | 71.4 |
| Hardcoded Credentials | 29 | 0 | 20 | 59.2 |
| HTTP Header Injection | 1 | 0 | 1 | 50.0 |
| Broken Access Control / IDOR | 8 | 0 | 10 | 44.4 |
| Missing Authentication / Authorization | 17 | 0 | 23 | 42.5 |
| Cross-Site Scripting | 29 | 0 | 45 | 39.2 |
| Security Misconfiguration | 10 | 0 | 18 | 35.7 |
| Other | 52 | 0 | 125 | 29.4 |
| Sensitive Data Exposure | 9 | 0 | 39 | 18.8 |
§
LLM operational metrics
| Metric | Value |
|---|---|
| Avg input tokens | 57,126 |
| Avg output tokens | 4,789 |
| Avg total tokens | 123,606 |
| Avg latency / repo | 409s |
| JSON repair rate | 1.4% |
| Total runs | 72 |
| F2 run-to-run σ | ±13.8 |
§
Cost
| Metric | Value |
|---|---|
| Total cost | $7 |
| Cost / run | $0.11 |
| Cost / 100 LOC | $0.034 |
| Python LOC scanned | 19,157 |
| Successful runs | 58 |