Scanner deep-dive
MiniMax M2.7 by MiniMax
General-Purpose LLM · agentic-v1 · scored on 22 of 26 repositories. Scoring is strict: the 4 unfinished repositories are counted as misses.
| Metric | Value |
|---|---|
| F3 (strict) | 38.2 |
| F2 (strict) | 40.2 |
| Recall (strict) | 36.3% |
| Precision | 71.3% |
| Repos scored | 22/26 |
| Model | MiniMax-M2.7 |
| Total cost | $1 |
| Avg latency | 119s |
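The page does not define F2 or F3. Assuming they are the standard F-beta score (beta = 2 and beta = 3, both weighting recall above precision) computed from the reported precision and strict recall, the headline figures can be reproduced to within rounding. The function name `f_beta` is illustrative, not taken from the benchmark's code.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline values reported on this page, expressed as fractions.
precision = 0.713
strict_recall = 0.363

f2 = 100 * f_beta(precision, strict_recall, beta=2)
f3 = 100 * f_beta(precision, strict_recall, beta=3)
print(f"F2 = {f2:.2f}, F3 = {f3:.2f}")
```

Both results land within about 0.1 of the reported 40.2 and 38.2. The small gap is expected, because the inputs are themselves rounded to one decimal place.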
Per-repository breakdown
Each row shows true positives (TP), false positives (FP), and misses (FN) on one repository. Ranked by F2, highest first.
| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| vulnpy | 48 | 5 | 30 | 61.1 | 65.0 |
| vampi | 9 | 3 | 6 | 60.0 | 62.1 |
| intentionally-vulnerable-python-application | 4 | 1 | 3 | 57.1 | 60.6 |
| vfapi | 6 | 4 | 4 | 61.1 | 59.9 |
| dsvw | 15 | 5 | 12 | 55.6 | 58.6 |
| vulnerable-api | 8 | 5 | 6 | 57.1 | 57.8 |
| dsvpwa | 16 | 2 | 16 | 51.6 | 55.7 |
| pythonssti | 1 | 0 | 1 | 50.0 | 55.6 |
| dvblab | 12 | 9 | 10 | 52.3 | 53.0 |
| python-app | 9 | 5 | 11 | 45.0 | 47.9 |
| lets-be-bad-guys | 10 | 4 | 14 | 43.8 | 47.3 |
| vulnerable-tornado-app | 6 | 3 | 8 | 42.9 | 46.2 |
| damn-vulnerable-flask-application | 6 | 4 | 8 | 43.3 | 45.4 |
| python-insecure-app | 3 | 0 | 5 | 37.5 | 42.3 |
| vulnerable-flask-app | 8 | 8 | 13 | 38.1 | 39.9 |
| pygoat | 27 | 13 | 50 | 35.1 | 38.7 |
| flask-xss | 10 | 3 | 20 | 34.4 | 38.6 |
| damn-vulnerable-graphql-application | 13 | 14 | 23 | 36.1 | 38.0 |
| vulpy | 19 | 3 | 38 | 33.3 | 38.0 |
| extremely-vulnerable-flask-app | 10 | 1 | 22 | 31.2 | 36.0 |
| threatbyte | 8 | 3 | 18 | 30.8 | 34.8 |
| dvpwa | 5 | 7 | 17 | 24.2 | 26.5 |
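As a sanity check, the per-repository counts can be pooled into a single precision and recall. This is a sketch under one assumption: that the TP/FP/FN cells can be summed directly. They appear to be rounded per-run averages, since a row's recall does not always equal TP / (TP + FN) exactly, so the pooled figures are approximate.

```python
# (TP, FP, FN) per repository, copied from the table above.
counts = {
    "vulnpy": (48, 5, 30),
    "vampi": (9, 3, 6),
    "intentionally-vulnerable-python-application": (4, 1, 3),
    "vfapi": (6, 4, 4),
    "dsvw": (15, 5, 12),
    "vulnerable-api": (8, 5, 6),
    "dsvpwa": (16, 2, 16),
    "pythonssti": (1, 0, 1),
    "dvblab": (12, 9, 10),
    "python-app": (9, 5, 11),
    "lets-be-bad-guys": (10, 4, 14),
    "vulnerable-tornado-app": (6, 3, 8),
    "damn-vulnerable-flask-application": (6, 4, 8),
    "python-insecure-app": (3, 0, 5),
    "vulnerable-flask-app": (8, 8, 13),
    "pygoat": (27, 13, 50),
    "flask-xss": (10, 3, 20),
    "damn-vulnerable-graphql-application": (13, 14, 23),
    "vulpy": (19, 3, 38),
    "extremely-vulnerable-flask-app": (10, 1, 22),
    "threatbyte": (8, 3, 18),
    "dvpwa": (5, 7, 17),
}

# Pool the counts across all scored repositories (micro-average).
tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
fn = sum(c[2] for c in counts.values())

pooled_precision = tp / (tp + fp)
pooled_recall = tp / (tp + fn)
print(f"precision = {pooled_precision:.1%}, recall (scored repos only) = {pooled_recall:.1%}")
```

The pooled precision agrees with the reported 71.3%. The pooled recall comes out near 43%, which covers only the 22 scored repositories; it is higher than the strict 36.3%, consistent with the unfinished repositories being counted as misses.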
Detection by severity
| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 58 | 0 | 15 | 79.5 |
| High | 111 | 0 | 109 | 50.5 |
| Medium | 82 | 0 | 152 | 35.0 |
| Low | 3 | 0 | 58 | 4.9 |
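The Recall % column follows from the TP and FN columns as TP / (TP + FN). A minimal check, using the counts from the table above:

```python
# (TP, FN) per severity, copied from the table above.
severity = {
    "Critical": (58, 15),
    "High": (111, 109),
    "Medium": (82, 152),
    "Low": (3, 58),
}

# Recall = found / (found + missed), as a percentage rounded to one decimal.
recall_pct = {
    name: round(100 * tp / (tp + fn), 1)
    for name, (tp, fn) in severity.items()
}
print(recall_pct)
```

Each computed value matches the Recall % column to one decimal place.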
Detection by vulnerability class
| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| SQL Injection | 36 | 0 | 0 | 100.0 |
| XML External Entities | 8 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| Insecure Deserialization | 14 | 0 | 1 | 93.3 |
| Code Injection / RFI | 11 | 0 | 1 | 91.7 |
| Server-Side Request Forgery | 20 | 0 | 2 | 90.9 |
| Command / OS Injection | 14 | 0 | 2 | 87.5 |
| Denial of Service | 17 | 0 | 3 | 85.0 |
| Open Redirect | 4 | 0 | 1 | 80.0 |
| Broken Access Control / IDOR | 14 | 0 | 6 | 70.0 |
| Path Traversal | 16 | 0 | 7 | 69.6 |
| Hardcoded Credentials | 23 | 0 | 23 | 50.0 |
| HTTP Header Injection | 1 | 0 | 1 | 50.0 |
| Cross-Site Scripting | 24 | 0 | 46 | 34.3 |
| Missing Authentication / Authorization | 9 | 0 | 26 | 25.7 |
| Other | 30 | 0 | 147 | 16.9 |
| Security Misconfiguration | 4 | 0 | 25 | 13.8 |
| Sensitive Data Exposure | 5 | 0 | 43 | 10.4 |
LLM operational metrics
| Metric | Value |
|---|---|
| Avg input tokens | 30,099 |
| Avg output tokens | 5,274 |
| Avg total tokens | 168,656 |
| Avg latency / repo | 119s |
| JSON repair rate | 5.6% |
| Total runs | 72 |
| F2 run-to-run σ | ±10.7 |
Cost
| Metric | Value |
|---|---|
| Total cost | $1 |
| Cost / run | $0.02 |
| Cost / 100 LOC | $0.007 |
| Python LOC scanned | 14,785 |
| Successful runs | 50 |
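The derived cost figures can be reproduced from the totals. The page does not state the denominators, so this sketch makes two assumptions: the total is exactly $1.00 (it is displayed rounded), and cost per run divides by the 50 successful runs rather than all 72 runs.

```python
total_cost_usd = 1.00    # "Total cost", assumed exact (displayed as $1)
successful_runs = 50     # "Successful runs"
total_runs = 72          # "Total runs", shown for comparison only
python_loc = 14_785      # "Python LOC scanned"

cost_per_run = total_cost_usd / successful_runs
cost_per_100_loc = total_cost_usd / (python_loc / 100)

print(f"cost / run     = ${cost_per_run:.2f}")
print(f"cost / 100 LOC = ${cost_per_100_loc:.3f}")
print(f"cost / run over all runs = ${total_cost_usd / total_runs:.3f}")
```

Dividing by successful runs reproduces the reported $0.02, and the per-100-LOC figure rounds to the reported $0.007. Dividing by all 72 runs would give roughly $0.014 instead, which is why the successful-run denominator is the assumed one.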