Scanner deep-dive
DeepSeek V4 Pro by DeepSeek
General-Purpose LLM · agentic-v1 · scored on 26/26 repositories. Strict scoring (unfinished repos counted as misses).
| Metric | Value |
|---|---|
| F3 (strict) | 52.9 |
| F2 (strict) | 55.0 |
| Recall (strict) | 51.1% |
| Precision | 78.9% |
| Repos scored | 26/26 |
| Model | deepseek-v4-pro |
| Total cost | $10 |
| Avg latency | 399 s |

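
The F2 and F3 scores above can be approximately reproduced from the reported precision and recall. The sketch below assumes the page uses the standard F-beta definition, which it does not state explicitly; the function name `f_beta` is illustrative. Because the inputs are rounded to one decimal place, the results land within about 0.1 of the published figures rather than matching them exactly.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline precision and recall from this page, expressed as fractions.
precision, recall = 0.789, 0.511

# Assumption: F2 and F3 are F-beta with beta = 2 and beta = 3, scaled to 0-100.
f2 = 100 * f_beta(precision, recall, beta=2)
f3 = 100 * f_beta(precision, recall, beta=3)
print(f"F2 = {f2:.1f}, F3 = {f3:.1f}")
```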
Per-repository breakdown
Each row gives the true positives (TP), false positives (FP), and misses (FN) for one repository, along with its recall and F2. Rows are ranked by F2, highest first.
| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| python-app | 16 | 5 | 4 | 81.7 | 80.2 |
| dsvw | 20 | 3 | 7 | 75.3 | 77.6 |
| vfapi | 7 | 2 | 2 | 74.1 | 74.6 |
| insecure-web | 6 | 2 | 3 | 70.4 | 71.9 |
| vampi | 10 | 3 | 4 | 70.0 | 71.4 |
| dsvpwa | 22 | 6 | 10 | 68.8 | 70.6 |
| vulnerable-api | 9 | 1 | 5 | 66.7 | 70.0 |
| dvblab | 14 | 2 | 8 | 63.6 | 67.3 |
| pythonssti | 1 | 1 | 1 | 66.7 | 66.7 |
| vulnpy | 48 | 8 | 30 | 61.5 | 65.3 |
| intentionally-vulnerable-python-application | 4 | 2 | 3 | 61.9 | 61.9 |
| damn-vulnerable-flask-application | 9 | 3 | 6 | 57.8 | 60.6 |
| owasp-web-playground | 15 | 4 | 13 | 52.4 | 56.0 |
| vulnerable-flask-app | 11 | 5 | 10 | 52.4 | 55.1 |
| python-insecure-app | 4 | 1 | 4 | 50.0 | 53.8 |
| extremely-vulnerable-flask-app | 15 | 1 | 17 | 45.8 | 50.9 |
| vulnerable-tornado-app | 6 | 2 | 8 | 46.4 | 50.0 |
| lets-be-bad-guys | 11 | 4 | 13 | 45.8 | 49.7 |
| threatbyte | 12 | 1 | 14 | 44.2 | 49.3 |
| pygoat | 34 | 13 | 42 | 44.8 | 48.5 |
| dvpwa | 9 | 4 | 13 | 42.4 | 45.8 |
| vulpy | 23 | 6 | 34 | 40.4 | 44.7 |
| flask-xss | 12 | 1 | 18 | 38.9 | 44.0 |
| vulnerable-python-apps | 8 | 2 | 14 | 37.9 | 42.3 |
| damn-vulnerable-graphql-application | 14 | 4 | 22 | 38.0 | 42.2 |
| djangoat | 15 | 9 | 35 | 30.0 | 33.5 |
Detection by severity
| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 73 | 1 | 13 | 84.9 |
| High | 144 | 1 | 120 | 54.5 |
| Medium | 117 | 1 | 162 | 41.9 |
| Low | 24 | 0 | 44 | 35.3 |
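
The recall column in the severity table follows directly from the TP and FN counts as TP / (TP + FN). A minimal sketch of that arithmetic, with the counts copied from the table; the names `severity_counts` and `recall_pct` are illustrative:

```python
# (TP, FN) per severity, copied from the severity table.
severity_counts = {
    "Critical": (73, 13),
    "High": (144, 120),
    "Medium": (117, 162),
    "Low": (24, 44),
}


def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage: share of labeled vulnerabilities that were found."""
    total = tp + fn
    return 100.0 * tp / total if total else 0.0


recalls = {
    severity: round(recall_pct(tp, fn), 1)
    for severity, (tp, fn) in severity_counts.items()
}
for severity, value in recalls.items():
    print(f"{severity:<8} {value:.1f}%")
```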
Detection by vulnerability class
| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| HTTP Header Injection | 2 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| SQL Injection | 43 | 0 | 4 | 91.5 |
| Insecure Deserialization | 17 | 0 | 2 | 89.5 |
| Command / OS Injection | 15 | 0 | 2 | 88.2 |
| XML External Entities | 7 | 1 | 1 | 87.5 |
| Open Redirect | 5 | 0 | 1 | 83.3 |
| Broken Access Control / IDOR | 19 | 0 | 5 | 79.2 |
| Code Injection / RFI | 11 | 0 | 3 | 78.6 |
| Path Traversal | 19 | 0 | 7 | 73.1 |
| Hardcoded Credentials | 41 | 1 | 20 | 67.2 |
| Server-Side Request Forgery | 12 | 0 | 12 | 50.0 |
| Cross-Site Scripting | 39 | 0 | 43 | 47.6 |
| Sensitive Data Exposure | 21 | 0 | 36 | 36.8 |
| Security Misconfiguration | 12 | 0 | 21 | 36.4 |
| Other | 74 | 1 | 132 | 35.9 |
| Missing Authentication / Authorization | 15 | 0 | 32 | 31.9 |
| Denial of Service | 2 | 0 | 18 | 10.0 |
LLM operational metrics
| Metric | Value |
|---|---|
| Avg input tokens | 21,690 |
| Avg output tokens | 11,883 |
| Avg total tokens | 185,498 |
| Avg latency / repo | 399 s |
| JSON repair rate | 0.0% |
| Total runs | 78 |
| F2 run-to-run σ | ±12.7 |

Cost
| Metric | Value |
|---|---|
| Total cost | $10 |
| Cost / run | $0.14 |
| Cost / 100 LOC | $0.048 |
| Python LOC scanned | 20,062 |
| Successful runs | 71 |

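
The per-run and per-100-LOC costs can be roughly recovered from the other figures in this section. The sketch below rests on two assumptions the page does not confirm: that cost per run divides by the 71 successful runs rather than all 78, and that the $10 total is a rounded value, so the results only approximate the published numbers.

```python
# Figures copied from this page; the total cost appears to be rounded.
total_cost_usd = 10.0
successful_runs = 71
total_runs = 78
python_loc = 20_062

# Assumption: the per-run figure uses successful runs as the denominator.
cost_per_run = total_cost_usd / successful_runs

# Cost normalized to 100 lines of scanned Python.
cost_per_100_loc = total_cost_usd / python_loc * 100

# Share of runs that completed successfully.
success_rate = successful_runs / total_runs

print(f"cost / run:     ${cost_per_run:.2f}")
print(f"cost / 100 LOC: ${cost_per_100_loc:.3f}")
print(f"success rate:   {success_rate:.1%}")
```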