Scanner deep-dive
DeepSeek V4 Flash by DeepSeek
General-Purpose LLM · agentic-v1 · scored on 26/26 repositories. Strict scoring (unfinished repos counted as misses).
| Metric | Value |
|---|---|
| F3 (strict) | 56.5 |
| F2 (strict) | 58.1 |
| Recall (strict) | 54.9% |
| Precision | 75.2% |
| Repos scored | 26/26 |
| Model | deepseek-v4-flash |
| Total cost | $1 |
| Avg latency | 150s |
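
F2 and F3 both weight recall more heavily than precision, which is why they sit closer to the 54.9% recall than to the 75.2% precision. As a rough check, the sketch below applies the standard F-beta formula to the headline precision and strict recall. The function name `f_beta` is illustrative, and the page does not confirm that its F-scores are derived this way rather than averaged over repositories or runs.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta > 1 weights recall above precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline figures from the summary above, expressed as fractions.
PRECISION = 0.752
RECALL_STRICT = 0.549

f2 = 100 * f_beta(PRECISION, RECALL_STRICT, beta=2)
f3 = 100 * f_beta(PRECISION, RECALL_STRICT, beta=3)
print(f"F2 = {f2:.1f}, F3 = {f3:.1f}")
```

With these rounded inputs the results land within about 0.1 of the reported 58.1 and 56.5. The small gap is consistent with the dashboard working from unrounded values, though that is an assumption.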
§
Per-repository breakdown
Each row shows true positives (TP), false positives (FP), and misses (FN) on one repository. Rows are ranked by F2.
| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| pythonssti | 2 | 0 | 0 | 100.0 | 100.0 |
| intentionally-vulnerable-python-application | 6 | 1 | 1 | 81.0 | 81.7 |
| vfapi | 8 | 4 | 1 | 85.2 | 80.5 |
| vulnpy | 61 | 17 | 17 | 78.2 | 78.0 |
| vulnerable-api | 10 | 2 | 4 | 73.8 | 75.2 |
| dsvw | 19 | 2 | 8 | 71.6 | 74.6 |
| dsvpwa | 24 | 9 | 8 | 74.0 | 73.8 |
| lets-be-bad-guys | 17 | 2 | 7 | 69.5 | 72.7 |
| insecure-web | 7 | 4 | 2 | 74.1 | 71.9 |
| dvblab | 16 | 6 | 6 | 71.2 | 71.4 |
| vampi | 11 | 7 | 4 | 73.3 | 70.5 |
| python-insecure-app | 5 | 1 | 3 | 66.7 | 69.3 |
| python-app | 14 | 5 | 6 | 68.3 | 68.9 |
| vulnerable-flask-app | 14 | 4 | 7 | 65.1 | 67.0 |
| vulnerable-tornado-app | 9 | 4 | 5 | 64.3 | 65.5 |
| damn-vulnerable-flask-application | 9 | 5 | 6 | 60.0 | 60.7 |
| owasp-web-playground | 16 | 8 | 12 | 56.0 | 57.6 |
| threatbyte | 13 | 2 | 13 | 50.0 | 54.8 |
| dvpwa | 10 | 3 | 12 | 45.5 | 49.2 |
| extremely-vulnerable-flask-app | 14 | 2 | 18 | 42.7 | 47.4 |
| vulnerable-python-apps | 10 | 8 | 12 | 45.5 | 47.1 |
| pygoat | 32 | 11 | 45 | 42.0 | 46.0 |
| damn-vulnerable-graphql-application | 14 | 6 | 22 | 40.3 | 44.0 |
| vulpy | 20 | 3 | 37 | 35.7 | 40.4 |
| flask-xss | 9 | 2 | 21 | 31.1 | 35.6 |
| djangoat | 13 | 8 | 37 | 25.3 | 28.6 |
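
To make the ranking concrete, the sketch below recomputes recall and F2 directly from the TP, FP, and FN counts of three rows and sorts them by F2. The helper names are illustrative. It uses the count form of F2, 5·TP / (5·TP + 4·FN + FP), which is algebraically the same as the precision/recall form. For some rows the pooled counts give slightly different figures than the table (for example djangoat), which would be consistent with the table averaging per-run scores; that explanation is an assumption, not something the page states.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of labeled vulnerabilities that were found."""
    total = tp + fn
    return tp / total if total else 0.0


def f2_from_counts(tp: int, fp: int, fn: int) -> float:
    """F2 in count form: 5*TP / (5*TP + 4*FN + FP)."""
    denom = 5 * tp + 4 * fn + fp
    return 5 * tp / denom if denom else 0.0


# (TP, FP, FN) copied from three rows of the table above.
rows = {
    "vulnpy": (61, 17, 17),
    "djangoat": (13, 8, 37),
    "pythonssti": (2, 0, 0),
}

# Rank repositories by F2, best first, as the table does.
ranked = sorted(rows, key=lambda name: f2_from_counts(*rows[name]), reverse=True)
for name in ranked:
    tp, fp, fn = rows[name]
    print(f"{name}: recall {100 * recall(tp, fn):.1f}%, "
          f"F2 {100 * f2_from_counts(tp, fp, fn):.1f}")
```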
§
Detection by severity
| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 77 | 0 | 9 | 89.5 |
| High | 147 | 1 | 117 | 55.7 |
| Medium | 128 | 0 | 151 | 45.9 |
| Low | 28 | 0 | 40 | 41.2 |
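
The Recall % column here follows directly from the counts as TP / (TP + FN). The short sketch below recomputes it for each severity and also pools all four rows. Variable names are illustrative. The pooled figure comes out near 54.5%, slightly below the 54.9% strict recall in the summary, so the headline number is presumably aggregated differently.

```python
# (TP, FN) per severity, copied from the table above.
severity_counts = {
    "Critical": (77, 9),
    "High": (147, 117),
    "Medium": (128, 151),
    "Low": (28, 40),
}


def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage: found / (found + missed)."""
    return 100 * tp / (tp + fn)


for severity, (tp, fn) in severity_counts.items():
    print(f"{severity}: {recall_pct(tp, fn):.1f}%")

# Pool all severities into a single recall figure.
total_tp = sum(tp for tp, _ in severity_counts.values())
total_fn = sum(fn for _, fn in severity_counts.values())
pooled = recall_pct(total_tp, total_fn)
print(f"Pooled: {pooled:.1f}% ({total_tp} found, {total_fn} missed)")
```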
§
Detection by vulnerability class
| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| XML External Entities | 8 | 0 | 0 | 100.0 |
| Insecure Deserialization | 19 | 0 | 0 | 100.0 |
| HTTP Header Injection | 2 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| SQL Injection | 43 | 0 | 4 | 91.5 |
| Path Traversal | 23 | 0 | 3 | 88.5 |
| Command / OS Injection | 15 | 0 | 2 | 88.2 |
| Code Injection / RFI | 12 | 0 | 2 | 85.7 |
| Open Redirect | 5 | 0 | 1 | 83.3 |
| Broken Access Control / IDOR | 19 | 0 | 5 | 79.2 |
| Cross-Site Scripting | 56 | 0 | 26 | 68.3 |
| Server-Side Request Forgery | 15 | 1 | 9 | 62.5 |
| Hardcoded Credentials | 34 | 0 | 27 | 55.7 |
| Security Misconfiguration | 15 | 0 | 18 | 45.5 |
| Sensitive Data Exposure | 22 | 0 | 35 | 38.6 |
| Other | 71 | 0 | 135 | 34.5 |
| Missing Authentication / Authorization | 15 | 0 | 32 | 31.9 |
| Denial of Service | 2 | 0 | 18 | 10.0 |
§
LLM operational metrics
| Metric | Value |
|---|---|
| Avg input tokens | 22,794 |
| Avg output tokens | 11,105 |
| Avg total tokens | 197,697 |
| Avg latency / repo | 150s |
| JSON repair rate | 1.3% |
| Total runs | 78 |
| F2 run-to-run σ | ±16.6 |
§
Cost
| Metric | Value |
|---|---|
| Total cost | $1 |
| Cost / run | $0.01 |
| Cost / 100 LOC | $0.005 |
| Python LOC scanned | 20,062 |
| Successful runs | 74 |
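
The unit costs above can be roughly reproduced from the totals. The sketch below assumes the displayed $1 is a rounded total, that cost per run divides by all 78 runs reported under the operational metrics, and that cost per 100 LOC divides by the 20,062 Python lines scanned. The page does not state which denominators it uses, so these are assumptions.

```python
# Figures taken from the page; the total cost is shown rounded to a whole
# dollar, so every derived value is approximate.
TOTAL_COST_USD = 1.00
TOTAL_RUNS = 78
SUCCESSFUL_RUNS = 74
PYTHON_LOC = 20_062

cost_per_run = TOTAL_COST_USD / TOTAL_RUNS
cost_per_100_loc = TOTAL_COST_USD / (PYTHON_LOC / 100)
success_rate = SUCCESSFUL_RUNS / TOTAL_RUNS

print(f"Cost / run:      ${cost_per_run:.2f}")
print(f"Cost / 100 LOC:  ${cost_per_100_loc:.3f}")
print(f"Successful runs: {100 * success_rate:.1f}%")
```

Under those assumptions the derived values round to the reported $0.01 per run and $0.005 per 100 LOC, and 74 of 78 runs is a success rate of about 94.9%.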