Scanner deep-dive
Claude Haiku 4.5 by Anthropic
General-Purpose LLM · agentic-v1 · scored on 24 of 26 repositories. Scoring is strict: the 2 unfinished repositories are counted as misses.
| Metric | Value |
|---|---|
| F3 (strict) | 36.4 |
| F2 (strict) | 38.6 |
| Recall (strict) | 34.4% |
| Precision | 75.2% |
| Repos scored | 24/26 |
| Model | claude-haiku-4-5-20251001 |
| Total cost | $5 |
| Avg latency | 56s |
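
The headline F2 and F3 figures appear to be consistent with the standard F-beta formula applied to the reported precision (75.2%) and strict recall (34.4%). The sketch below checks that arithmetic; the helper name `f_beta` is illustrative and not taken from the benchmark's own code, and it is an assumption that the page computes the scores this way.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values as displayed on the page (already rounded to one decimal place).
precision, recall = 0.752, 0.344

f2 = round(100 * f_beta(precision, recall, beta=2), 1)
f3 = round(100 * f_beta(precision, recall, beta=3), 1)
print(f"F2 = {f2}, F3 = {f3}")
```

Because beta weights recall more heavily as it grows, F3 falls below F2 here: recall is the weaker of the two inputs.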
§
Per-repository breakdown
Each row shows true positives (TP), false positives (FP), and misses (FN) on one repository, ranked by F2. In the original bar chart, bar length is proportional to that repo's labeled vulnerabilities.
| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| insecure-web | 6 | 0 | 3 | 66.7 | 70.9 |
| intentionally-vulnerable-python-application | 4 | 1 | 3 | 61.9 | 64.2 |
| vfapi | 5 | 2 | 4 | 55.6 | 57.7 |
| lets-be-bad-guys | 13 | 2 | 11 | 52.8 | 57.2 |
| damn-vulnerable-flask-application | 8 | 4 | 7 | 53.3 | 55.3 |
| dvblab | 12 | 6 | 10 | 53.0 | 55.3 |
| dsvw | 13 | 0 | 14 | 49.4 | 54.8 |
| vulnerable-tornado-app | 7 | 2 | 7 | 50.0 | 53.6 |
| pythonssti | 1 | 1 | 1 | 50.0 | 51.9 |
| python-app | 10 | 4 | 10 | 48.3 | 51.7 |
| vampi | 7 | 4 | 8 | 48.9 | 51.3 |
| python-insecure-app | 4 | 1 | 4 | 45.8 | 50.6 |
| vulnpy | 37 | 7 | 41 | 47.0 | 50.1 |
| vulnerable-api | 6 | 1 | 8 | 45.2 | 49.9 |
| vulnerable-flask-app | 10 | 3 | 11 | 46.0 | 49.8 |
| dsvpwa | 12 | 1 | 20 | 37.5 | 42.6 |
| dvpwa | 7 | 1 | 15 | 31.8 | 36.3 |
| threatbyte | 8 | 5 | 18 | 29.5 | 33.0 |
| pygoat | 22 | 10 | 55 | 29.0 | 32.8 |
| extremely-vulnerable-flask-app | 9 | 3 | 23 | 28.1 | 32.2 |
| flask-xss | 8 | 2 | 22 | 26.7 | 30.7 |
| damn-vulnerable-graphql-application | 9 | 11 | 27 | 25.9 | 28.2 |
| djangoat | 12 | 6 | 38 | 24.7 | 28.2 |
| vulpy | 10 | 2 | 47 | 17.5 | 20.8 |
§
Detection by severity
| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 62 | 0 | 20 | 75.6 |
| High | 104 | 1 | 139 | 42.8 |
| Medium | 68 | 3 | 192 | 26.2 |
| Low | 15 | 0 | 47 | 24.2 |
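
The recall column in the severity table can be reproduced from the TP and FN counts as TP / (TP + FN). The following is a minimal sketch of that check; the dictionary layout and the `recall_pct` helper are assumptions made for illustration.

```python
# (TP, FN) per severity, copied from the table above.
severity_counts = {
    "Critical": (62, 20),
    "High": (104, 139),
    "Medium": (68, 192),
    "Low": (15, 47),
}


def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage, rounded to one decimal place."""
    labeled = tp + fn
    return round(100 * tp / labeled, 1) if labeled else 0.0


recalls = {sev: recall_pct(tp, fn) for sev, (tp, fn) in severity_counts.items()}
for sev, pct in recalls.items():
    print(f"{sev}: {pct}%")
```

The per-repository table does not always reproduce this way (for example, 4 TP and 3 FN would give 57.1%, not the listed 61.9%), presumably because those figures are averaged across runs; that reading is an assumption, not something the page states.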
§
Detection by vulnerability class
| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| XML External Entities | 8 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| SQL Injection | 38 | 0 | 1 | 97.4 |
| Insecure Deserialization | 15 | 0 | 1 | 93.8 |
| Open Redirect | 5 | 0 | 1 | 83.3 |
| Path Traversal | 20 | 1 | 5 | 80.0 |
| Code Injection / RFI | 11 | 0 | 3 | 78.6 |
| Command / OS Injection | 13 | 0 | 4 | 76.5 |
| Server-Side Request Forgery | 12 | 0 | 10 | 54.5 |
| Broken Access Control / IDOR | 11 | 0 | 11 | 50.0 |
| HTTP Header Injection | 1 | 0 | 1 | 50.0 |
| Hardcoded Credentials | 20 | 1 | 33 | 37.7 |
| Cross-Site Scripting | 26 | 2 | 53 | 32.9 |
| Security Misconfiguration | 8 | 0 | 24 | 25.0 |
| Other | 42 | 0 | 151 | 21.8 |
| Denial of Service | 4 | 0 | 16 | 20.0 |
| Missing Authentication / Authorization | 7 | 0 | 36 | 16.3 |
| Sensitive Data Exposure | 4 | 0 | 48 | 7.7 |
§
LLM operational metrics
| Metric | Value |
|---|---|
| Avg input tokens | 36 |
| Avg output tokens | 4,888 |
| Avg total tokens | 243,089 |
| Avg latency / repo | 56s |
| JSON repair rate | 0.0% |
| Total runs | 72 |
| F2 run-to-run σ | ±12.9 |
§
Cost
| Metric | Value |
|---|---|
| Total cost | $5 |
| Cost / run | $0.07 |
| Cost / 100 LOC | $0.026 |
| Python LOC scanned | 20,062 |
| Successful runs | 72 |
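
The unit costs follow from dividing the total by the number of runs and by the lines scanned. The sketch below redoes that arithmetic from the displayed values; the variable names are illustrative, and the total is the rounded $5 shown on the page, so the results are approximate.

```python
total_cost_usd = 5.0   # displayed total, already rounded to whole dollars
successful_runs = 72
python_loc = 20_062

cost_per_run = total_cost_usd / successful_runs
cost_per_100_loc = total_cost_usd / python_loc * 100

print(f"Cost / run:     ${cost_per_run:.2f}")
print(f"Cost / 100 LOC: ${cost_per_100_loc:.3f}")
```

Cost per run matches the reported $0.07. Cost per 100 LOC comes out slightly under the reported $0.026, which would be expected if the unrounded total is a little above $5; that explanation is an assumption.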