Scanner deep-dive
Claude Haiku 4.5 by Anthropic
General-Purpose LLM · direct-v1 · scored on 23 of 26 repositories. Strict scoring: labeled vulnerabilities in unfinished repositories are counted as misses.
| Metric | Value |
|---|---|
| F3 (strict) | 25.4 |
| F2 (strict) | 26.8 |
| Recall (strict) | 24.1% |
| Precision | 48.7% |
| Repos scored | 23/26 |
| Model | claude-haiku-4-5-20251001 |
| Total cost | $5 |
| Avg latency | 19s |
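F2 and F3 are F-beta scores, which weight recall β times as much as precision. The page does not state its formula; assuming the standard definition F_β = (1 + β²)·P·R / (β²·P + R), the headline precision and strict recall reproduce both reported scores (shown on a 0 to 100 scale). A minimal sketch:

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily than precision.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline figures from the summary table, as fractions.
precision = 0.487
recall_strict = 0.241

f2 = f_beta(precision, recall_strict, beta=2)
f3 = f_beta(precision, recall_strict, beta=3)
print(f"F2 = {f2 * 100:.1f}, F3 = {f3 * 100:.1f}")  # F2 = 26.8, F3 = 25.4
```

For comparison, F1 on the same inputs would be about 32.2. Because precision (48.7%) exceeds recall (24.1%), the score drops as β grows and recall is weighted more heavily.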
Per-repository breakdown
Each row shows true positives (TP), false positives (FP), and false negatives (FN, i.e. missed vulnerabilities) on one repository. Rows are ranked by F2.
| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| intentionally-vulnerable-python-application | 4 | 1 | 3 | 61.9 | 65.7 |
| insecure-web | 6 | 3 | 3 | 63.0 | 63.4 |
| vulnerable-api | 8 | 2 | 6 | 59.5 | 62.8 |
| python-insecure-app | 4 | 0 | 4 | 54.2 | 59.0 |
| pythonssti | 1 | 1 | 1 | 50.0 | 51.9 |
| vulnpy | 35 | 7 | 43 | 44.4 | 49.0 |
| damn-vulnerable-flask-application | 7 | 5 | 8 | 46.7 | 48.2 |
| flask-xss | 12 | 4 | 18 | 41.1 | 45.1 |
| vampi | 6 | 7 | 9 | 42.2 | 43.2 |
| dvblab | 8 | 10 | 14 | 36.4 | 37.6 |
| lets-be-bad-guys | 7 | 10 | 17 | 30.6 | 32.5 |
| vulnerable-flask-app | 6 | 10 | 15 | 30.2 | 31.6 |
| vulnerable-tornado-app | 4 | 4 | 10 | 28.6 | 31.1 |
| dsvpwa | 8 | 8 | 24 | 26.0 | 29.1 |
| vulpy | 14 | 10 | 43 | 24.0 | 27.1 |
| dsvw | 7 | 10 | 20 | 24.7 | 26.8 |
| extremely-vulnerable-flask-app | 7 | 5 | 25 | 22.9 | 26.1 |
| threatbyte | 6 | 13 | 20 | 21.8 | 23.1 |
| dvpwa | 4 | 8 | 18 | 19.7 | 21.7 |
| vfapi | 2 | 15 | 7 | 18.5 | 16.1 |
| damn-vulnerable-graphql-application | 3 | 15 | 33 | 8.3 | 9.2 |
| djangoat | 4 | 14 | 46 | 8.0 | 9.1 |
| pygoat | 5 | 15 | 72 | 6.1 | 7.1 |
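One way to aggregate this table is to pool the counts (micro-averaging). Summing the 23 rows gives 168 TP, 177 FP and 459 FN, which reproduces the headline precision of 48.7%. The pooled recall over the scored repositories is 26.8%, above the strict 24.1%, which is consistent with strict scoring also counting vulnerabilities in the unfinished repositories as misses. The per-row Recall % and F2 columns do not equal TP/(TP+FN) for their row (for example 4/(4+3) = 57.1% against the listed 61.9%), so they appear to be computed another way, possibly averaged across runs; the sketch below does not try to reproduce them.

```python
# (repository, TP, FP, FN) copied from the per-repository table above.
ROWS = [
    ("intentionally-vulnerable-python-application", 4, 1, 3),
    ("insecure-web", 6, 3, 3),
    ("vulnerable-api", 8, 2, 6),
    ("python-insecure-app", 4, 0, 4),
    ("pythonssti", 1, 1, 1),
    ("vulnpy", 35, 7, 43),
    ("damn-vulnerable-flask-application", 7, 5, 8),
    ("flask-xss", 12, 4, 18),
    ("vampi", 6, 7, 9),
    ("dvblab", 8, 10, 14),
    ("lets-be-bad-guys", 7, 10, 17),
    ("vulnerable-flask-app", 6, 10, 15),
    ("vulnerable-tornado-app", 4, 4, 10),
    ("dsvpwa", 8, 8, 24),
    ("vulpy", 14, 10, 43),
    ("dsvw", 7, 10, 20),
    ("extremely-vulnerable-flask-app", 7, 5, 25),
    ("threatbyte", 6, 13, 20),
    ("dvpwa", 4, 8, 18),
    ("vfapi", 2, 15, 7),
    ("damn-vulnerable-graphql-application", 3, 15, 33),
    ("djangoat", 4, 14, 46),
    ("pygoat", 5, 15, 72),
]


def pooled_counts(rows):
    """Sum TP, FP and FN over all repositories (micro-averaging)."""
    tp = sum(r[1] for r in rows)
    fp = sum(r[2] for r in rows)
    fn = sum(r[3] for r in rows)
    return tp, fp, fn


tp, fp, fn = pooled_counts(ROWS)
precision = tp / (tp + fp)  # share of reported findings that were real
recall = tp / (tp + fn)     # share of labeled vulnerabilities that were found
print(f"TP={tp} FP={fp} FN={fn}")                        # TP=168 FP=177 FN=459
print(f"precision={precision:.1%} recall={recall:.1%}")  # precision=48.7% recall=26.8%
```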
Detection by severity
| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 28 | 1 | 50 | 35.9 |
| High | 63 | 2 | 172 | 26.8 |
| Medium | 56 | 0 | 197 | 22.1 |
| Low | 6 | 0 | 55 | 9.8 |
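Recall in this table is TP / (TP + FN), so TP + FN is the number of labeled vulnerabilities at each severity. Recomputing it from the counts confirms the Recall % column and makes the trend explicit: recall falls at every step from Critical (78 labeled) to Low (61 labeled). A short sketch:

```python
# (severity, TP, FP, FN) copied from the severity table above.
SEVERITY = [
    ("Critical", 28, 1, 50),
    ("High", 63, 2, 172),
    ("Medium", 56, 0, 197),
    ("Low", 6, 0, 55),
]


def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage: found vulnerabilities over all labeled ones."""
    return 100 * tp / (tp + fn)


for name, tp, fp, fn in SEVERITY:
    labeled = tp + fn  # labeled vulnerabilities at this severity
    print(f"{name:<8} labeled={labeled:>3} recall={recall_pct(tp, fn):.1f}%")
```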
Detection by vulnerability class
| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| HTTP Header Injection | 2 | 0 | 0 | 100.0 |
| XPath Injection | 3 | 0 | 1 | 75.0 |
| SQL Injection | 20 | 2 | 18 | 52.6 |
| Path Traversal | 12 | 0 | 11 | 52.2 |
| XML External Entities | 3 | 0 | 4 | 42.9 |
| Cross-Site Scripting | 31 | 0 | 46 | 40.3 |
| Insecure Deserialization | 5 | 0 | 9 | 35.7 |
| Hardcoded Credentials | 18 | 0 | 33 | 35.3 |
| Command / OS Injection | 5 | 0 | 11 | 31.2 |
| Code Injection / RFI | 4 | 0 | 10 | 28.6 |
| Broken Access Control / IDOR | 5 | 0 | 17 | 22.7 |
| Other | 32 | 1 | 155 | 17.1 |
| Open Redirect | 1 | 0 | 5 | 16.7 |
| Security Misconfiguration | 5 | 0 | 27 | 15.6 |
| Server-Side Request Forgery | 2 | 0 | 20 | 9.1 |
| Missing Authentication / Authorization | 3 | 0 | 39 | 7.1 |
| Sensitive Data Exposure | 2 | 0 | 48 | 4.0 |
| Denial of Service | 0 | 0 | 20 | 0.0 |
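The Recall % column shows how well each family is detected, but not where most misses come from. Ranking families by their share of the 474 false negatives (a simple aggregation of the FN column, not a metric the page itself reports) shows that the catch-all Other bucket accounts for about a third of all misses, and the top three families together for just over half:

```python
# (CWE family, TP, FP, FN) copied from the vulnerability-class table above.
CLASSES = [
    ("HTTP Header Injection", 2, 0, 0),
    ("XPath Injection", 3, 0, 1),
    ("SQL Injection", 20, 2, 18),
    ("Path Traversal", 12, 0, 11),
    ("XML External Entities", 3, 0, 4),
    ("Cross-Site Scripting", 31, 0, 46),
    ("Insecure Deserialization", 5, 0, 9),
    ("Hardcoded Credentials", 18, 0, 33),
    ("Command / OS Injection", 5, 0, 11),
    ("Code Injection / RFI", 4, 0, 10),
    ("Broken Access Control / IDOR", 5, 0, 17),
    ("Other", 32, 1, 155),
    ("Open Redirect", 1, 0, 5),
    ("Security Misconfiguration", 5, 0, 27),
    ("Server-Side Request Forgery", 2, 0, 20),
    ("Missing Authentication / Authorization", 3, 0, 39),
    ("Sensitive Data Exposure", 2, 0, 48),
    ("Denial of Service", 0, 0, 20),
]


def miss_shares(rows):
    """Return (family, FN, share of all FN), largest FN first."""
    total_fn = sum(fn for _, _, _, fn in rows)
    ranked = sorted(rows, key=lambda r: r[3], reverse=True)
    return [(name, fn, fn / total_fn) for name, _, _, fn in ranked]


for name, fn, share in miss_shares(CLASSES)[:5]:
    print(f"{name:<40} FN={fn:>3}  {share:.1%}")
```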
LLM operational metrics
| Metric | Value |
|---|---|
| Avg input tokens | 54,965 |
| Avg output tokens | 3,312 |
| Avg total tokens | 58,278 |
| Avg latency / repo | 19s |
| JSON repair rate | 0.0% |
| Total runs | 69 |
| F2 run-to-run σ | ±17.8 |
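As a rough cross-check of the reported cost, the average token counts can be multiplied by per-token prices. The prices here are an assumption, not taken from this page: list prices of $1 per million input tokens and $5 per million output tokens for Claude Haiku 4.5, ignoring any prompt-caching or batch discounts. Under that assumption the estimate lands close to the $0.07 per run and $5 total reported in the Cost section:

```python
# ASSUMPTION: list prices in USD per million tokens; not stated on this page.
INPUT_PRICE_PER_MTOK = 1.00
OUTPUT_PRICE_PER_MTOK = 5.00


def estimated_cost_per_run(avg_input_tokens: int, avg_output_tokens: int) -> float:
    """Estimate USD cost of one run from average token counts."""
    return (
        avg_input_tokens * INPUT_PRICE_PER_MTOK
        + avg_output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


per_run = estimated_cost_per_run(54_965, 3_312)
total = per_run * 69  # 69 runs, from the table above
print(f"~${per_run:.3f} per run, ~${total:.2f} for 69 runs")  # ~$0.072 per run, ~$4.94 for 69 runs
```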
Cost
| Metric | Value |
|---|---|
| Total cost | $5 |
| Cost / run | $0.07 |
| Cost / 100 LOC | $0.025 |
| Python LOC scanned | 19,723 |
| Successful runs | 69 |
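The two derived figures appear to follow directly from the totals: cost per run is total cost divided by successful runs, and cost per 100 LOC is total cost divided by Python LOC scanned, times 100. The sketch assumes a total of exactly $5.00; since the page displays the total rounded, the results only approximate the underlying values:

```python
def cost_ratios(total_cost_usd: float, runs: int, loc: int) -> tuple[float, float]:
    """Return (cost per run, cost per 100 lines of code) in USD."""
    per_run = total_cost_usd / runs
    per_100_loc = total_cost_usd / loc * 100
    return per_run, per_100_loc


# ASSUMPTION: the displayed "$5" is treated as exactly 5.00 USD.
per_run, per_100_loc = cost_ratios(5.0, 69, 19_723)
print(f"${per_run:.2f} / run, ${per_100_loc:.3f} / 100 LOC")  # $0.07 / run, $0.025 / 100 LOC
```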