realvuln v1.0
Scanner deep-dive

Claude Haiku 4.5 by Anthropic

General-purpose LLM · agentic-v1 · scored on 24 of 26 repositories. Scoring is strict: the 2 unfinished repositories are counted as misses.
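As a rough illustration of what strict scoring does to recall, the sketch below adds the labeled vulnerabilities of unscored repositories to the miss count. The TP and FN totals are the sums of the severity table on this page. The number of labeled vulnerabilities in the two unfinished repositories is not given here, so `UNSCORED_LABELED` is a hypothetical placeholder, and the function itself is an assumption about how "counted as misses" is applied, not the benchmark's actual code.

```python
def recall(tp: int, fn: int, unscored_labeled: int = 0) -> float:
    """Recall, with labeled vulnerabilities in unscored repos treated as misses."""
    total = tp + fn + unscored_labeled
    return tp / total if total else 0.0


# Sums of the TP and FN columns of the severity table (scored repos only).
TP = 62 + 104 + 68 + 15    # 249
FN = 20 + 139 + 192 + 47   # 398

# HYPOTHETICAL placeholder: the page does not state how many labeled
# vulnerabilities the two unfinished repositories contain.
UNSCORED_LABELED = 50

scored_only = recall(TP, FN)
strict = recall(TP, FN, UNSCORED_LABELED)

print(f"scored repos only: {scored_only:.1%}")
print(f"strict:            {strict:.1%}")
```

Whatever the real count is, strict recall can only be lower than or equal to recall over the scored repositories, which is why the headline figure sits below the per-table totals.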

F3 (strict): 36.4
F2 (strict): 38.6
Recall (strict): 34.4%
Precision: 75.2%
Repos scored: 24/26
Model: claude-haiku-4-5-20251001
Total cost: $5
Avg latency: 56s
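The headline F2 and F3 values are consistent with the standard F-beta formula applied to the precision and strict recall shown above. The page does not state the formula it uses, so treat the sketch below as an assumption that happens to reproduce the published numbers; `f_beta` is an illustrative helper name.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score: recall is weighted beta times as much as precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Headline values from this page, as fractions.
precision = 0.752
strict_recall = 0.344

f2 = f_beta(precision, strict_recall, beta=2)
f3 = f_beta(precision, strict_recall, beta=3)

print(f"F2 = {100 * f2:.1f}")  # 38.6
print(f"F3 = {100 * f3:.1f}")  # 36.4
```

Because beta is greater than 1, both scores sit much closer to the 34.4% recall than to the 75.2% precision, and F3 more so than F2.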
§

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Bar chart: true positives, false positives, and misses (FN) per repository, ranked by F2. The same values appear in the table below.]
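| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| insecure-web | 6 | 0 | 3 | 66.7 | 70.9 |
| intentionally-vulnerable-python-application | 4 | 1 | 3 | 61.9 | 64.2 |
| vfapi | 5 | 2 | 4 | 55.6 | 57.7 |
| lets-be-bad-guys | 13 | 2 | 11 | 52.8 | 57.2 |
| damn-vulnerable-flask-application | 8 | 4 | 7 | 53.3 | 55.3 |
| dvblab | 12 | 6 | 10 | 53.0 | 55.3 |
| dsvw | 13 | 0 | 14 | 49.4 | 54.8 |
| vulnerable-tornado-app | 7 | 2 | 7 | 50.0 | 53.6 |
| pythonssti | 1 | 1 | 1 | 50.0 | 51.9 |
| python-app | 10 | 4 | 10 | 48.3 | 51.7 |
| vampi | 7 | 4 | 8 | 48.9 | 51.3 |
| python-insecure-app | 4 | 1 | 4 | 45.8 | 50.6 |
| vulnpy | 37 | 7 | 41 | 47.0 | 50.1 |
| vulnerable-api | 6 | 1 | 8 | 45.2 | 49.9 |
| vulnerable-flask-app | 10 | 3 | 11 | 46.0 | 49.8 |
| dsvpwa | 12 | 1 | 20 | 37.5 | 42.6 |
| dvpwa | 7 | 1 | 15 | 31.8 | 36.3 |
| threatbyte | 8 | 5 | 18 | 29.5 | 33.0 |
| pygoat | 22 | 10 | 55 | 29.0 | 32.8 |
| extremely-vulnerable-flask-app | 9 | 3 | 23 | 28.1 | 32.2 |
| flask-xss | 8 | 2 | 22 | 26.7 | 30.7 |
| damn-vulnerable-graphql-application | 9 | 11 | 27 | 25.9 | 28.2 |
| djangoat | 12 | 6 | 38 | 24.7 | 28.2 |
| vulpy | 10 | 2 | 47 | 17.5 | 20.8 |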
§

Detection by severity

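| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 62 | 0 | 20 | 75.6 |
| High | 104 | 1 | 139 | 42.8 |
| Medium | 68 | 3 | 192 | 26.2 |
| Low | 15 | 0 | 47 | 24.2 |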
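The Recall % column follows directly from the TP and FN counts in each row, as TP / (TP + FN). A minimal sketch that recomputes it from the severity table (the dictionary layout and the helper name `recall_pct` are illustrative choices):

```python
# (TP, FN) per severity, copied from the severity table on this page.
severity_counts = {
    "Critical": (62, 20),
    "High": (104, 139),
    "Medium": (68, 192),
    "Low": (15, 47),
}


def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage; 0.0 when there are no labeled findings."""
    labeled = tp + fn
    return 100.0 * tp / labeled if labeled else 0.0


severity_recall = {
    name: round(recall_pct(tp, fn), 1)
    for name, (tp, fn) in severity_counts.items()
}

for name, pct in severity_recall.items():
    print(f"{name:<8} {pct:5.1f}%")
```

False positives do not enter the calculation, so the FP column has no effect on these percentages.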
§

Detection by vulnerability class

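| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| XML External Entities | 8 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| SQL Injection | 38 | 0 | 1 | 97.4 |
| Insecure Deserialization | 15 | 0 | 1 | 93.8 |
| Open Redirect | 5 | 0 | 1 | 83.3 |
| Path Traversal | 20 | 1 | 5 | 80.0 |
| Code Injection / RFI | 11 | 0 | 3 | 78.6 |
| Command / OS Injection | 13 | 0 | 4 | 76.5 |
| Server-Side Request Forgery | 12 | 0 | 10 | 54.5 |
| Broken Access Control / IDOR | 11 | 0 | 11 | 50.0 |
| HTTP Header Injection | 1 | 0 | 1 | 50.0 |
| Hardcoded Credentials | 20 | 1 | 33 | 37.7 |
| Cross-Site Scripting | 26 | 2 | 53 | 32.9 |
| Security Misconfiguration | 8 | 0 | 24 | 25.0 |
| Other | 42 | 0 | 151 | 21.8 |
| Denial of Service | 4 | 0 | 16 | 20.0 |
| Missing Authentication / Authorization | 7 | 0 | 36 | 16.3 |
| Sensitive Data Exposure | 4 | 0 | 48 | 7.7 |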
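The per-class recalls span 7.7% to 100%, so how they are averaged matters. The sketch below compares a micro-average (pool all TP and FN, then divide) with a macro-average (mean of the per-class recalls), using the TP and FN counts from the table above. This comparison is an illustration added here, not a metric the dashboard reports.

```python
# (CWE family, TP, FN), copied from the vulnerability-class table.
cwe_rows = [
    ("XML External Entities", 8, 0),
    ("XPath Injection", 4, 0),
    ("SQL Injection", 38, 1),
    ("Insecure Deserialization", 15, 1),
    ("Open Redirect", 5, 1),
    ("Path Traversal", 20, 5),
    ("Code Injection / RFI", 11, 3),
    ("Command / OS Injection", 13, 4),
    ("Server-Side Request Forgery", 12, 10),
    ("Broken Access Control / IDOR", 11, 11),
    ("HTTP Header Injection", 1, 1),
    ("Hardcoded Credentials", 20, 33),
    ("Cross-Site Scripting", 26, 53),
    ("Security Misconfiguration", 8, 24),
    ("Other", 42, 151),
    ("Denial of Service", 4, 16),
    ("Missing Authentication / Authorization", 7, 36),
    ("Sensitive Data Exposure", 4, 48),
]

total_tp = sum(tp for _, tp, _ in cwe_rows)
total_fn = sum(fn for _, _, fn in cwe_rows)

# Micro-average: every labeled vulnerability counts equally.
micro_recall = total_tp / (total_tp + total_fn)

# Macro-average: every CWE family counts equally, regardless of size.
macro_recall = sum(tp / (tp + fn) for _, tp, fn in cwe_rows) / len(cwe_rows)

print(f"micro recall: {micro_recall:.1%}")
print(f"macro recall: {macro_recall:.1%}")
```

The micro figure is pulled down by large, low-recall families such as "Other" (193 labeled findings), while the macro figure gives small, well-detected families such as XPath Injection the same weight.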
§

LLM operational metrics

Avg input tokens: 36
Avg output tokens: 4,888
Avg total tokens: 243,089
Avg latency / repo: 56s
JSON repair rate: 0.0%
Total runs: 72
F2 run-to-run σ: ±12.9
§

Cost

Total cost: $5
Cost / run: $0.07
Cost / 100 LOC: $0.026
Python LOC scanned: 20,062
Successful runs: 72
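The derived cost figures can be approximated from the totals shown here. The sketch below assumes that cost per run is total cost divided by run count, and that cost per 100 LOC is total cost divided by LOC scanned, times 100. Both definitions are assumptions, and the $5 total is itself a rounded display value.

```python
# Values as displayed on this page.
total_cost_usd = 5.0   # rounded on the page; the exact figure is not shown
successful_runs = 72
python_loc = 20_062

# Assumed definitions of the derived metrics.
cost_per_run = total_cost_usd / successful_runs
cost_per_100_loc = total_cost_usd / python_loc * 100

print(f"cost / run:     ${cost_per_run:.2f}")
print(f"cost / 100 LOC: ${cost_per_100_loc:.3f}")
```

Cost per run comes out at about $0.07, matching the page. Cost per 100 LOC comes out at about $0.025, slightly under the displayed $0.026, which would be consistent with the true total being a little above the rounded $5.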
