realvuln v1.0
Scanner deep-dive

Claude Haiku 4.5 by Anthropic

General-purpose LLM · direct-v1 · scored on 23 of 26 repositories. Scoring is strict: the 3 unfinished repositories are counted as misses.

F3 (strict): 25.4
F2 (strict): 26.8
Recall (strict): 24.1%
Precision: 48.7%
Repos scored: 23/26
Model: claude-haiku-4-5-20251001
Total cost: $5
Avg latency: 19s
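
The headline F-scores can be reproduced from the precision and strict recall shown above. The sketch below assumes the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The `f_beta` helper is illustrative and is not taken from the realvuln code.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline values from this page, expressed as fractions.
precision = 0.487
strict_recall = 0.241

f2 = 100 * f_beta(precision, strict_recall, beta=2)
f3 = 100 * f_beta(precision, strict_recall, beta=3)
print(f"F2 = {f2:.1f}, F3 = {f3:.1f}")  # F2 = 26.8, F3 = 25.4
```

Both results agree with the displayed scores, which suggests the strict F2 and F3 are derived from the strict recall and the overall precision rather than averaged per repository.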
§

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Bar chart: one bar per repository, split into true positives, false positives, and misses (FN). The per-repository values appear in the table below.]

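| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| intentionally-vulnerable-python-application | 4 | 1 | 3 | 61.9 | 65.7 |
| insecure-web | 6 | 3 | 3 | 63.0 | 63.4 |
| vulnerable-api | 8 | 2 | 6 | 59.5 | 62.8 |
| python-insecure-app | 4 | 0 | 4 | 54.2 | 59.0 |
| pythonssti | 1 | 1 | 1 | 50.0 | 51.9 |
| vulnpy | 35 | 7 | 43 | 44.4 | 49.0 |
| damn-vulnerable-flask-application | 7 | 5 | 8 | 46.7 | 48.2 |
| flask-xss | 12 | 4 | 18 | 41.1 | 45.1 |
| vampi | 6 | 7 | 9 | 42.2 | 43.2 |
| dvblab | 8 | 10 | 14 | 36.4 | 37.6 |
| lets-be-bad-guys | 7 | 10 | 17 | 30.6 | 32.5 |
| vulnerable-flask-app | 6 | 10 | 15 | 30.2 | 31.6 |
| vulnerable-tornado-app | 4 | 4 | 10 | 28.6 | 31.1 |
| dsvpwa | 8 | 8 | 24 | 26.0 | 29.1 |
| vulpy | 14 | 10 | 43 | 24.0 | 27.1 |
| dsvw | 7 | 10 | 20 | 24.7 | 26.8 |
| extremely-vulnerable-flask-app | 7 | 5 | 25 | 22.9 | 26.1 |
| threatbyte | 6 | 13 | 20 | 21.8 | 23.1 |
| dvpwa | 4 | 8 | 18 | 19.7 | 21.7 |
| vfapi | 2 | 15 | 7 | 18.5 | 16.1 |
| damn-vulnerable-graphql-application | 3 | 15 | 33 | 8.3 | 9.2 |
| djangoat | 4 | 14 | 46 | 8.0 | 9.1 |
| pygoat | 5 | 15 | 72 | 6.1 | 7.1 |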
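
As a cross-check, the per-repository counts can be pooled (micro-averaged) and compared with the headline figures. The sketch below uses the TP/FP/FN values as read from the table above. Pooling is an assumption about how the headline precision is computed, and the variable names are illustrative.

```python
# (TP, FP, FN) per repository, as read from the table above.
counts = {
    "intentionally-vulnerable-python-application": (4, 1, 3),
    "insecure-web": (6, 3, 3),
    "vulnerable-api": (8, 2, 6),
    "python-insecure-app": (4, 0, 4),
    "pythonssti": (1, 1, 1),
    "vulnpy": (35, 7, 43),
    "damn-vulnerable-flask-application": (7, 5, 8),
    "flask-xss": (12, 4, 18),
    "vampi": (6, 7, 9),
    "dvblab": (8, 10, 14),
    "lets-be-bad-guys": (7, 10, 17),
    "vulnerable-flask-app": (6, 10, 15),
    "vulnerable-tornado-app": (4, 4, 10),
    "dsvpwa": (8, 8, 24),
    "vulpy": (14, 10, 43),
    "dsvw": (7, 10, 20),
    "extremely-vulnerable-flask-app": (7, 5, 25),
    "threatbyte": (6, 13, 20),
    "dvpwa": (4, 8, 18),
    "vfapi": (2, 15, 7),
    "damn-vulnerable-graphql-application": (3, 15, 33),
    "djangoat": (4, 14, 46),
    "pygoat": (5, 15, 72),
}

# Sum each column across all 23 scored repositories.
tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
fn = sum(c[2] for c in counts.values())

pooled_precision = tp / (tp + fp)
pooled_recall = tp / (tp + fn)  # scored repositories only, not strict

print(tp, fp, fn)                                     # 168 177 459
print(f"pooled precision = {pooled_precision:.1%}")   # 48.7%
print(f"pooled recall    = {pooled_recall:.1%}")      # 26.8%
```

The pooled precision matches the headline 48.7%. The pooled recall over scored repositories (26.8%) is higher than the strict recall (24.1%), which is consistent with the unfinished repositories being counted as misses. Several rows show a Recall % that differs from TP / (TP + FN), so the per-row percentages are presumably averaged over runs rather than computed from the displayed counts.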
§

Detection by severity

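| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 28 | 1 | 50 | 35.9 |
| High | 63 | 2 | 172 | 26.8 |
| Medium | 56 | 0 | 197 | 22.1 |
| Low | 6 | 0 | 55 | 9.8 |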
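
The Recall % column here follows directly from the counts as TP / (TP + FN). A minimal sketch, with the `recall_pct` helper introduced only for illustration:

```python
def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage; defined as 0.0 when there are no labeled items."""
    total = tp + fn
    return 100.0 * tp / total if total else 0.0


# (TP, FN) per severity level, as read from the table above.
severity_counts = {
    "Critical": (28, 50),
    "High": (63, 172),
    "Medium": (56, 197),
    "Low": (6, 55),
}

recalls = {
    level: round(recall_pct(tp, fn), 1)
    for level, (tp, fn) in severity_counts.items()
}
print(recalls)
# {'Critical': 35.9, 'High': 26.8, 'Medium': 22.1, 'Low': 9.8}
```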
§

Detection by vulnerability class

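| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| HTTP Header Injection | 2 | 0 | 0 | 100.0 |
| XPath Injection | 3 | 0 | 1 | 75.0 |
| SQL Injection | 20 | 2 | 18 | 52.6 |
| Path Traversal | 12 | 0 | 11 | 52.2 |
| XML External Entities | 3 | 0 | 4 | 42.9 |
| Cross-Site Scripting | 31 | 0 | 46 | 40.3 |
| Insecure Deserialization | 5 | 0 | 9 | 35.7 |
| Hardcoded Credentials | 18 | 0 | 33 | 35.3 |
| Command / OS Injection | 5 | 0 | 11 | 31.2 |
| Code Injection / RFI | 4 | 0 | 10 | 28.6 |
| Broken Access Control / IDOR | 5 | 0 | 17 | 22.7 |
| Other | 32 | 1 | 155 | 17.1 |
| Open Redirect | 1 | 0 | 5 | 16.7 |
| Security Misconfiguration | 5 | 0 | 27 | 15.6 |
| Server-Side Request Forgery | 2 | 0 | 20 | 9.1 |
| Missing Authentication / Authorization | 3 | 0 | 39 | 7.1 |
| Sensitive Data Exposure | 2 | 0 | 48 | 4.0 |
| Denial of Service | 0 | 0 | 20 | 0.0 |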
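
Recall percentages alone hide how many findings each family accounts for. The sketch below ranks the families by absolute misses, using the FN values as read from the table above. The ranking is an illustrative reading of the data, not a metric reported by the dashboard.

```python
# Missed findings (FN) per CWE family, as read from the table above.
fn_by_family = {
    "HTTP Header Injection": 0,
    "XPath Injection": 1,
    "SQL Injection": 18,
    "Path Traversal": 11,
    "XML External Entities": 4,
    "Cross-Site Scripting": 46,
    "Insecure Deserialization": 9,
    "Hardcoded Credentials": 33,
    "Command / OS Injection": 11,
    "Code Injection / RFI": 10,
    "Broken Access Control / IDOR": 17,
    "Other": 155,
    "Open Redirect": 5,
    "Security Misconfiguration": 27,
    "Server-Side Request Forgery": 20,
    "Missing Authentication / Authorization": 39,
    "Sensitive Data Exposure": 48,
    "Denial of Service": 20,
}

total_fn = sum(fn_by_family.values())

# Sort families by number of misses, largest first.
ranked = sorted(fn_by_family.items(), key=lambda item: item[1], reverse=True)

for family, fn in ranked[:5]:
    print(f"{family}: {fn} misses ({fn / total_fn:.1%} of all misses)")
```

By this count, the "Other" bucket, Sensitive Data Exposure, and Cross-Site Scripting contribute the most misses, even though Cross-Site Scripting has one of the higher recall percentages.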
§

LLM operational metrics

Avg input tokens: 54,965
Avg output tokens: 3,312
Avg total tokens: 58,278
Avg latency / repo: 19s
JSON repair rate: 0.0%
Total runs: 69
F2 run-to-run σ: ±17.8
§

Cost

Total cost: $5
Cost / run: $0.07
Cost / 100 LOC: $0.025
Python LOC scanned: 19,723
Successful runs: 69
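
The unit costs are consistent with dividing the total cost by the run count and by the lines of code scanned. The sketch below assumes that is how they are derived. The displayed total is rounded to a whole dollar, so the results are approximate.

```python
# Values displayed on this page.
total_cost_usd = 5.0   # rounded on the dashboard; the exact figure is not shown
successful_runs = 69
python_loc = 19_723
repos_scored = 23

cost_per_run = total_cost_usd / successful_runs
cost_per_100_loc = total_cost_usd / python_loc * 100
runs_per_repo = successful_runs / repos_scored

print(f"cost / run     = ${cost_per_run:.2f}")      # $0.07
print(f"cost / 100 LOC = ${cost_per_100_loc:.3f}")  # $0.025
print(f"runs per repo  = {runs_per_repo:.0f}")      # 3
```

The 69 runs over 23 scored repositories work out to 3 runs per repository.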
