realvuln v1.0
Scanner deep-dive

Claude Opus 4.8 by Anthropic

General-Purpose LLM · agentic-v1 · scored on 26/26 repositories. Strict scoring (unfinished repos counted as misses).
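
As an illustration of the strict rule, the sketch below pools counts across repositories and treats every labeled vulnerability in an unfinished repository as a miss. The function name `strict_counts`, the repository names, and the numbers are all hypothetical; the benchmark's actual scoring code is not shown on this page.

```python
from typing import Optional

def strict_counts(
    labeled: dict[str, int],
    results: dict[str, Optional[tuple[int, int]]],
) -> tuple[int, int, int]:
    """Pool (TP, FP, FN) over repositories under strict scoring.

    labeled: repo -> number of labeled vulnerabilities.
    results: repo -> (tp, fp), or None when the scan did not finish.
    """
    tp = fp = fn = 0
    for repo, n_labeled in labeled.items():
        outcome = results.get(repo)
        if outcome is None:
            # Strict rule: an unfinished repo contributes only misses.
            fn += n_labeled
            continue
        repo_tp, repo_fp = outcome
        tp += repo_tp
        fp += repo_fp
        fn += n_labeled - repo_tp
    return tp, fp, fn

# Toy data (hypothetical): repo-b never finished.
labeled = {"repo-a": 9, "repo-b": 14}
results = {"repo-a": (8, 2), "repo-b": None}
print(strict_counts(labeled, results))  # (8, 2, 15)
```

A lenient variant would simply skip unfinished repositories, which raises recall whenever a scan fails.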

F3 (strict): 53.6
F2 (strict): 55.7
Recall (strict): 51.6%
Precision: 80.7%
Repos scored: 26/26
Model: claude-opus-4-8
Total cost: $36
Avg latency: 80s
§

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Bar chart: true positives, false positives, and misses (FN) per repository, ranked by F2. The same values appear in the table below.]

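| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| vfapi | 8 | 2 | 1 | 85.2 | 83.4 |
| insecure-web | 7 | 3 | 2 | 77.8 | 76.7 |
| vulnerable-api | 10 | 0 | 4 | 71.4 | 75.3 |
| dsvw | 19 | 2 | 8 | 71.6 | 74.9 |
| intentionally-vulnerable-python-application | 5 | 2 | 2 | 71.4 | 72.1 |
| vulnerable-tornado-app | 10 | 2 | 4 | 69.0 | 71.8 |
| dvblab | 15 | 2 | 7 | 66.7 | 70.1 |
| dsvpwa | 21 | 3 | 11 | 66.7 | 70.0 |
| damn-vulnerable-flask-application | 10 | 2 | 5 | 64.4 | 67.8 |
| pythonssti | 1 | 0 | 1 | 66.7 | 67.4 |
| vampi | 9 | 3 | 6 | 62.2 | 64.8 |
| vulnerable-python-apps | 14 | 3 | 8 | 61.4 | 64.6 |
| vulnpy | 47 | 6 | 31 | 59.8 | 63.9 |
| lets-be-bad-guys | 13 | 3 | 11 | 55.5 | 59.3 |
| dvpwa | 12 | 2 | 10 | 54.5 | 58.6 |
| python-insecure-app | 4 | 1 | 4 | 54.2 | 58.5 |
| vulnerable-flask-app | 12 | 4 | 9 | 55.5 | 58.5 |
| owasp-web-playground | 16 | 7 | 12 | 56.0 | 58.0 |
| threatbyte | 13 | 3 | 13 | 50.0 | 54.1 |
| flask-xss | 14 | 2 | 16 | 45.6 | 50.1 |
| pygoat | 35 | 9 | 42 | 45.0 | 49.3 |
| extremely-vulnerable-flask-app | 14 | 1 | 18 | 42.7 | 48.0 |
| python-app | 9 | 8 | 11 | 46.7 | 47.6 |
| djangoat | 17 | 7 | 33 | 33.3 | 37.2 |
| damn-vulnerable-graphql-application | 12 | 6 | 24 | 32.4 | 36.0 |
| vulpy | 13 | 3 | 44 | 22.8 | 26.6 |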
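
Summing the TP, FP, and FN columns of the per-repository table gives 360, 86, and 337. The sketch below pools those counts and applies the standard F-beta formula. Pooling (micro-averaging) is an assumption here, since the page does not state how the headline figures are aggregated, but the result matches the headline precision, recall, F2, and F3 to one decimal place.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float) -> float:
    """F-beta from raw counts; beta > 1 weights recall above precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

# Column sums of the per-repository table (26 repositories).
TP, FP, FN = 360, 86, 337

precision = TP / (TP + FP)
recall = TP / (TP + FN)
f2 = f_beta(TP, FP, FN, beta=2)
f3 = f_beta(TP, FP, FN, beta=3)

print(f"precision={precision:.1%} recall={recall:.1%}")  # precision=80.7% recall=51.6%
print(f"F2={100 * f2:.1f} F3={100 * f3:.1f}")            # F2=55.7 F3=53.6
```

Because F2 and F3 weight recall four and nine times as heavily as precision, the 51.6% recall pulls both scores well below the 80.7% precision.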
§

Detection by severity

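| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 76 | 1 | 10 | 88.4 |
| High | 135 | 2 | 129 | 51.1 |
| Medium | 112 | 0 | 167 | 40.1 |
| Low | 21 | 0 | 47 | 30.9 |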
§

Detection by vulnerability class

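| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Code Injection / RFI | 14 | 0 | 0 | 100.0 |
| Open Redirect | 6 | 0 | 0 | 100.0 |
| HTTP Header Injection | 2 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| Insecure Deserialization | 17 | 0 | 2 | 89.5 |
| SQL Injection | 42 | 0 | 5 | 89.4 |
| XML External Entities | 7 | 1 | 1 | 87.5 |
| Command / OS Injection | 14 | 0 | 3 | 82.4 |
| Path Traversal | 19 | 1 | 7 | 73.1 |
| Broken Access Control / IDOR | 16 | 0 | 8 | 66.7 |
| Hardcoded Credentials | 36 | 0 | 25 | 59.0 |
| Server-Side Request Forgery | 13 | 0 | 11 | 54.2 |
| Cross-Site Scripting | 44 | 0 | 38 | 53.7 |
| Security Misconfiguration | 15 | 0 | 18 | 45.5 |
| Missing Authentication / Authorization | 16 | 0 | 31 | 34.0 |
| Other | 63 | 1 | 143 | 30.6 |
| Sensitive Data Exposure | 14 | 0 | 43 | 24.6 |
| Denial of Service | 2 | 0 | 18 | 10.0 |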
§

LLM operational metrics

Avg input tokens: 15
Avg output tokens: 6,837
Avg total tokens: 257,687
Avg latency / repo: 80s
JSON repair rate: 0.0%
Total runs: 77
F2 run-to-run σ: ±13.7
§

Cost

Total cost: $36
Cost / run: $0.46
Cost / 100 LOC: $0.178
Python LOC scanned: 20,062
Successful runs: 77
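
The per-unit figures can be approximated from the totals, as in the sketch below. It assumes cost per run is total cost divided by successful runs and cost per 100 LOC is total cost divided by LOC/100; the page does not spell out these definitions. The $36 total is itself rounded, so the results can differ from the displayed values in the last digit.

```python
total_cost_usd = 36.0   # rounded total as displayed
successful_runs = 77
python_loc = 20_062

# Assumed definitions (not stated on the page).
cost_per_run = total_cost_usd / successful_runs
cost_per_100_loc = total_cost_usd / (python_loc / 100)

print(f"cost/run     ~ ${cost_per_run:.2f}")
print(f"cost/100 LOC ~ ${cost_per_100_loc:.3f}")
```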
