realvuln v1.0
Scanner deep-dive

Claude Opus 4.7 by Anthropic

General-Purpose LLM · agentic-v1 · scored on 25/26 repositories. Strict scoring (unfinished repos counted as misses).

F3 (strict): 47.5
F2 (strict): 49.4
Recall (strict): 45.8%
Precision: 71.5%
Repos scored: 25/26
Model: claude-opus-4-7
Total cost: $32
Avg latency: 76s
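
The F2 and F3 headline figures are consistent with the standard F-beta formula applied to the reported precision and strict recall. The following is a minimal sketch under that assumption (the page does not state the formula); `f_beta` is an illustrative helper, not part of realvuln.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score: recall is weighted beta times as heavily as precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline values from the summary above, expressed as fractions.
precision = 0.715
strict_recall = 0.458

f2 = 100 * f_beta(precision, strict_recall, beta=2)
f3 = 100 * f_beta(precision, strict_recall, beta=3)
print(f"F2 = {f2:.2f}, F3 = {f3:.2f}")
```

Both results land within 0.1 of the reported 49.4 and 47.5; a small gap is expected because the inputs are already rounded. F3 sits below F2 here because recall is lower than precision, and a larger beta weights recall more heavily.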

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Figure: one stacked bar per repository. Legend: true positive, false positive, missed (FN). Each bar is labeled with the repository's F2 score and recall.]
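| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| pythonssti | 2 | 0 | 0 | 100.0 | 100.0 |
| vfapi | 8 | 4 | 1 | 88.9 | 83.9 |
| intentionally-vulnerable-python-application | 6 | 1 | 1 | 81.0 | 81.6 |
| vulnerable-tornado-app | 10 | 2 | 4 | 73.8 | 75.7 |
| dsvw | 19 | 3 | 8 | 70.4 | 73.3 |
| insecure-web | 7 | 3 | 2 | 74.1 | 73.0 |
| python-app | 14 | 6 | 6 | 72.5 | 72.5 |
| vulnerable-api | 10 | 2 | 4 | 69.0 | 71.9 |
| vulnerable-python-apps | 15 | 5 | 7 | 69.7 | 70.2 |
| vampi | 10 | 2 | 5 | 66.7 | 69.1 |
| dvblab | 14 | 3 | 8 | 62.1 | 65.0 |
| dsvpwa | 20 | 8 | 12 | 62.5 | 63.9 |
| damn-vulnerable-flask-application | 9 | 5 | 6 | 62.2 | 62.5 |
| python-insecure-app | 4 | 1 | 4 | 54.2 | 58.5 |
| owasp-web-playground | 16 | 10 | 12 | 55.4 | 56.2 |
| threatbyte | 13 | 3 | 13 | 51.3 | 55.2 |
| pygoat | 40 | 23 | 36 | 52.6 | 54.3 |
| vulnerable-flask-app | 11 | 5 | 10 | 50.8 | 53.5 |
| lets-be-bad-guys | 12 | 4 | 12 | 48.6 | 52.1 |
| extremely-vulnerable-flask-app | 14 | 4 | 18 | 43.8 | 47.9 |
| djangoat | 20 | 13 | 30 | 39.3 | 42.3 |
| damn-vulnerable-graphql-application | 13 | 6 | 23 | 36.1 | 39.8 |
| dvpwa | 8 | 2 | 14 | 34.8 | 39.4 |
| vulpy | 17 | 9 | 40 | 29.8 | 33.5 |
| flask-xss | 7 | 3 | 23 | 23.3 | 25.4 |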
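
As a rough cross-check, recall and F2 can be recomputed from the TP, FP, and FN counts in a row. The sketch below assumes the usual count-based definitions; `recall_pct` and `f2_pct` are illustrative names, and the three rows are copied from the table above.

```python
def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage of labeled vulnerabilities (TP + FN)."""
    labeled = tp + fn
    return 100.0 * tp / labeled if labeled else 0.0


def f2_pct(tp: int, fp: int, fn: int) -> float:
    """F2 from raw counts: 5*TP / (5*TP + 4*FN + FP), as a percentage."""
    denom = 5 * tp + 4 * fn + fp
    return 100.0 * 5 * tp / denom if denom else 0.0


# (TP, FP, FN) taken from the per-repository table.
rows = {
    "dsvw": (19, 3, 8),
    "vampi": (10, 2, 5),
    "pygoat": (40, 23, 36),
}

for name, (tp, fp, fn) in rows.items():
    print(f"{name}: recall {recall_pct(tp, fn):.1f}%, F2 {f2_pct(tp, fp, fn):.1f}")
```

For these rows the recomputed recall matches the table, while F2 differs by a few tenths. One possible explanation, not confirmed by the page, is that the displayed counts are rounded averages over several runs and F2 is averaged per run.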

Detection by severity

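| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 69 | 0 | 7 | 90.8 |
| High | 134 | 1 | 103 | 56.5 |
| Medium | 109 | 0 | 139 | 44.0 |
| Low | 17 | 0 | 41 | 29.3 |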
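
Pooling the severity rows gives a single recall figure for comparison with the strict headline number. This sketch assumes the severity table covers only the scored repositories, which the page does not state. `strict_recall` and its `unscored_labels` parameter are hypothetical: they illustrate how counting an unfinished repository's labels as misses lowers recall, and the value 50 is a placeholder, not a figure from the dataset.

```python
# (TP, FN) per severity, copied from the table above.
severity_rows = {
    "Critical": (69, 7),
    "High": (134, 103),
    "Medium": (109, 139),
    "Low": (17, 41),
}

total_tp = sum(tp for tp, _ in severity_rows.values())
total_fn = sum(fn for _, fn in severity_rows.values())

# Recall pooled over every labeled vulnerability in the table.
pooled_recall = 100 * total_tp / (total_tp + total_fn)


def strict_recall(tp: int, fn: int, unscored_labels: int) -> float:
    """Recall when labels from unfinished repos are all counted as misses."""
    return 100 * tp / (tp + fn + unscored_labels)


print(f"pooled recall: {pooled_recall:.1f}%")
# Placeholder count of labels in unfinished repos (hypothetical).
print(f"strict recall: {strict_recall(total_tp, total_fn, 50):.1f}%")
```

Under these assumptions the pooled recall comes out higher than the strict 45.8% headline, which is the expected direction when an unfinished repository is scored as all misses.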

Detection by vulnerability class

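| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Code Injection / RFI | 11 | 0 | 0 | 100.0 |
| HTTP Header Injection | 2 | 0 | 0 | 100.0 |
| XPath Injection | 1 | 0 | 0 | 100.0 |
| SQL Injection | 43 | 0 | 1 | 97.7 |
| Insecure Deserialization | 14 | 0 | 1 | 93.3 |
| Command / OS Injection | 13 | 0 | 1 | 92.9 |
| Path Traversal | 15 | 0 | 3 | 83.3 |
| Open Redirect | 5 | 0 | 1 | 83.3 |
| Server-Side Request Forgery | 9 | 0 | 2 | 81.8 |
| XML External Entities | 4 | 1 | 1 | 80.0 |
| Broken Access Control / IDOR | 18 | 0 | 6 | 75.0 |
| Hardcoded Credentials | 43 | 0 | 18 | 70.5 |
| Security Misconfiguration | 18 | 0 | 13 | 58.1 |
| Missing Authentication / Authorization | 21 | 0 | 26 | 44.7 |
| Cross-Site Scripting | 26 | 0 | 44 | 37.1 |
| Other | 69 | 0 | 129 | 34.8 |
| Sensitive Data Exposure | 16 | 0 | 41 | 28.1 |
| Denial of Service | 1 | 0 | 3 | 25.0 |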

LLM operational metrics

Avg input tokens: 14
Avg output tokens: 5,440
Avg total tokens: 287,918
Avg latency / repo: 76s
JSON repair rate: 0.0%
Total runs: 78
F2 run-to-run σ: ±17.2

Cost

Total cost: $32
Cost / run: $0.49
Cost / 100 LOC: $0.184
Python LOC scanned: 17,572
Successful runs: 66
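
The per-run and per-100-LOC figures can be approximately reproduced from the totals. The sketch below assumes cost per run divides by successful runs (66) rather than total runs (78, from the operational metrics); the page does not say which, but the reported $0.49 is closer to the former. Variable names are illustrative.

```python
total_cost_usd = 32.0   # reported total, already rounded to whole dollars
successful_runs = 66
total_runs = 78         # from the operational metrics section
python_loc = 17_572

cost_per_run = total_cost_usd / successful_runs
cost_per_100_loc = total_cost_usd / (python_loc / 100)
success_rate = 100 * successful_runs / total_runs

print(f"cost / run:      ${cost_per_run:.3f}")
print(f"cost / 100 LOC:  ${cost_per_100_loc:.3f}")
print(f"successful runs: {success_rate:.1f}% of total")
```

The results fall slightly below the reported $0.49 and $0.184, which is what one would expect if the true total is a little over $32 and was rounded down for display.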
