realvuln v1.0
Scanner deep-dive

DeepSeek V4 Flash by DeepSeek

General-Purpose LLM · agentic-v1 · scored on 26/26 repositories. Strict scoring (unfinished repos counted as misses).

F3 (strict): 56.5
F2 (strict): 58.1
Recall (strict): 54.9%
Precision: 75.2%
Repos scored: 26/26
Model: deepseek-v4-flash
Total cost: $1
Avg latency: 150s
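The F2 and F3 figures weight recall more heavily than precision. As a rough cross-check, the sketch below applies the standard F-beta formula to the headline precision (75.2%) and strict recall (54.9%). The function name `f_beta` is illustrative, and the dashboard's own aggregation may differ (for example, averaging per repository or per run), so a gap of about a tenth of a point from the displayed values is expected.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    if precision <= 0.0 and recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline values from the summary above, expressed as fractions.
precision, recall = 0.752, 0.549

f2 = f_beta(precision, recall, beta=2)
f3 = f_beta(precision, recall, beta=3)
print(f"F2 = {100 * f2:.1f}, F3 = {100 * f3:.1f}")  # F2 = 58.0, F3 = 56.4
```

Because beta squared multiplies the precision term in the denominator, F3 sits closer to recall than F2 does, which is why F3 is the lower of the two here.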

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Bar chart: one bar per repository, segmented into true positives, false positives, and missed (FN). Each bar is labeled with the repository's rounded F2 and recall; the same figures appear at full precision in the table that follows.]

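| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| pythonssti | 2 | 0 | 0 | 100.0 | 100.0 |
| intentionally-vulnerable-python-application | 6 | 1 | 1 | 81.0 | 81.7 |
| vfapi | 8 | 4 | 1 | 85.2 | 80.5 |
| vulnpy | 61 | 17 | 17 | 78.2 | 78.0 |
| vulnerable-api | 10 | 2 | 4 | 73.8 | 75.2 |
| dsvw | 19 | 2 | 8 | 71.6 | 74.6 |
| dsvpwa | 24 | 9 | 8 | 74.0 | 73.8 |
| lets-be-bad-guys | 17 | 2 | 7 | 69.5 | 72.7 |
| insecure-web | 7 | 4 | 2 | 74.1 | 71.9 |
| dvblab | 16 | 6 | 6 | 71.2 | 71.4 |
| vampi | 11 | 7 | 4 | 73.3 | 70.5 |
| python-insecure-app | 5 | 1 | 3 | 66.7 | 69.3 |
| python-app | 14 | 5 | 6 | 68.3 | 68.9 |
| vulnerable-flask-app | 14 | 4 | 7 | 65.1 | 67.0 |
| vulnerable-tornado-app | 9 | 4 | 5 | 64.3 | 65.5 |
| damn-vulnerable-flask-application | 9 | 5 | 6 | 60.0 | 60.7 |
| owasp-web-playground | 16 | 8 | 12 | 56.0 | 57.6 |
| threatbyte | 13 | 2 | 13 | 50.0 | 54.8 |
| dvpwa | 10 | 3 | 12 | 45.5 | 49.2 |
| extremely-vulnerable-flask-app | 14 | 2 | 18 | 42.7 | 47.4 |
| vulnerable-python-apps | 10 | 8 | 12 | 45.5 | 47.1 |
| pygoat | 32 | 11 | 45 | 42.0 | 46.0 |
| damn-vulnerable-graphql-application | 14 | 6 | 22 | 40.3 | 44.0 |
| vulpy | 20 | 3 | 37 | 35.7 | 40.4 |
| flask-xss | 9 | 2 | 21 | 31.1 | 35.6 |
| djangoat | 13 | 8 | 37 | 25.3 | 28.6 |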
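The ranking by F2 can be illustrated directly from the TP/FP/FN counts. The sketch below uses three rows of the per-repository table and the count form of F2, 5·TP / (5·TP + 4·FN + FP). The helper name `f2_from_counts` is illustrative. Values computed this way differ from the table's F2 column by a few tenths for some repositories, which would be consistent with the dashboard averaging over several runs, although that is an assumption rather than something this page states.

```python
# (TP, FP, FN) for three rows of the per-repository table.
counts = {
    "threatbyte": (13, 2, 13),
    "djangoat": (13, 8, 37),
    "pythonssti": (2, 0, 0),
}


def f2_from_counts(tp: int, fp: int, fn: int) -> float:
    """F2 as a percentage, computed from raw counts (recall weighted 4x)."""
    denom = 5 * tp + 4 * fn + fp
    return 100 * 5 * tp / denom if denom else 0.0


# Sort repository names from highest to lowest F2, as the page does.
ranked = sorted(counts, key=lambda name: f2_from_counts(*counts[name]), reverse=True)

for name in ranked:
    print(f"{name}: F2 = {f2_from_counts(*counts[name]):.1f}")
```

A missed vulnerability costs four times as much as a false positive in this formula, so repositories with many misses, such as djangoat, fall to the bottom even when their false-positive counts are moderate.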

Detection by severity

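| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 77 | 0 | 9 | 89.5 |
| High | 147 | 1 | 117 | 55.7 |
| Medium | 128 | 0 | 151 | 45.9 |
| Low | 28 | 0 | 40 | 41.2 |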
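Recall in each severity row is TP / (TP + FN). The sketch below recomputes it from the counts and also pools the four rows into a single recall figure. The pooled value lands close to, but not exactly on, the headline strict recall of 54.9%; the difference presumably comes from how the dashboard aggregates across runs or repositories, which is an assumption here.

```python
# (TP, FN) per severity, taken from the severity table.
by_severity = {
    "Critical": (77, 9),
    "High": (147, 117),
    "Medium": (128, 151),
    "Low": (28, 40),
}


def recall_pct(tp: int, fn: int) -> float:
    """Share of labeled vulnerabilities that were found, as a percentage."""
    total = tp + fn
    return 100 * tp / total if total else 0.0


per_row = {sev: round(recall_pct(tp, fn), 1) for sev, (tp, fn) in by_severity.items()}

# Pool all severities: sum the counts first, then take the ratio.
total_tp = sum(tp for tp, _ in by_severity.values())
total_fn = sum(fn for _, fn in by_severity.values())
pooled = round(recall_pct(total_tp, total_fn), 1)

print(per_row)
print(f"pooled recall = {pooled}% over {total_tp + total_fn} labeled findings")
```

Pooling sums counts before dividing, so severities with more labeled findings (Medium and High) dominate the overall figure.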

Detection by vulnerability class

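| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| XML External Entities | 8 | 0 | 0 | 100.0 |
| Insecure Deserialization | 19 | 0 | 0 | 100.0 |
| HTTP Header Injection | 2 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| SQL Injection | 43 | 0 | 4 | 91.5 |
| Path Traversal | 23 | 0 | 3 | 88.5 |
| Command / OS Injection | 15 | 0 | 2 | 88.2 |
| Code Injection / RFI | 12 | 0 | 2 | 85.7 |
| Open Redirect | 5 | 0 | 1 | 83.3 |
| Broken Access Control / IDOR | 19 | 0 | 5 | 79.2 |
| Cross-Site Scripting | 56 | 0 | 26 | 68.3 |
| Server-Side Request Forgery | 15 | 1 | 9 | 62.5 |
| Hardcoded Credentials | 34 | 0 | 27 | 55.7 |
| Security Misconfiguration | 15 | 0 | 18 | 45.5 |
| Sensitive Data Exposure | 22 | 0 | 35 | 38.6 |
| Other | 71 | 0 | 135 | 34.5 |
| Missing Authentication / Authorization | 15 | 0 | 32 | 31.9 |
| Denial of Service | 2 | 0 | 18 | 10.0 |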

LLM operational metrics

Avg input tokens: 22,794
Avg output tokens: 11,105
Avg total tokens: 197,697
Avg latency / repo: 150s
JSON repair rate: 1.3%
Total runs: 78
F2 run-to-run σ: ±16.6

Cost

Total cost: $1
Cost / run: $0.01
Cost / 100 LOC: $0.005
Python LOC scanned: 20,062
Successful runs: 74
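The per-run and per-100-LOC costs follow from the totals by simple division. The sketch below is a back-of-the-envelope check under two assumptions: that "$1" is a rounded total of about one dollar, and that cost per run divides by all 78 runs rather than only the 74 successful ones. The variable names are illustrative.

```python
total_cost_usd = 1.00   # displayed as "$1"; assumed to be a rounded total
total_runs = 78         # from the operational metrics
successful_runs = 74
python_loc = 20_062     # Python lines of code scanned

cost_per_run = total_cost_usd / total_runs
cost_per_100_loc = total_cost_usd / python_loc * 100
success_rate = successful_runs / total_runs

print(f"cost / run     = ${cost_per_run:.2f}")
print(f"cost / 100 LOC = ${cost_per_100_loc:.3f}")
print(f"success rate   = {100 * success_rate:.1f}%")
```

Under these assumptions the results round to the displayed $0.01 per run and $0.005 per 100 LOC, and 74 of 78 runs corresponds to a success rate of roughly 95%.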
