realvuln v1.0
Scanner deep-dive

DeepSeek V4 Pro by DeepSeek

General-Purpose LLM · agentic-v1 · scored on 26/26 repositories. Strict scoring (unfinished repos counted as misses).

F3 (strict): 52.9
F2 (strict): 55.0
Recall (strict): 51.1%
Precision: 78.9%
Repos scored: 26/26
Model: deepseek-v4-pro
Total cost: $10
Avg latency: 399s
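The headline F2 and F3 figures are F-beta scores, which weight recall more heavily than precision as beta grows. As a rough check, the sketch below applies the standard F-beta formula to the reported precision and strict recall. The function name `f_beta` is illustrative, and the dashboard's own aggregation is not described on this page, so the results should land close to the reported values rather than exactly on them.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta > 1 weights recall more than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline values from this page, expressed as fractions.
precision = 0.789
recall_strict = 0.511

f2 = 100 * f_beta(precision, recall_strict, beta=2)
f3 = 100 * f_beta(precision, recall_strict, beta=3)
print(f"F2 ~ {f2:.1f}, F3 ~ {f3:.1f}")
```

Because precision (78.9%) is well above recall (51.1%), raising beta pulls the score down toward recall, which is why F3 sits below F2.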

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Bar chart, one bar per repository. Legend: True positive, False positive, Missed (FN). Each bar is labeled with the repository's F2 and recall %; the same values appear in the table below.]

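| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| python-app | 16 | 5 | 4 | 81.7 | 80.2 |
| dsvw | 20 | 3 | 7 | 75.3 | 77.6 |
| vfapi | 7 | 2 | 2 | 74.1 | 74.6 |
| insecure-web | 6 | 2 | 3 | 70.4 | 71.9 |
| vampi | 10 | 3 | 4 | 70.0 | 71.4 |
| dsvpwa | 22 | 6 | 10 | 68.8 | 70.6 |
| vulnerable-api | 9 | 1 | 5 | 66.7 | 70.0 |
| dvblab | 14 | 2 | 8 | 63.6 | 67.3 |
| pythonssti | 1 | 1 | 1 | 66.7 | 66.7 |
| vulnpy | 48 | 8 | 30 | 61.5 | 65.3 |
| intentionally-vulnerable-python-application | 4 | 2 | 3 | 61.9 | 61.9 |
| damn-vulnerable-flask-application | 9 | 3 | 6 | 57.8 | 60.6 |
| owasp-web-playground | 15 | 4 | 13 | 52.4 | 56.0 |
| vulnerable-flask-app | 11 | 5 | 10 | 52.4 | 55.1 |
| python-insecure-app | 4 | 1 | 4 | 50.0 | 53.8 |
| extremely-vulnerable-flask-app | 15 | 1 | 17 | 45.8 | 50.9 |
| vulnerable-tornado-app | 6 | 2 | 8 | 46.4 | 50.0 |
| lets-be-bad-guys | 11 | 4 | 13 | 45.8 | 49.7 |
| threatbyte | 12 | 1 | 14 | 44.2 | 49.3 |
| pygoat | 34 | 13 | 42 | 44.8 | 48.5 |
| dvpwa | 9 | 4 | 13 | 42.4 | 45.8 |
| vulpy | 23 | 6 | 34 | 40.4 | 44.7 |
| flask-xss | 12 | 1 | 18 | 38.9 | 44.0 |
| vulnerable-python-apps | 8 | 2 | 14 | 37.9 | 42.3 |
| damn-vulnerable-graphql-application | 14 | 4 | 22 | 38.0 | 42.2 |
| djangoat | 15 | 9 | 35 | 30.0 | 33.5 |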
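Recall and F2 for a row can be recomputed directly from its TP, FP, and FN counts. The sketch below uses the count form of F2, 5·TP / (5·TP + 4·FN + FP), on the djangoat row; the helper names are illustrative and not taken from the benchmark's code. For some other rows the recomputed values differ from the listed ones by a few tenths, so treat this as a sanity check rather than the dashboard's scoring logic.

```python
def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage: share of labeled vulnerabilities that were found."""
    total = tp + fn
    return 100.0 * tp / total if total else 0.0


def f2_pct(tp: int, fp: int, fn: int) -> float:
    """F2 as a percentage, written in terms of counts (a miss costs 4x a false alarm)."""
    denom = 5 * tp + 4 * fn + fp
    return 100.0 * 5 * tp / denom if denom else 0.0


# djangoat row from the table above: TP=15, FP=9, FN=35.
djangoat_recall = recall_pct(15, 35)
djangoat_f2 = f2_pct(15, 9, 35)
print(f"djangoat: recall {djangoat_recall:.1f}, F2 {djangoat_f2:.1f}")
# djangoat: recall 30.0, F2 33.5
```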

Detection by severity

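| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 73 | 1 | 13 | 84.9 |
| High | 144 | 1 | 120 | 54.5 |
| Medium | 117 | 1 | 162 | 41.9 |
| Low | 24 | 0 | 44 | 35.3 |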
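The per-severity recall values follow from TP / (TP + FN). As a small illustration, the sketch below recomputes them and also pools the counts across severities; the variable names are made up for this example. The pooled figure comes out near the headline strict recall of 51.1% but not identical to it, so the headline is presumably aggregated somewhat differently.

```python
# (TP, FP, FN) per severity, copied from the table above.
by_severity = {
    "Critical": (73, 1, 13),
    "High": (144, 1, 120),
    "Medium": (117, 1, 162),
    "Low": (24, 0, 44),
}

# Recall per severity bucket, as a percentage.
severity_recall = {
    name: 100.0 * tp / (tp + fn) for name, (tp, _fp, fn) in by_severity.items()
}

# Pooled recall: sum the counts first, then divide.
total_tp = sum(tp for tp, _fp, _fn in by_severity.values())
total_fn = sum(fn for _tp, _fp, fn in by_severity.values())
pooled_recall = 100.0 * total_tp / (total_tp + total_fn)

for name, value in severity_recall.items():
    print(f"{name}: {value:.1f}%")
print(f"Pooled: {pooled_recall:.1f}%")
```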

Detection by vulnerability class

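| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| HTTP Header Injection | 2 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| SQL Injection | 43 | 0 | 4 | 91.5 |
| Insecure Deserialization | 17 | 0 | 2 | 89.5 |
| Command / OS Injection | 15 | 0 | 2 | 88.2 |
| XML External Entities | 7 | 1 | 1 | 87.5 |
| Open Redirect | 5 | 0 | 1 | 83.3 |
| Broken Access Control / IDOR | 19 | 0 | 5 | 79.2 |
| Code Injection / RFI | 11 | 0 | 3 | 78.6 |
| Path Traversal | 19 | 0 | 7 | 73.1 |
| Hardcoded Credentials | 41 | 1 | 20 | 67.2 |
| Server-Side Request Forgery | 12 | 0 | 12 | 50.0 |
| Cross-Site Scripting | 39 | 0 | 43 | 47.6 |
| Sensitive Data Exposure | 21 | 0 | 36 | 36.8 |
| Security Misconfiguration | 12 | 0 | 21 | 36.4 |
| Other | 74 | 1 | 132 | 35.9 |
| Missing Authentication / Authorization | 15 | 0 | 32 | 31.9 |
| Denial of Service | 2 | 0 | 18 | 10.0 |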

LLM operational metrics

Avg input tokens: 21,690
Avg output tokens: 11,883
Avg total tokens: 185,498
Avg latency / repo: 399s
JSON repair rate: 0.0%
Total runs: 78
F2 run-to-run σ: ±12.7

Cost

Total cost: $10
Cost / run: $0.14
Cost / 100 LOC: $0.048
Python LOC scanned: 20,062
Successful runs: 71
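The derived cost figures can be approximated from the totals on this page. The sketch below is only a back-of-the-envelope check with illustrative variable names: the $0.14 per-run figure is consistent with dividing the displayed $10 total by the 71 successful runs rather than all 78 runs, and the per-100-LOC result lands slightly above the listed $0.048, presumably because the displayed total is rounded.

```python
total_cost_usd = 10.0      # displayed total, likely rounded
successful_runs = 71
total_runs = 78
python_loc = 20_062

# Cost per run under two possible denominators.
cost_per_successful_run = total_cost_usd / successful_runs
cost_per_any_run = total_cost_usd / total_runs

# Cost normalized to 100 lines of scanned Python.
cost_per_100_loc = total_cost_usd / python_loc * 100

print(f"per successful run: ${cost_per_successful_run:.3f}")
print(f"per run (all 78):   ${cost_per_any_run:.3f}")
print(f"per 100 LOC:        ${cost_per_100_loc:.3f}")
```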
