realvuln v1.0
Scanner deep-dive

Gemini 3.5 Flash by Google DeepMind

General-Purpose LLM · agentic-v1 · scored on 26/26 repositories. Strict scoring (unfinished repos counted as misses).

F3 (strict): 47.6
F2 (strict): 50.0
Recall (strict): 45.4%
Precision: 84.0%
Repos scored: 26/26
Model: gemini-3.5-flash
Total cost: $28
Avg latency: 121s
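
The headline F2 and F3 scores are consistent with the standard F-beta formula applied to the precision and recall shown above. The sketch below is a minimal check, assuming the dashboard uses the usual weighted harmonic mean; the function name `f_beta` is illustrative and not taken from the benchmark's code.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Headline figures from this page, expressed as fractions.
precision, recall = 0.840, 0.454

f2 = round(100 * f_beta(precision, recall, beta=2), 1)
f3 = round(100 * f_beta(precision, recall, beta=3), 1)
print(f2, f3)  # 50.0 47.6
```

Because beta weights recall more heavily than precision, the low recall pulls F3 below F2 even though precision is high.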

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

Legend: True positive · False positive · Missed (FN)
[Bar chart: per-repository F2 and recall for 26 repositories; values are listed in the table that follows.]
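| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| vulnpy | 77 | 22 | 1 | 98.7 | 93.7 |
| intentionally-vulnerable-python-application | 5 | 0 | 2 | 71.4 | 75.5 |
| dsvpwa | 21 | 3 | 11 | 65.6 | 69.1 |
| dsvw | 17 | 1 | 10 | 64.2 | 68.7 |
| vampi | 9 | 1 | 6 | 57.8 | 62.2 |
| dvblab | 12 | 2 | 10 | 56.1 | 60.4 |
| insecure-web | 5 | 1 | 4 | 55.6 | 59.8 |
| pythonssti | 1 | 0 | 1 | 50.0 | 55.6 |
| python-app | 10 | 0 | 10 | 48.3 | 53.7 |
| lets-be-bad-guys | 12 | 1 | 12 | 48.6 | 53.5 |
| owasp-web-playground | 14 | 2 | 14 | 48.2 | 53.2 |
| vulnerable-tornado-app | 6 | 0 | 8 | 45.2 | 50.6 |
| extremely-vulnerable-flask-app | 14 | 2 | 18 | 43.8 | 48.6 |
| vulnerable-flask-app | 8 | 1 | 12 | 40.5 | 45.5 |
| vfapi | 4 | 0 | 5 | 40.7 | 44.1 |
| damn-vulnerable-flask-application | 6 | 1 | 9 | 40.0 | 43.9 |
| threatbyte | 10 | 0 | 16 | 37.2 | 42.5 |
| vulnerable-python-apps | 8 | 1 | 14 | 34.8 | 39.7 |
| pygoat | 25 | 14 | 52 | 32.9 | 34.5 |
| vulnerable-api | 4 | 1 | 10 | 30.9 | 34.4 |
| damn-vulnerable-graphql-application | 10 | 0 | 26 | 29.2 | 33.9 |
| python-insecure-app | 2 | 0 | 6 | 29.2 | 33.5 |
| dvpwa | 6 | 1 | 16 | 28.8 | 33.0 |
| flask-xss | 8 | 1 | 22 | 26.7 | 31.1 |
| vulpy | 14 | 3 | 43 | 25.1 | 29.2 |
| djangoat | 8 | 2 | 42 | 16.0 | 18.6 |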
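
Recall and F2 can be derived directly from a row's TP, FP, and FN counts. The sketch below uses the count form of F2, 5·TP / (5·TP + 4·FN + FP), and reproduces three rows of the table; the helper names are illustrative. Several other rows differ from this pooled calculation by a point or so; one possible explanation, not confirmed by the page, is that per-repository scores are averaged across multiple runs.

```python
def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage: TP / (TP + FN)."""
    total = tp + fn
    return 100 * tp / total if total else 0.0


def f2_pct(tp: int, fp: int, fn: int) -> float:
    """F2 as a percentage, written in terms of raw counts."""
    denom = 5 * tp + 4 * fn + fp
    return 100 * 5 * tp / denom if denom else 0.0


# (TP, FP, FN) copied from the per-repository table.
rows = {
    "vulnpy": (77, 22, 1),
    "dsvpwa": (21, 3, 11),
    "pythonssti": (1, 0, 1),
}

scores = {
    name: (round(recall_pct(tp, fn), 1), round(f2_pct(tp, fp, fn), 1))
    for name, (tp, fp, fn) in rows.items()
}

# Rank by F2, highest first, as the table does.
for name, (rec, f2) in sorted(scores.items(), key=lambda kv: kv[1][1], reverse=True):
    print(f"{name}: recall {rec}%, F2 {f2}")
```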

Detection by severity

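| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 73 | 0 | 13 | 84.9 |
| High | 141 | 2 | 123 | 53.4 |
| Medium | 110 | 1 | 169 | 39.4 |
| Low | 15 | 0 | 53 | 22.1 |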
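
The severity recall figures follow from TP / (TP + FN) for each severity level. A minimal sketch, with the dictionary name `severity_counts` chosen here for illustration:

```python
# (TP, FN) per severity, copied from the table above.
severity_counts = {
    "Critical": (73, 13),
    "High": (141, 123),
    "Medium": (110, 169),
    "Low": (15, 53),
}

severity_recall = {
    level: round(100 * tp / (tp + fn), 1)
    for level, (tp, fn) in severity_counts.items()
}

for level, pct in severity_recall.items():
    print(f"{level}: {pct}% recall")
```

Recall falls steadily as severity decreases, from 84.9% on Critical findings to 22.1% on Low.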

Detection by vulnerability class

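| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| XML External Entities | 8 | 0 | 0 | 100.0 |
| Open Redirect | 6 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| Insecure Deserialization | 18 | 0 | 1 | 94.7 |
| Server-Side Request Forgery | 22 | 2 | 2 | 91.7 |
| SQL Injection | 43 | 0 | 4 | 91.5 |
| Denial of Service | 17 | 0 | 3 | 85.0 |
| Command / OS Injection | 14 | 0 | 3 | 82.4 |
| Path Traversal | 21 | 0 | 5 | 80.8 |
| Code Injection / RFI | 11 | 0 | 3 | 78.6 |
| Broken Access Control / IDOR | 17 | 0 | 7 | 70.8 |
| HTTP Header Injection | 1 | 0 | 1 | 50.0 |
| Cross-Site Scripting | 37 | 0 | 45 | 45.1 |
| Hardcoded Credentials | 27 | 1 | 34 | 44.3 |
| Missing Authentication / Authorization | 19 | 0 | 28 | 40.4 |
| Other | 61 | 0 | 145 | 29.6 |
| Security Misconfiguration | 8 | 0 | 25 | 24.2 |
| Sensitive Data Exposure | 5 | 0 | 52 | 8.8 |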
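
Recall percentages alone do not show where most misses come from, because the classes differ widely in size. The sketch below sums the FN column and ranks classes by their share of all misses; it is a derived view of the table's numbers, not a metric the dashboard itself reports.

```python
# FN counts per CWE family, copied from the table above.
fn_by_class = {
    "XML External Entities": 0,
    "Open Redirect": 0,
    "XPath Injection": 0,
    "Insecure Deserialization": 1,
    "Server-Side Request Forgery": 2,
    "SQL Injection": 4,
    "Denial of Service": 3,
    "Command / OS Injection": 3,
    "Path Traversal": 5,
    "Code Injection / RFI": 3,
    "Broken Access Control / IDOR": 7,
    "HTTP Header Injection": 1,
    "Cross-Site Scripting": 45,
    "Hardcoded Credentials": 34,
    "Missing Authentication / Authorization": 28,
    "Other": 145,
    "Security Misconfiguration": 25,
    "Sensitive Data Exposure": 52,
}

total_fn = sum(fn_by_class.values())

# Three classes contributing the most misses.
top_misses = sorted(fn_by_class.items(), key=lambda kv: kv[1], reverse=True)[:3]
for name, fn in top_misses:
    print(f"{name}: {fn} misses ({100 * fn / total_fn:.1f}% of all misses)")
```

On these counts, Other, Sensitive Data Exposure, and Cross-Site Scripting together account for roughly two thirds of all missed findings.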

LLM operational metrics

Avg input tokens: 106,467
Avg output tokens: 4,848
Avg total tokens: 470,869
Avg latency / repo: 121s
JSON repair rate: 0.0%
Total runs: 74
F2 run-to-run σ: ±16.7

Cost

Total cost: $28
Cost / run: $0.38
Cost / 100 LOC: $0.140
Python LOC scanned: 20,062
Successful runs: 74
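
The per-run and per-100-LOC costs follow from the total cost, run count, and lines scanned. A small sketch of that arithmetic, assuming the displayed $28 total is rounded to whole dollars, so the results are approximate:

```python
total_cost_usd = 28.0   # displayed total, assumed rounded to whole dollars
successful_runs = 74
python_loc = 20_062

cost_per_run = total_cost_usd / successful_runs
cost_per_100_loc = total_cost_usd / python_loc * 100

print(f"Cost / run: ${cost_per_run:.2f}")          # $0.38
print(f"Cost / 100 LOC: ${cost_per_100_loc:.3f}")  # $0.140
```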
