realvuln v1.0
Scanner deep-dive

MiniMax M2.7 by MiniMax

General-Purpose LLM · agentic-v1 · scored on 22/26 repositories. Strict scoring (unfinished repos counted as misses).

F3 (strict): 38.2
F2 (strict): 40.2
Recall (strict): 36.3%
Precision: 71.3%
Repos scored: 22/26
Model: MiniMax-M2.7
Total cost: $1
Avg latency: 119s
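The headline F2 and F3 figures can be related to the precision and recall shown alongside them. The sketch below assumes the standard F-beta definition, which this page does not state explicitly; the function name `f_beta` is illustrative. Because the inputs are the rounded percentages displayed above, the results should only be expected to agree with the dashboard to within rounding.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score: recall is weighted beta times as heavily as precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Rounded values taken from the summary cards (precision 71.3%, strict recall 36.3%).
precision, recall = 0.713, 0.363

f2 = 100 * f_beta(precision, recall, beta=2)
f3 = 100 * f_beta(precision, recall, beta=3)
print(f"F2 = {f2:.1f}, F3 = {f3:.1f}")
```

Both scores sit well below the precision because beta values above 1 pull the score toward recall, which is the weaker of the two numbers here.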

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Bar chart: one bar per repository, segmented into true positives, false positives, and missed findings (FN), and labeled with the repository's F2 score and recall. The same values are tabulated below.]
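| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| vulnpy | 48 | 5 | 30 | 61.1 | 65.0 |
| vampi | 9 | 3 | 6 | 60.0 | 62.1 |
| intentionally-vulnerable-python-application | 4 | 1 | 3 | 57.1 | 60.6 |
| vfapi | 6 | 4 | 4 | 61.1 | 59.9 |
| dsvw | 15 | 5 | 12 | 55.6 | 58.6 |
| vulnerable-api | 8 | 5 | 6 | 57.1 | 57.8 |
| dsvpwa | 16 | 2 | 16 | 51.6 | 55.7 |
| pythonssti | 1 | 0 | 1 | 50.0 | 55.6 |
| dvblab | 12 | 9 | 10 | 52.3 | 53.0 |
| python-app | 9 | 5 | 11 | 45.0 | 47.9 |
| lets-be-bad-guys | 10 | 4 | 14 | 43.8 | 47.3 |
| vulnerable-tornado-app | 6 | 3 | 8 | 42.9 | 46.2 |
| damn-vulnerable-flask-application | 6 | 4 | 8 | 43.3 | 45.4 |
| python-insecure-app | 3 | 0 | 5 | 37.5 | 42.3 |
| vulnerable-flask-app | 8 | 8 | 13 | 38.1 | 39.9 |
| pygoat | 27 | 13 | 50 | 35.1 | 38.7 |
| flask-xss | 10 | 3 | 20 | 34.4 | 38.6 |
| damn-vulnerable-graphql-application | 13 | 14 | 23 | 36.1 | 38.0 |
| vulpy | 19 | 3 | 38 | 33.3 | 38.0 |
| extremely-vulnerable-flask-app | 10 | 1 | 22 | 31.2 | 36.0 |
| threatbyte | 8 | 3 | 18 | 30.8 | 34.8 |
| dvpwa | 5 | 7 | 17 | 24.2 | 26.5 |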
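As a consistency check, the TP and FP columns of the per-repository table can be pooled to recover the headline precision. This is a minimal sketch that assumes precision is computed over pooled counts as TP / (TP + FP); the page does not say how the headline figure is aggregated, and the variable names are illustrative.

```python
# (TP, FP) per repository, copied from the per-repository table.
counts = {
    "vulnpy": (48, 5), "vampi": (9, 3),
    "intentionally-vulnerable-python-application": (4, 1),
    "vfapi": (6, 4), "dsvw": (15, 5), "vulnerable-api": (8, 5),
    "dsvpwa": (16, 2), "pythonssti": (1, 0), "dvblab": (12, 9),
    "python-app": (9, 5), "lets-be-bad-guys": (10, 4),
    "vulnerable-tornado-app": (6, 3),
    "damn-vulnerable-flask-application": (6, 4),
    "python-insecure-app": (3, 0), "vulnerable-flask-app": (8, 8),
    "pygoat": (27, 13), "flask-xss": (10, 3),
    "damn-vulnerable-graphql-application": (13, 14),
    "vulpy": (19, 3), "extremely-vulnerable-flask-app": (10, 1),
    "threatbyte": (8, 3), "dvpwa": (5, 7),
}

total_tp = sum(tp for tp, _ in counts.values())
total_fp = sum(fp for _, fp in counts.values())

# Pooled (micro-averaged) precision across all scored repositories.
pooled_precision = 100 * total_tp / (total_tp + total_fp)
print(f"TP={total_tp} FP={total_fp} precision={pooled_precision:.1f}%")
```

Under that assumption the pooled value rounds to the 71.3% shown in the summary. The per-row Recall % and F2 columns do not always match what the row's own counts would give, so they may be averaged over runs rather than computed from pooled counts.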

Detection by severity

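| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 58 | 0 | 15 | 79.5 |
| High | 111 | 0 | 109 | 50.5 |
| Medium | 82 | 0 | 152 | 35.0 |
| Low | 3 | 0 | 58 | 4.9 |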
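The Recall % column in the severity table follows directly from the TP and FN counts in the same row. A short sketch, assuming recall is TP / (TP + FN) per severity bucket (the names below are illustrative):

```python
# (TP, FN) per severity, copied from the severity table.
severity_counts = {
    "Critical": (58, 15),
    "High": (111, 109),
    "Medium": (82, 152),
    "Low": (3, 58),
}


def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage; 0.0 when there are no labeled findings."""
    labeled = tp + fn
    return 100 * tp / labeled if labeled else 0.0


recalls = {sev: round(recall_pct(tp, fn), 1) for sev, (tp, fn) in severity_counts.items()}
for sev, value in recalls.items():
    print(f"{sev:<8} {value:5.1f}")
```

Recall falls steadily with severity: most critical findings are detected, while low-severity findings are almost entirely missed.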

Detection by vulnerability class

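| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| SQL Injection | 36 | 0 | 0 | 100.0 |
| XML External Entities | 8 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| Insecure Deserialization | 14 | 0 | 1 | 93.3 |
| Code Injection / RFI | 11 | 0 | 1 | 91.7 |
| Server-Side Request Forgery | 20 | 0 | 2 | 90.9 |
| Command / OS Injection | 14 | 0 | 2 | 87.5 |
| Denial of Service | 17 | 0 | 3 | 85.0 |
| Open Redirect | 4 | 0 | 1 | 80.0 |
| Broken Access Control / IDOR | 14 | 0 | 6 | 70.0 |
| Path Traversal | 16 | 0 | 7 | 69.6 |
| Hardcoded Credentials | 23 | 0 | 23 | 50.0 |
| HTTP Header Injection | 1 | 0 | 1 | 50.0 |
| Cross-Site Scripting | 24 | 0 | 46 | 34.3 |
| Missing Authentication / Authorization | 9 | 0 | 26 | 25.7 |
| Other | 30 | 0 | 147 | 16.9 |
| Security Misconfiguration | 4 | 0 | 25 | 13.8 |
| Sensitive Data Exposure | 5 | 0 | 43 | 10.4 |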

LLM operational metrics

Avg input tokens: 30,099
Avg output tokens: 5,274
Avg total tokens: 168,656
Avg latency / repo: 119s
JSON repair rate: 5.6%
Total runs: 72
F2 run-to-run σ: ±10.7

Cost

Total cost: $1
Cost / run: $0.02
Cost / 100 LOC: $0.007
Python LOC scanned: 14,785
Successful runs: 50
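The per-run and per-100-LOC costs can be reproduced from the other figures in this section. The sketch below assumes that cost per run divides the total by the successful runs and that cost per 100 LOC divides it by the Python LOC scanned; both are inferences from the displayed numbers, not something the page states, and the $1 total is itself rounded.

```python
total_cost_usd = 1.00      # "Total cost" as displayed (rounded)
successful_runs = 50       # "Successful runs"
python_loc = 14_785        # "Python LOC scanned"

# Assumed definitions, chosen because they match the displayed values.
cost_per_run = total_cost_usd / successful_runs
cost_per_100_loc = total_cost_usd / python_loc * 100

print(f"Cost / run:     ${cost_per_run:.2f}")
print(f"Cost / 100 LOC: ${cost_per_100_loc:.3f}")
```

Dividing by all 72 runs instead of the 50 successful ones would give roughly $0.01 per run, which is why the successful-run denominator is the assumed one here.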
