realvuln v1.0
Scanner deep-dive

Kimi K2.5 by Moonshot AI

General-purpose LLM · agentic-v1 · scored on 24 of 26 repositories. Scoring is strict: unfinished repositories are counted as misses.

F3 (strict): 46.0
F2 (strict): 47.8
Recall (strict): 44.3%
Precision: 69.3%
Repos scored: 24/26
Model: kimi-k2.5
Total cost: $2
Avg latency: 140s
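
The F2 and F3 cards are consistent with the standard F-beta formula applied to the reported precision and strict recall. The sketch below assumes that is how the dashboard derives them; the function name `f_beta` is illustrative and not taken from realvuln.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Rounded values copied from the summary cards.
precision, strict_recall = 0.693, 0.443

f2 = f_beta(precision, strict_recall, beta=2)
f3 = f_beta(precision, strict_recall, beta=3)
print(f"F2 = {100 * f2:.1f}, F3 = {100 * f3:.1f}")
```

Because the inputs are already rounded to one decimal place, the results can differ from the displayed 47.8 and 46.0 in the last digit.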
§

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Bar chart: one bar per repository, segmented into true positives, false positives, and misses (FN), ranked by F2. The per-repository values are listed in the table that follows.]
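| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| vfapi | 9 | 10 | 0 | 96.3 | 79.2 |
| intentionally-vulnerable-python-application | 5 | 1 | 2 | 71.4 | 73.2 |
| vulnpy | 53 | 5 | 25 | 68.0 | 71.6 |
| vampi | 10 | 5 | 5 | 66.7 | 66.6 |
| dsvw | 17 | 3 | 10 | 61.7 | 65.5 |
| insecure-web | 6 | 4 | 3 | 66.7 | 65.1 |
| python-app | 12 | 9 | 8 | 61.7 | 61.1 |
| dvblab | 13 | 5 | 9 | 57.6 | 60.1 |
| vulnerable-flask-app | 12 | 5 | 9 | 57.1 | 59.4 |
| lets-be-bad-guys | 14 | 6 | 10 | 57.0 | 59.3 |
| dsvpwa | 17 | 4 | 15 | 52.1 | 56.1 |
| pythonssti | 1 | 0 | 1 | 50.0 | 55.6 |
| vulnerable-api | 7 | 3 | 7 | 52.4 | 55.3 |
| vulnerable-tornado-app | 7 | 3 | 7 | 52.4 | 54.9 |
| python-insecure-app | 4 | 3 | 4 | 50.0 | 51.8 |
| dvpwa | 11 | 5 | 11 | 48.5 | 51.3 |
| damn-vulnerable-flask-application | 7 | 2 | 8 | 46.7 | 50.7 |
| threatbyte | 11 | 9 | 15 | 42.3 | 44.5 |
| extremely-vulnerable-flask-app | 12 | 3 | 20 | 37.5 | 41.8 |
| flask-xss | 11 | 5 | 19 | 37.8 | 41.6 |
| pygoat | 28 | 17 | 49 | 36.4 | 39.4 |
| damn-vulnerable-graphql-application | 13 | 11 | 23 | 36.1 | 38.7 |
| djangoat | 16 | 12 | 34 | 32.0 | 35.1 |
| vulpy | 13 | 7 | 44 | 22.2 | 25.5 |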
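
Pooling the per-repository counts reproduces the headline precision. The sketch below assumes the summary figures are micro-averaged (counts pooled across repositories before dividing). The (TP, FP, FN) triples are read from the table above, and the variable names are illustrative.

```python
# (TP, FP, FN) per scored repository, read from the per-repository table.
counts = {
    "vfapi": (9, 10, 0),
    "intentionally-vulnerable-python-application": (5, 1, 2),
    "vulnpy": (53, 5, 25),
    "vampi": (10, 5, 5),
    "dsvw": (17, 3, 10),
    "insecure-web": (6, 4, 3),
    "python-app": (12, 9, 8),
    "dvblab": (13, 5, 9),
    "vulnerable-flask-app": (12, 5, 9),
    "lets-be-bad-guys": (14, 6, 10),
    "dsvpwa": (17, 4, 15),
    "pythonssti": (1, 0, 1),
    "vulnerable-api": (7, 3, 7),
    "vulnerable-tornado-app": (7, 3, 7),
    "python-insecure-app": (4, 3, 4),
    "dvpwa": (11, 5, 11),
    "damn-vulnerable-flask-application": (7, 2, 8),
    "threatbyte": (11, 9, 15),
    "extremely-vulnerable-flask-app": (12, 3, 20),
    "flask-xss": (11, 5, 19),
    "pygoat": (28, 17, 49),
    "damn-vulnerable-graphql-application": (13, 11, 23),
    "djangoat": (16, 12, 34),
    "vulpy": (13, 7, 44),
}

# Micro-averaging: sum the counts first, then compute the ratios once.
tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
fn = sum(c[2] for c in counts.values())

precision = tp / (tp + fp)
recall_scored = tp / (tp + fn)  # covers only the 24 scored repositories

print(f"TP={tp} FP={fp} FN={fn}")
print(f"precision={100 * precision:.1f}% recall(scored)={100 * recall_scored:.1f}%")
```

The pooled precision matches the 69.3% card. Recall over the scored repositories alone comes out higher than the strict 44.3%, which is what one would expect if strict scoring adds the labeled vulnerabilities of the unfinished repositories to the miss count.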
§

Detection by severity

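| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 69 | 0 | 13 | 84.1 |
| High | 125 | 3 | 118 | 51.4 |
| Medium | 94 | 2 | 166 | 36.2 |
| Low | 17 | 0 | 45 | 27.4 |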
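
The recall column follows from the counts as TP / (TP + FN). A short sketch that recomputes it from the severity rows above (the dictionary layout is illustrative):

```python
# (TP, FN) per severity level, read from the severity table.
severity = {
    "Critical": (69, 13),
    "High": (125, 118),
    "Medium": (94, 166),
    "Low": (17, 45),
}

# Recall = found / (found + missed), expressed in percent.
recall_pct = {
    level: round(100 * tp / (tp + fn), 1)
    for level, (tp, fn) in severity.items()
}

for level, pct in recall_pct.items():
    print(f"{level:<8} {pct:5.1f}%")
```

Recall falls steadily from Critical to Low, so the misses are concentrated in the less severe findings.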
§

Detection by vulnerability class

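| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| XPath Injection | 4 | 0 | 0 | 100.0 |
| SQL Injection | 38 | 0 | 1 | 97.4 |
| Insecure Deserialization | 15 | 0 | 1 | 93.8 |
| Command / OS Injection | 15 | 1 | 2 | 88.2 |
| XML External Entities | 7 | 0 | 1 | 87.5 |
| Code Injection / RFI | 12 | 0 | 2 | 85.7 |
| Path Traversal | 20 | 2 | 5 | 80.0 |
| Server-Side Request Forgery | 16 | 0 | 6 | 72.7 |
| Open Redirect | 4 | 0 | 2 | 66.7 |
| Cross-Site Scripting | 45 | 2 | 34 | 57.0 |
| Broken Access Control / IDOR | 11 | 0 | 11 | 50.0 |
| HTTP Header Injection | 1 | 0 | 1 | 50.0 |
| Hardcoded Credentials | 26 | 0 | 27 | 49.1 |
| Missing Authentication / Authorization | 17 | 0 | 26 | 39.5 |
| Other | 53 | 0 | 140 | 27.5 |
| Security Misconfiguration | 8 | 0 | 24 | 25.0 |
| Sensitive Data Exposure | 11 | 0 | 41 | 21.2 |
| Denial of Service | 2 | 0 | 18 | 10.0 |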
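
Recall alone hides how much each family contributes to the total miss count. As an illustrative sketch, not part of the dashboard, the FN column above can be ranked by its share of all misses:

```python
# False negatives per CWE family, read from the table above.
misses = {
    "XPath Injection": 0,
    "SQL Injection": 1,
    "Insecure Deserialization": 1,
    "Command / OS Injection": 2,
    "XML External Entities": 1,
    "Code Injection / RFI": 2,
    "Path Traversal": 5,
    "Server-Side Request Forgery": 6,
    "Open Redirect": 2,
    "Cross-Site Scripting": 34,
    "Broken Access Control / IDOR": 11,
    "HTTP Header Injection": 1,
    "Hardcoded Credentials": 27,
    "Missing Authentication / Authorization": 26,
    "Other": 140,
    "Security Misconfiguration": 24,
    "Sensitive Data Exposure": 41,
    "Denial of Service": 18,
}

total_fn = sum(misses.values())

# Sort families by how many labeled vulnerabilities were missed, largest first.
ranked = sorted(misses.items(), key=lambda item: item[1], reverse=True)

for family, fn in ranked[:5]:
    print(f"{family:<40} {fn:>4} {100 * fn / total_fn:5.1f}% of misses")
```

Ranked this way, the catch-all "Other" family accounts for the largest share of misses, followed by Sensitive Data Exposure and Cross-Site Scripting.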
§

LLM operational metrics

Avg input tokens: 20,086
Avg output tokens: 5,029
Avg total tokens: 131,193
Avg latency / repo: 140s
JSON repair rate: 0.0%
Total runs: 72
F2 run-to-run σ: ±13.0
§

Cost

Total cost: $2
Cost / run: $0.03
Cost / 100 LOC: $0.011
Python LOC scanned: 20,062
Successful runs: 72
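
The per-run and per-100-LOC figures can be approximated from the totals. The sketch below assumes both are plain ratios of the total cost, which the page does not state. The displayed total is rounded to a whole dollar, so the derived values are only approximate.

```python
total_cost_usd = 2.0   # displayed total, rounded to whole dollars
runs = 72              # successful runs
python_loc = 20_062    # Python LOC scanned

# Assumed definitions: simple ratios of the total cost.
cost_per_run = total_cost_usd / runs
cost_per_100_loc = total_cost_usd / python_loc * 100

print(f"cost / run     ~ ${cost_per_run:.3f}")
print(f"cost / 100 LOC ~ ${cost_per_100_loc:.4f}")
```

The per-run value rounds to the displayed $0.03. The per-100-LOC value lands slightly under the displayed $0.011, which would be consistent with an unrounded total a little above $2.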
