realvuln v1.0
Scanner deep-dive

Kimi K2.6 by Moonshot AI

General-Purpose LLM · agentic-v1 · scored on 25 of 26 repositories. Scoring is strict: labeled vulnerabilities in unfinished repositories count as misses.

F3 (strict): 53.9
F2 (strict): 55.8
Recall (strict): 52.1%
Precision: 78.6%
Repos scored: 25/26
Model: kimi-k2.6
Total cost: $6
Avg latency: 603s
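The headline F2 and F3 figures can be sanity-checked against the precision and strict recall shown above. The sketch below assumes the dashboard uses the standard F-beta definition (the methodology is not stated on this page); `f_beta` is an illustrative helper, not part of the benchmark's code. With the rounded inputs it reproduces F3 and lands within about 0.1 of the displayed F2.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Rounded headline values from the summary above, expressed as fractions.
PRECISION = 0.786
STRICT_RECALL = 0.521

f2 = 100 * f_beta(PRECISION, STRICT_RECALL, beta=2)
f3 = 100 * f_beta(PRECISION, STRICT_RECALL, beta=3)
print(f"F2 = {f2:.1f}, F3 = {f3:.1f}")
```

Because recall is well below precision here, raising beta from 2 to 3 pulls the score further toward recall, which is why F3 comes out lower than F2.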

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

Legend: True positive · False positive · Missed (FN)
[Bar chart: one bar per repository, each labeled with its rounded F2 score and recall, from vulnpy (87 F2 · 88%) down to python-app (34 F2 · 32%).]

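| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| vulnpy | 69 | 14 | 9 | 88.5 | 87.3 |
| pythonssti | 2 | 0 | 0 | 83.3 | 85.2 |
| damn-vulnerable-flask-application | 12 | 2 | 3 | 80.0 | 81.0 |
| vfapi | 6 | 0 | 2 | 72.2 | 75.0 |
| dsvw | 19 | 1 | 8 | 70.4 | 74.2 |
| intentionally-vulnerable-python-application | 5 | 1 | 2 | 71.4 | 72.5 |
| insecure-web | 6 | 2 | 3 | 70.4 | 71.5 |
| lets-be-bad-guys | 15 | 1 | 9 | 62.5 | 67.1 |
| owasp-web-playground | 18 | 4 | 10 | 64.3 | 67.0 |
| vampi | 10 | 2 | 6 | 63.3 | 66.0 |
| threatbyte | 16 | 2 | 10 | 61.5 | 65.6 |
| vulnerable-api | 8 | 1 | 6 | 60.7 | 64.8 |
| vulnerable-flask-app | 12 | 7 | 9 | 58.7 | 58.4 |
| dvpwa | 12 | 7 | 10 | 56.8 | 58.2 |
| python-insecure-app | 4 | 1 | 4 | 54.2 | 57.5 |
| extremely-vulnerable-flask-app | 17 | 2 | 15 | 52.1 | 56.6 |
| vulnerable-tornado-app | 7 | 2 | 7 | 52.4 | 56.1 |
| dvblab | 10 | 2 | 12 | 45.5 | 49.5 |
| vulpy | 25 | 2 | 32 | 43.9 | 48.9 |
| damn-vulnerable-graphql-application | 15 | 6 | 21 | 41.7 | 45.2 |
| pygoat | 31 | 13 | 46 | 40.7 | 44.3 |
| djangoat | 20 | 12 | 30 | 41.0 | 44.0 |
| vulnerable-python-apps | 8 | 3 | 14 | 37.9 | 41.5 |
| flask-xss | 10 | 2 | 20 | 33.3 | 37.9 |
| python-app | 6 | 10 | 14 | 32.5 | 33.9 |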
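The per-repository counts above can be turned back into recall and F2, and pooled into an overall precision. The sketch below assumes the standard count-based definitions; `ROWS`, `recall`, and `f_beta_counts` are illustrative names, and the tuples are the TP/FP/FN values as read from the table. For vulnpy the counts reproduce the displayed 88.5% recall and 87.3 F2, and the pooled precision matches the headline 78.6%. Some rows (vfapi, for example) do not reproduce exactly from their counts, possibly because the percentages are averaged over several runs; that is a guess, not something this page states.

```python
# (repository, TP, FP, FN) as read from the per-repository table.
ROWS = [
    ("vulnpy", 69, 14, 9),
    ("pythonssti", 2, 0, 0),
    ("damn-vulnerable-flask-application", 12, 2, 3),
    ("vfapi", 6, 0, 2),
    ("dsvw", 19, 1, 8),
    ("intentionally-vulnerable-python-application", 5, 1, 2),
    ("insecure-web", 6, 2, 3),
    ("lets-be-bad-guys", 15, 1, 9),
    ("owasp-web-playground", 18, 4, 10),
    ("vampi", 10, 2, 6),
    ("threatbyte", 16, 2, 10),
    ("vulnerable-api", 8, 1, 6),
    ("vulnerable-flask-app", 12, 7, 9),
    ("dvpwa", 12, 7, 10),
    ("python-insecure-app", 4, 1, 4),
    ("extremely-vulnerable-flask-app", 17, 2, 15),
    ("vulnerable-tornado-app", 7, 2, 7),
    ("dvblab", 10, 2, 12),
    ("vulpy", 25, 2, 32),
    ("damn-vulnerable-graphql-application", 15, 6, 21),
    ("pygoat", 31, 13, 46),
    ("djangoat", 20, 12, 30),
    ("vulnerable-python-apps", 8, 3, 14),
    ("flask-xss", 10, 2, 20),
    ("python-app", 6, 10, 14),
]


def recall(tp: int, fn: int) -> float:
    """Fraction of labeled vulnerabilities that were found."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_beta_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta written directly in terms of TP/FP/FN counts."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


name, tp, fp, fn = ROWS[0]
print(f"{name}: recall {100 * recall(tp, fn):.1f}%, F2 {100 * f_beta_counts(tp, fp, fn):.1f}")

# Pooled (micro-averaged) precision over the 25 scored repositories.
total_tp = sum(row[1] for row in ROWS)
total_fp = sum(row[2] for row in ROWS)
pooled_precision = total_tp / (total_tp + total_fp)
print(f"pooled precision: {100 * pooled_precision:.1f}%")
```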

Detection by severity

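| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 70 | 0 | 12 | 85.4 |
| High | 139 | 1 | 116 | 54.5 |
| Medium | 135 | 1 | 130 | 50.9 |
| Low | 13 | 0 | 50 | 20.6 |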

Detection by vulnerability class

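| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Open Redirect | 4 | 0 | 0 | 100.0 |
| HTTP Header Injection | 2 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| Command / OS Injection | 14 | 0 | 1 | 93.3 |
| Server-Side Request Forgery | 21 | 0 | 2 | 91.3 |
| Insecure Deserialization | 16 | 0 | 2 | 88.9 |
| XML External Entities | 7 | 0 | 1 | 87.5 |
| Path Traversal | 20 | 1 | 4 | 83.3 |
| SQL Injection | 36 | 0 | 9 | 80.0 |
| Code Injection / RFI | 11 | 0 | 3 | 78.6 |
| Hardcoded Credentials | 43 | 0 | 16 | 72.9 |
| Cross-Site Scripting | 50 | 1 | 29 | 63.3 |
| Denial of Service | 12 | 0 | 8 | 60.0 |
| Broken Access Control / IDOR | 14 | 0 | 10 | 58.3 |
| Missing Authentication / Authorization | 22 | 0 | 24 | 47.8 |
| Sensitive Data Exposure | 19 | 0 | 35 | 35.2 |
| Security Misconfiguration | 9 | 0 | 21 | 30.0 |
| Other | 53 | 0 | 143 | 27.0 |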

LLM operational metrics

Avg input tokens: 26,815
Avg output tokens: 17,762
Avg total tokens: 291,901
Avg latency / repo: 603s
JSON repair rate: 6.4%
Total runs: 78
F2 run-to-run σ: ±14.8
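A run-to-run σ of this kind is typically a standard deviation of the F2 score over repeated runs. The page does not say whether the sample or population form is used, or how runs are grouped, so the sketch below shows both forms on hypothetical scores; the values in `f2_runs` are invented for illustration and are not taken from the dashboard.

```python
import statistics

# Hypothetical F2 scores from three runs on one repository (illustrative only).
f2_runs = [71.0, 58.0, 45.0]

mean_f2 = statistics.mean(f2_runs)
sigma_sample = statistics.stdev(f2_runs)       # divides by n - 1
sigma_population = statistics.pstdev(f2_runs)  # divides by n

print(f"mean F2 = {mean_f2:.1f}")
print(f"sample sigma = {sigma_sample:.1f}, population sigma = {sigma_population:.1f}")
```

With only a handful of runs the two forms differ noticeably, so it matters which one a reported σ refers to.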

Cost

Total cost: $6
Cost / run: $0.10
Cost / 100 LOC: $0.032
Python LOC scanned: 19,454
Successful runs: 60
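The unit costs appear to follow from the totals by simple division. The sketch below assumes cost per run divides by the 60 successful runs (dividing by all 78 runs would give about $0.077 instead of $0.10) and that cost per 100 LOC is total cost over lines scanned; both are inferences from the displayed numbers, not documented formulas. With the displayed $6 total the second figure comes out near $0.031 rather than $0.032, which would be consistent with the total being rounded to whole dollars.

```python
# Displayed values from the cost summary; the total appears to be rounded.
TOTAL_COST_USD = 6.00
SUCCESSFUL_RUNS = 60
PYTHON_LOC = 19_454

cost_per_run = TOTAL_COST_USD / SUCCESSFUL_RUNS
cost_per_100_loc = TOTAL_COST_USD / PYTHON_LOC * 100

print(f"cost per run: ${cost_per_run:.2f}")
print(f"cost per 100 LOC: ${cost_per_100_loc:.3f}")
```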
