Scanner deep-dive
Kimi K2.6 by Moonshot AI
General-Purpose LLM · agentic-v1 · scored on 25/26 repositories. Strict scoring (unfinished repos counted as misses).
| Metric | Value |
|---|---|
| F3 (strict) | 53.9 |
| F2 (strict) | 55.8 |
| Recall (strict) | 52.1% |
| Precision | 78.6% |
| Repos scored | 25/26 |
| Model | kimi-k2.6 |
| Total cost | $6 |
| Avg latency | 603s |
Per-repository breakdown
Each row shows true positives (TP), false positives (FP), and misses (FN) on one repository. Ranked by F2.
| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| vulnpy | 69 | 14 | 9 | 88.5 | 87.3 |
| pythonssti | 2 | 0 | 0 | 83.3 | 85.2 |
| damn-vulnerable-flask-application | 12 | 2 | 3 | 80.0 | 81.0 |
| vfapi | 6 | 0 | 2 | 72.2 | 75.0 |
| dsvw | 19 | 1 | 8 | 70.4 | 74.2 |
| intentionally-vulnerable-python-application | 5 | 1 | 2 | 71.4 | 72.5 |
| insecure-web | 6 | 2 | 3 | 70.4 | 71.5 |
| lets-be-bad-guys | 15 | 1 | 9 | 62.5 | 67.1 |
| owasp-web-playground | 18 | 4 | 10 | 64.3 | 67.0 |
| vampi | 10 | 2 | 6 | 63.3 | 66.0 |
| threatbyte | 16 | 2 | 10 | 61.5 | 65.6 |
| vulnerable-api | 8 | 1 | 6 | 60.7 | 64.8 |
| vulnerable-flask-app | 12 | 7 | 9 | 58.7 | 58.4 |
| dvpwa | 12 | 7 | 10 | 56.8 | 58.2 |
| python-insecure-app | 4 | 1 | 4 | 54.2 | 57.5 |
| extremely-vulnerable-flask-app | 17 | 2 | 15 | 52.1 | 56.6 |
| vulnerable-tornado-app | 7 | 2 | 7 | 52.4 | 56.1 |
| dvblab | 10 | 2 | 12 | 45.5 | 49.5 |
| vulpy | 25 | 2 | 32 | 43.9 | 48.9 |
| damn-vulnerable-graphql-application | 15 | 6 | 21 | 41.7 | 45.2 |
| pygoat | 31 | 13 | 46 | 40.7 | 44.3 |
| djangoat | 20 | 12 | 30 | 41.0 | 44.0 |
| vulnerable-python-apps | 8 | 3 | 14 | 37.9 | 41.5 |
| flask-xss | 10 | 2 | 20 | 33.3 | 37.9 |
| python-app | 6 | 10 | 14 | 32.5 | 33.9 |
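
The Recall % and F2 columns can be related to the raw counts with the standard F-beta definition. The sketch below is an illustration, not the benchmark's own scoring code; the function names `fbeta_from_counts` and `recall_from_counts` are hypothetical. It reproduces the vulnpy and dsvw rows exactly. Some other rows (for example pythonssti) do not follow from their listed counts, which may reflect averaging over several runs per repository; that is an assumption, since the page does not say how the counts and percentages are aggregated.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta as a percentage, computed directly from TP/FP/FN counts.

    Equivalent to (1 + b^2) * P * R / (b^2 * P + R) with
    P = tp / (tp + fp) and R = tp / (tp + fn).
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return 100.0 * (1 + b2) * tp / denom if denom else 0.0


def recall_from_counts(tp: int, fn: int) -> float:
    """Recall as a percentage: share of labeled vulnerabilities found."""
    total = tp + fn
    return 100.0 * tp / total if total else 0.0


# (TP, FP, FN) copied from the vulnpy and dsvw rows of the table above.
rows = {"vulnpy": (69, 14, 9), "dsvw": (19, 1, 8)}

for name, (tp, fp, fn) in rows.items():
    recall = round(recall_from_counts(tp, fn), 1)
    f2 = round(fbeta_from_counts(tp, fp, fn, beta=2.0), 1)
    print(name, recall, f2)
# vulnpy 88.5 87.3
# dsvw 70.4 74.2
```

With beta = 2 recall is weighted more heavily than precision, so a missed vulnerability costs more than a false alarm; beta = 3 (the F3 headline metric) shifts the weight further toward recall.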
Detection by severity
| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 70 | 0 | 12 | 85.4 |
| High | 139 | 1 | 116 | 54.5 |
| Medium | 135 | 1 | 130 | 50.9 |
| Low | 13 | 0 | 50 | 20.6 |
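
As a quick check, the Recall % column in this table follows from TP / (TP + FN) for every severity row. The snippet below is a small sketch using the counts above; `recall_pct` and `severity_counts` are hypothetical names. The pooled figure at the end is an unofficial aggregate and is not expected to equal the strict headline recall, which also counts the unfinished repository as misses.

```python
def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage; 0.0 when there are no labeled findings."""
    total = tp + fn
    return 100.0 * tp / total if total else 0.0


# (TP, FN) per severity, copied from the table above.
severity_counts = {
    "Critical": (70, 12),
    "High": (139, 116),
    "Medium": (135, 130),
    "Low": (13, 50),
}

recalls = {
    sev: round(recall_pct(tp, fn), 1)
    for sev, (tp, fn) in severity_counts.items()
}
print(recalls)
# {'Critical': 85.4, 'High': 54.5, 'Medium': 50.9, 'Low': 20.6}

# Pool all severities into one recall figure (illustrative only).
total_tp = sum(tp for tp, _ in severity_counts.values())
total_fn = sum(fn for _, fn in severity_counts.values())
pooled = round(recall_pct(total_tp, total_fn), 1)
print(pooled)
# 53.7
```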
Detection by vulnerability class
| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Open Redirect | 4 | 0 | 0 | 100.0 |
| HTTP Header Injection | 2 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| Command / OS Injection | 14 | 0 | 1 | 93.3 |
| Server-Side Request Forgery | 21 | 0 | 2 | 91.3 |
| Insecure Deserialization | 16 | 0 | 2 | 88.9 |
| XML External Entities | 7 | 0 | 1 | 87.5 |
| Path Traversal | 20 | 1 | 4 | 83.3 |
| SQL Injection | 36 | 0 | 9 | 80.0 |
| Code Injection / RFI | 11 | 0 | 3 | 78.6 |
| Hardcoded Credentials | 43 | 0 | 16 | 72.9 |
| Cross-Site Scripting | 50 | 1 | 29 | 63.3 |
| Denial of Service | 12 | 0 | 8 | 60.0 |
| Broken Access Control / IDOR | 14 | 0 | 10 | 58.3 |
| Missing Authentication / Authorization | 22 | 0 | 24 | 47.8 |
| Sensitive Data Exposure | 19 | 0 | 35 | 35.2 |
| Security Misconfiguration | 9 | 0 | 21 | 30.0 |
| Other | 53 | 0 | 143 | 27.0 |
LLM operational metrics
| Metric | Value |
|---|---|
| Avg input tokens | 26,815 |
| Avg output tokens | 17,762 |
| Avg total tokens | 291,901 |
| Avg latency / repo | 603s |
| JSON repair rate | 6.4% |
| Total runs | 78 |
| F2 run-to-run σ | ±14.8 |
Cost
| Metric | Value |
|---|---|
| Total cost | $6 |
| Cost / run | $0.10 |
| Cost / 100 LOC | $0.032 |
| Python LOC scanned | 19,454 |
| Successful runs | 60 |
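
The derived cost figures can be approximated from the totals shown here. This is a rough sketch, assuming cost per run divides the total by successful runs and cost per 100 LOC divides it by the scanned line count; the page does not state either denominator, and the variable names are hypothetical. The reported $6 is rounded to the dollar, so small differences from the displayed values are expected.

```python
# Figures copied from the cost section; total_cost_usd is the rounded value shown.
total_cost_usd = 6.0
successful_runs = 60
python_loc_scanned = 19_454

# Assumption: cost per run uses successful runs as the denominator.
cost_per_run = total_cost_usd / successful_runs

# Assumption: cost per 100 LOC uses the scanned LOC total as the denominator.
cost_per_100_loc = total_cost_usd / (python_loc_scanned / 100)

print(f"cost per run: ${cost_per_run:.2f}")
print(f"cost per 100 LOC: ${cost_per_100_loc:.3f}")
```

Under these assumptions the per-run figure agrees with the reported $0.10, while the per-100-LOC figure comes out slightly below the reported $0.032, which is plausible if the unrounded total is a little above $6.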