realvuln v1.0
Scanner deep-dive

Qwen 3.5 397B by Alibaba Qwen

General-Purpose LLM · agentic-v1 · scored on 24/26 repositories. Strict scoring (unfinished repos counted as misses).

F3 (strict): 38.2
F2 (strict): 39.9
Recall (strict): 36.5%
Precision: 63.6%
Repos scored: 24/26
Model: together_ai/Qwen/Qwen3.5-397B-A17B
Total cost: $3
Avg latency: 77s
§

Per-repository breakdown

Each bar shows true positives, false positives, and misses on one repository; bar length is proportional to that repo's labeled vulnerabilities. Ranked by F2.

[Bar chart: one bar per repository. Legend: True positive · False positive · Missed (FN). The per-repository values are tabulated below.]

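| Repository | TP | FP | FN | Recall % | F2 |
|---|---|---|---|---|---|
| vfapi | 7 | 9 | 2 | 74.1 | 64.6 |
| dvblab | 13 | 1 | 9 | 59.1 | 63.7 |
| insecure-web | 5 | 2 | 4 | 59.3 | 61.0 |
| dsvw | 16 | 6 | 11 | 58.0 | 60.1 |
| intentionally-vulnerable-python-application | 4 | 1 | 3 | 57.1 | 60.0 |
| vulnpy | 42 | 4 | 36 | 54.5 | 59.2 |
| pythonssti | 1 | 0 | 1 | 50.0 | 55.6 |
| vulnerable-api | 7 | 1 | 7 | 50.0 | 54.4 |
| vampi | 8 | 10 | 7 | 53.3 | 51.6 |
| python-insecure-app | 4 | 1 | 4 | 45.8 | 49.6 |
| vulnerable-tornado-app | 6 | 3 | 8 | 45.2 | 48.4 |
| damn-vulnerable-flask-application | 7 | 3 | 8 | 44.4 | 47.4 |
| lets-be-bad-guys | 11 | 8 | 13 | 44.4 | 46.5 |
| pygoat | 34 | 27 | 44 | 43.5 | 45.4 |
| vulnerable-flask-app | 8 | 6 | 13 | 39.7 | 42.2 |
| dsvpwa | 11 | 4 | 21 | 35.4 | 39.6 |
| python-app | 7 | 7 | 13 | 36.7 | 38.8 |
| damn-vulnerable-graphql-application | 13 | 22 | 23 | 37.0 | 36.6 |
| threatbyte | 8 | 5 | 18 | 32.0 | 35.3 |
| dvpwa | 6 | 4 | 16 | 28.8 | 32.2 |
| flask-xss | 8 | 2 | 22 | 26.7 | 30.8 |
| djangoat | 12 | 8 | 38 | 24.7 | 28.0 |
| extremely-vulnerable-flask-app | 7 | 2 | 25 | 22.9 | 26.6 |
| vulpy | 10 | 10 | 47 | 17.0 | 19.5 |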
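For a single repository, recall and F2 can be recomputed directly from the TP, FP, and FN counts. The sketch below uses the count form of F2, which is algebraically the same as the precision/recall form; the helper name `recall_and_f2` is hypothetical and this is not the benchmark's own scoring code.

```python
def recall_and_f2(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (recall %, F2 %) from raw counts for one repository."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F2 in count form: (1 + 2^2) * TP / ((1 + 2^2) * TP + 2^2 * FN + FP)
    denom = 5 * tp + 4 * fn + fp
    f2 = 5 * tp / denom if denom else 0.0
    return 100 * recall, 100 * f2


# Counts copied from the table above.
rows = {"dvblab": (13, 1, 9), "flask-xss": (8, 2, 22)}
for name, (tp, fp, fn) in rows.items():
    recall, f2 = recall_and_f2(tp, fp, fn)
    print(f"{name}: recall {recall:.1f}%, F2 {f2:.1f}")
```

These two rows reproduce the tabulated recall and F2. Some other rows (vfapi, for example) do not reproduce exactly from the displayed counts, possibly because the counts shown are rounded or aggregated across several runs; that explanation is a guess and is not confirmed by the page.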
§

Detection by severity

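| Severity | TP | FP | FN | Recall % |
|---|---|---|---|---|
| Critical | 63 | 1 | 19 | 76.8 |
| High | 117 | 6 | 126 | 48.1 |
| Medium | 78 | 0 | 182 | 30.0 |
| Low | 5 | 0 | 57 | 8.1 |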
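The per-severity recall values follow from TP / (TP + FN), and the same counts can be pooled into a single recall figure. The sketch below is illustrative arithmetic on the numbers in this table, not the benchmark's aggregation code; the variable names are ours.

```python
# (TP, FP, FN) per severity, copied from the table above.
severity_counts = {
    "Critical": (63, 1, 19),
    "High": (117, 6, 126),
    "Medium": (78, 0, 182),
    "Low": (5, 0, 57),
}


def recall_pct(tp: int, fn: int) -> float:
    """Recall as a percentage; 0.0 when there are no labeled positives."""
    return 100 * tp / (tp + fn) if (tp + fn) else 0.0


per_severity = {
    name: recall_pct(tp, fn) for name, (tp, _fp, fn) in severity_counts.items()
}

# Pool all severities: sum the counts first, then divide once.
total_tp = sum(tp for tp, _fp, _fn in severity_counts.values())
total_fn = sum(fn for _tp, _fp, fn in severity_counts.values())
pooled = recall_pct(total_tp, total_fn)

for name, value in per_severity.items():
    print(f"{name}: {value:.1f}%")
print(f"Pooled: {pooled:.1f}% ({total_tp} of {total_tp + total_fn})")
```

The pooled figure comes out near 40.6%, above the 36.5% strict recall in the header. One plausible reason is that strict scoring counts the unfinished repositories as misses while this table covers only scored findings, but the page does not state this explicitly.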
§

Detection by vulnerability class

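| CWE family | TP | FP | FN | Recall % |
|---|---|---|---|---|
| XML External Entities | 8 | 0 | 0 | 100.0 |
| XPath Injection | 4 | 0 | 0 | 100.0 |
| SQL Injection | 38 | 3 | 1 | 97.4 |
| Insecure Deserialization | 15 | 1 | 1 | 93.8 |
| Command / OS Injection | 15 | 1 | 2 | 88.2 |
| Code Injection / RFI | 12 | 0 | 2 | 85.7 |
| Path Traversal | 18 | 2 | 7 | 72.0 |
| Open Redirect | 4 | 0 | 2 | 66.7 |
| Broken Access Control / IDOR | 13 | 0 | 9 | 59.1 |
| Hardcoded Credentials | 29 | 0 | 24 | 54.7 |
| Server-Side Request Forgery | 12 | 0 | 10 | 54.5 |
| HTTP Header Injection | 1 | 0 | 1 | 50.0 |
| Cross-Site Scripting | 24 | 0 | 55 | 30.4 |
| Missing Authentication / Authorization | 11 | 0 | 32 | 25.6 |
| Security Misconfiguration | 7 | 0 | 25 | 21.9 |
| Other | 40 | 0 | 153 | 20.7 |
| Sensitive Data Exposure | 9 | 0 | 43 | 17.3 |
| Denial of Service | 3 | 0 | 17 | 15.0 |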
§

LLM operational metrics

Avg input tokens: 43,965
Avg output tokens: 4,943
Avg total tokens: 121,929
Avg latency / repo: 77s
JSON repair rate: 16.7%
Total runs: 72
F2 run-to-run σ: ±12.9
§

Cost

Total cost: $3
Cost / run: $0.05
Cost / 100 LOC: $0.016
Python LOC scanned: 20,062
Successful runs: 69
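The unit costs can be roughly cross-checked against the total. The sketch below assumes that cost per run is the total divided by the 72 total runs listed under operational metrics, and that cost per 100 LOC is the total divided by the LOC scanned; the page does not state either formula, and the total is shown rounded to whole dollars.

```python
total_cost_usd = 3.0   # reported total, rounded to whole dollars
total_runs = 72        # from the operational metrics
python_loc = 20_062    # Python LOC scanned

# Assumed formulas; the dashboard does not document how it derives these.
cost_per_run = total_cost_usd / total_runs
cost_per_100_loc = total_cost_usd / python_loc * 100

print(f"Cost / run:     ${cost_per_run:.3f}")
print(f"Cost / 100 LOC: ${cost_per_100_loc:.4f}")
```

Both results come out slightly below the reported $0.05 and $0.016, which would be consistent with an unrounded total a little above $3.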
