Dashboard — RealVuln

Scanners

3 categories

Repositories

All apps

86.5

Best F3 (strict)

Kolega Enterprise

0.89

Highest recall %

Kolega Enterprise

133,782

Total LOC

across all repos

Leaderboard

ranked by active metric

#	Scanner	F3	Recall %	Found	FPs	FP/TP	Prec %	Noise %	Repos	Cost/100k	$/100V

Precision vs. recall

Performance vs. cost

F3 vs cost

Recall ranking

fraction of vulnerabilities found

Precision ranking

fraction of flags that were real

By category

three-tier summary

Detection by vulnerability class

recall %, best by approach

▸ LLM-based scanners dominate classes that need semantic data-flow understanding: SQL injection, command injection, and insecure deserialization.
▸ Rule-based tools stay competitive only on syntactic patterns, and even there overall recall remains low.

Dataset composition

1,903 vulnerabilities · 279 FP traps · 66 repositories

Findings

1,903 real vulnerabilities

279 FP traps (12.8%)

CWE families

133,782

Python LOC

Frameworks (66 repos)

Django: 23

FastAPI: 23

Flask: 15

custom: 3

aiohttp: 1

Tornado: 1

Scanner categories

GP-LLM: 14

Rule SAST: 2

Sec.-spec.: 6

All figures are live RealVuln results across 22 scanners and 66 repositories. F3 is the F-beta score with β = 3, which weights recall nine times as heavily as precision; strict mode counts unfinished repositories as misses. Cost is API spend normalized per 100,000 lines of code scanned (rule-based tools are free or variably priced).
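The metric notes above can be illustrated with a minimal Python sketch. It is not RealVuln's actual implementation. The `RepoResult` type, the function names, and the choice to pool counts across repositories before computing precision and recall are all assumptions made for illustration; the dashboard does not state how per-repository results are aggregated. The sketch returns F3 on a 0 to 1 scale, while the dashboard appears to report it on a 0 to 100 scale.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RepoResult:
    """Hypothetical per-repository result for one scanner."""
    total_vulns: int            # ground-truth vulnerabilities in the repository
    tp: Optional[int] = None    # true positives; None means the scan did not finish
    fp: int = 0                 # false positives reported by the scanner


def f_beta(precision: float, recall: float, beta: float = 3.0) -> float:
    """Standard F-beta score; beta = 3 gives recall a weight of beta**2 = 9."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


def strict_f3(results: Iterable[RepoResult]) -> float:
    """F3 over pooled counts (an assumed aggregation).

    In strict mode, every vulnerability in an unfinished repository
    is counted as a miss (false negative).
    """
    tp = fp = fn = 0
    for r in results:
        if r.tp is None:
            fn += r.total_vulns          # unfinished repo: all vulnerabilities missed
        else:
            tp += r.tp
            fp += r.fp
            fn += r.total_vulns - r.tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return f_beta(precision, recall, beta=3.0)


def cost_per_100k_loc(api_spend_usd: float, loc_scanned: int) -> float:
    """API spend normalized per 100,000 lines of code scanned."""
    return api_spend_usd / loc_scanned * 100_000
```

Because recall carries nine times the weight, a scanner with perfect recall and 50% precision scores about 0.91 under this formula, while one with perfect precision and 50% recall scores about 0.53.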