Snyk VulnBench JavaScript 1.0: Can LLMs Find the Same Bugs Twice?
The paper introduces Snyk VulnBench JavaScript 1.0, a benchmark that evaluates whether large language models (LLMs) consistently identify the same JavaScript vulnerabilities across repeated attempts, with a focus on the reproducibility of bug finding.
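
One way to make "finding the same bugs twice" concrete is to compare the sets of findings reported in repeated runs over the same code. The sketch below is illustrative only: it assumes each run yields a set of (file, CWE) pairs and uses mean pairwise Jaccard similarity as the consistency score. The metric, the function names, and the sample data are all assumptions made for this example and are not taken from the benchmark itself.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two finding sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(runs: list[set]) -> float:
    """Average Jaccard similarity over every pair of runs (hypothetical consistency score)."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to measure consistency")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical findings from three repeated runs on the same code sample.
runs = [
    {("app.js", "CWE-79"), ("db.js", "CWE-89")},
    {("app.js", "CWE-79")},
    {("app.js", "CWE-79"), ("db.js", "CWE-89")},
]

consistency = mean_pairwise_jaccard(runs)
print(f"mean pairwise Jaccard: {consistency:.3f}")
```

A score of 1.0 would mean every run reported exactly the same findings; lower values mean the runs disagree. Under this assumed metric, a model can report real vulnerabilities in each run and still score poorly if it reports different ones each time.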