Prompt Injection on LlamaGuard is not considered a vulnerability #77

Open
Eric-Hacker opened this issue Feb 21, 2025 · 0 comments

Through the official Meta process, I reported a vulnerability in LlamaGuard: a prompt injection that allows one to bypass detection. The report was closed with the response "Reports describing methods to generate content that bypass the model's safeguards are not eligible for a reward under our program". I was not looking for a bounty. I was simply trying to communicate about the issue through non-public channels, as is appropriate for security issues. I fully agree that payouts for prompt injections on Meta's LLMs would drain the entire Meta security program budget and probably bankrupt the whole company.

And that's my point. Meta promotes the idea that it can moderate LLMs with LLMs. While Meta admits in the LlamaGuard documentation that such protection may be bypassed with prompt injection, the way the bug bounty program handles these reports suggests that Meta knows this is a bottomless pit.

I have multiple prompt injections that evade detection in LlamaGuard 3B over 90% of the time when running a collection of over 1800 Garak attempts that were previously deemed unsafe. I have a process that will let me discover even more.
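
For anyone who wants to check a figure like this against their own setup, here is a minimal sketch of how an evasion rate over a set of known-unsafe prompts could be computed. It is not the harness used for the numbers above. `evasion_rate`, `toy_classify`, and the `wrap` argument are hypothetical names for illustration, and the toy classifier is a stand-in for a real guard model call, not LlamaGuard's API.

```python
from typing import Callable, Iterable


def evasion_rate(
    prompts: Iterable[str],
    classify: Callable[[str], str],
    wrap: Callable[[str], str],
) -> float:
    """Fraction of known-unsafe prompts labelled "safe" after wrapping.

    prompts  -- prompts the guard previously labelled "unsafe"
    classify -- hypothetical guard call returning "safe" or "unsafe"
    wrap     -- hypothetical transform applied to each prompt before classifying
    """
    prompts = list(prompts)
    if not prompts:
        raise ValueError("need at least one prompt")
    evaded = sum(1 for p in prompts if classify(wrap(p)) == "safe")
    return evaded / len(prompts)


def toy_classify(text: str) -> str:
    # Stand-in for a real guard model: flags any text containing "BAD".
    return "unsafe" if "BAD" in text else "safe"


if __name__ == "__main__":
    sample = ["BAD request one", "BAD request two"]
    # Baseline: unmodified prompts should all be caught by the toy guard.
    print(evasion_rate(sample, toy_classify, lambda p: p))
    # A trivial transform that happens to defeat the toy guard.
    print(evasion_rate(sample, toy_classify, str.lower))
```

Running the baseline with an identity `wrap` first is a useful sanity check: if it is not close to zero, the prompt set was not reliably detected to begin with and the evasion figure means little.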

I had to sign up for a Facebook account last week in order to report this vulnerability. Despite my having to provide a photo ID just to get the account, it was suspended this morning. I suspect the reason is that my surname is the same as the street Meta's headquarters is on and that I have no friends on Facebook. Whatever the cause, if you want more details we'll need to communicate here, or you can find me on LinkedIn, which probably doesn't block people named Maude from registering.
