Prompt Injection on LlamaGuard is not considered a vulnerability #77

Eric-Hacker · 2025-02-21T16:28:41Z

Through the official Meta process, I reported a prompt injection vulnerability in LlamaGuard that allowed content to bypass detection. The report was closed with the note: "Reports describing methods to generate content that bypass the model's safeguards are not eligible for a reward under our program". I was not looking for a bounty. I was simply trying to communicate the issue through non-public channels, as is appropriate for security issues. I fully agree that paying out for prompt injections on Meta's LLMs would drain Meta's entire security program budget and probably bankrupt the whole company.

And that's my point. Meta is promoting the idea that it can moderate LLMs with LLMs. Meta admits in the LlamaGuard documentation that this protection may be bypassed with prompt injection, but the way the bug bounty program handles these reports suggests that Meta knows this is a bottomless pit.

I have multiple prompt injections that evade detection in LlamaGuard 3B more than 90% of the time when run against a collection of over 1,800 Garak attempts that were previously deemed unsafe. I also have a process that will let me discover even more.

I had to sign up for a Facebook account last week in order to report this vulnerability. Despite my having to provide a photo ID just to get the account, this morning my Facebook account was suspended. I suspect the reason is that my surname is the same as the name of the street Meta's headquarters is on, and that I have no friends on Facebook. Whatever the cause, if you want more details, we'll need to communicate here, or you can find me on LinkedIn, which probably doesn't block people named Maude from registering.
