Commit 1bc565e

feat: add code injection detection to guardrails library (#1091)
* Add initial YARA rule support for injection detection.
* Add tests, flows, and baseline YARA rules.
* Add docs, add template injection rule
* apply pre-commit
* chore: add yara-python dependency for jb injection
* fix: replace match-case with if-elif for py39 support
* make docstrings google-style
* handle missing config gracefully
* minor changes to path discovery and validation
* fix match handling logic
* remove unused 'source' parameter
* add more tests
* even more test
* fix failing test
* Update tests/test_configs/injection_detection/test.yara

  Co-authored-by: Pouyan <[email protected]>
  Signed-off-by: Erick Galinkin <[email protected]>
* Split enums out to yara_config.py. Expand documentation. Fix error message formatting. Remove `is_system_action` from `action` decorator.

  Signed-off-by: Erick Galinkin <[email protected]>
* Refactor action to have a single action named `injection detection` that handles the action_option. Fixed tests, docs, and flows to reflect the change. Updated rejection return message to indicate what was blocked to the user.

  Signed-off-by: Erick Galinkin <[email protected]>
* style: apply pre-commits
* update poetry lock

---------

Signed-off-by: Erick Galinkin <[email protected]>
Co-authored-by: Pouyanpi <[email protected]>
1 parent 9994ed0 commit 1bc565e

17 files changed, +1328 -5 lines changed

docs/user-guides/guardrails-library.md

+54 -2
@@ -28,7 +28,8 @@ NeMo Guardrails comes with a library of built-in guardrails that you can easily
   - OpenAI Moderation API - *[COMING SOON]*

 4. Other
-   - [Jailbreak Detection Heuristics](#jailbreak-detection-heuristics)
+   - [Jailbreak Detection](#jailbreak-detection)
+   - [Injection Detection](#injection-detection)

 ## LLM Self-Checking

@@ -848,7 +849,7 @@ For more details, check out the [Prompt Security Integration](./community/prompt

 ## Other

-### Jailbreak Detection Heuristics
+### Jailbreak Detection

 NeMo Guardrails supports jailbreak detection using a set of heuristics. Currently, two heuristics are supported:

@@ -950,3 +951,54 @@ Times reported below in are **averages** and are reported in milliseconds.
|------------|-------|-----|
| Docker | 2057 | 115 |
| In-Process | 3227 | 157 |


### Injection Detection

NeMo Guardrails offers detection of potential injection attempts (_e.g._ code injection, cross-site scripting, SQL injection, template injection) using [YARA rules](https://yara.readthedocs.io/en/stable/index.html), a technology familiar to many security teams.
NeMo Guardrails ships with some basic rules for the following categories:

* Code injection (Python)
* Cross-site scripting (Markdown and JavaScript)
* SQL injection
* Template injection (Jinja)

Additional rules can be added by including them in the `library/injection_detection/yara_rules` folder or by setting `yara_path` to a directory that contains all the rule files.
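
As a rough illustration of how a custom rule directory might be resolved, the sketch below maps each configured rule name to a `<name>.yara` file inside the directory, which is the naming convention the action code in this commit uses. The helper name `resolve_rule_files`, the rule name `my_custom`, and the throwaway directory are hypothetical, and the sketch only checks the filesystem; it does not compile anything with YARA.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def resolve_rule_files(yara_path: Path, rule_names: Iterable[str]) -> Dict[str, str]:
    """Map each rule name to `<yara_path>/<name>.yara`, failing on missing files.

    Hypothetical helper for illustration; not part of NeMo Guardrails.
    """
    if not yara_path.is_dir():
        raise FileNotFoundError(f"{yara_path} is not a directory")
    resolved = {}
    for name in rule_names:
        rule_file = yara_path / f"{name}.yara"
        if not rule_file.is_file():
            raise ValueError(f"No rule file found for {name!r} in {yara_path}")
        resolved[name] = str(rule_file)
    return resolved


# Usage: build a throwaway rules directory containing one custom rule file.
with tempfile.TemporaryDirectory() as tmp:
    rules_dir = Path(tmp)
    (rules_dir / "my_custom.yara").write_text(
        'rule demo { strings: $a = "evil" condition: $a }\n'
    )
    files = resolve_rule_files(rules_dir, ["my_custom"])
    print(sorted(files))
```

The resulting mapping of name to file path has the same shape as the `filepaths` argument that the commit passes to `yara.compile`.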

Injection detection has a number of action options that indicate what to do when potential exploitation is detected.
Two options are currently available: `reject` and `omit`, with `sanitize` planned for a future release.

* `reject` will return a message to the user indicating that their query could not be handled and they should try again.
* `omit` will return the model's output, removing the offending detected content.
* `sanitize` attempts to "de-fang" the malicious content, returning the output in a way that is less likely to result in exploitation. This action is generally considered unsuitable for production use.
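
To make the difference between `reject` and `omit` concrete, here is a minimal standard-library sketch. It assumes regular expressions as a stand-in for compiled YARA rules, so the rule names (`demo_sqli`, `demo_template`), the patterns, the helper names, and the rejection wording are all hypothetical and do not reflect the rules or messages shipped with NeMo Guardrails.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical stand-ins for compiled YARA rules: rule name -> regex.
DEMO_RULES: Dict[str, re.Pattern] = {
    "demo_sqli": re.compile(r"(?i)\bdrop\s+table\s+\w+"),
    "demo_template": re.compile(r"\{\{.*?\}\}"),
}


def find_hits(text: str) -> List[Tuple[str, str]]:
    """Return (rule name, matched substring) for every match in the text."""
    return [
        (name, m.group(0))
        for name, pattern in DEMO_RULES.items()
        for m in pattern.finditer(text)
    ]


def apply_action(text: str, action: str) -> str:
    """Apply `reject` or `omit` to the text; other actions are not implemented."""
    hits = find_hits(text)
    if not hits:
        return text  # nothing detected, so the output passes through unchanged
    if action == "reject":
        # Replace the whole output with a message naming the triggered rules.
        names = ", ".join(sorted({name for name, _ in hits}))
        return f"Blocked: output matched rule(s) {names}."
    if action == "omit":
        # Keep the output but strip every matched substring.
        for _, matched in hits:
            text = text.replace(matched, "")
        return text
    raise NotImplementedError(f"Action {action!r} is not supported in this sketch.")


sample = "Your total is {{ 7 * 7 }} dollars."
print(apply_action(sample, "omit"))
print(apply_action(sample, "reject"))
```

Note that `omit` leaves the surrounding text in place, so the result can still read oddly or remain unsafe if a rule only matches part of a payload.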

#### Configuring Injection Detection
To activate injection detection, you must include the `injection detection` output flow.
As an example config:

```yaml
rails:
  config:
    injection_detection:
      injections:
        - code
        - sqli
        - template
        - xss
      action:
        reject

  output:
    flows:
      - injection detection
```
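
The kind of checks applied to this section can be sketched on a plain dictionary, as below. This is an illustrative approximation only: the constant and function names are hypothetical, the allowed values are taken from the documentation text above, and the sketch leaves out the custom `yara_path` handling, which in the commit's code also accepts rule names that exist as files in that directory.

```python
from typing import Any, Dict, List, Tuple

# Values taken from the documentation above; these constant names are hypothetical.
ALLOWED_ACTIONS = {"reject", "omit", "sanitize"}
BUILT_IN_RULES = {"code", "sqli", "template", "xss"}


def check_injection_config(config: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Return (action, injections) or raise ValueError if the section is malformed."""
    try:
        section = config["rails"]["config"]["injection_detection"]
    except KeyError as exc:
        raise ValueError("Injection detection configuration is missing.") from exc
    action = section.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"Unexpected action: {action!r}")
    injections = list(section.get("injections", []))
    unknown = sorted(set(injections) - BUILT_IN_RULES)
    if unknown:
        raise ValueError(f"Unknown built-in rule names: {unknown}")
    return action, injections


# The example config above, expressed as the dictionary a YAML parser would produce.
example = {
    "rails": {
        "config": {
            "injection_detection": {
                "injections": ["code", "sqli", "template", "xss"],
                "action": "reject",
            }
        },
        "output": {"flows": ["injection detection"]},
    }
}
print(check_injection_config(example))
```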

**SECURITY WARNING:** It is _strongly_ advised that the `sanitize` action not be used in production systems, as there is no guarantee of its efficacy, and it may lead to adverse security outcomes.

This rail is primarily intended to be used in agentic systems to _enhance_ other security controls as part of a defense-in-depth strategy.
The provided rules are recommended to be used in the following settings:

* `code`: Recommended if the LLM's output will be used as an argument to downstream functions or passed to a code interpreter.
* `sqli`: Recommended if the LLM's output will be used as part of a SQL query to a database.
* `template`: Recommended if the LLM's output is rendered using templating languages like Jinja. This rule should usually be paired with `code` rules.
* `xss`: Recommended if the LLM's output will be rendered directly in HTML or Markdown.

The included rules are in no way comprehensive.
They can and should be extended by security teams for use in your application's particular context and paired with additional security controls.
@@ -0,0 +1,28 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@@ -0,0 +1,288 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import logging
import re
from functools import lru_cache
from pathlib import Path
from typing import Tuple, Union

import yara

from nemoguardrails import RailsConfig
from nemoguardrails.actions import action
from nemoguardrails.library.injection_detection.yara_config import ActionOptions, Rules

YARA_DIR = Path(__file__).resolve().parent.joinpath("yara_rules")

log = logging.getLogger(__name__)


def _validate_unpack_config(config: RailsConfig) -> Tuple[str, Path, Tuple[str]]:
    """
    Validates and unpacks the injection detection configuration.

    Args:
        config (RailsConfig): The Rails configuration object containing injection detection settings.

    Returns:
        Tuple[str, Path, Tuple[str]]: A tuple containing the action option, the YARA path,
        and the injection rules.

    Raises:
        FileNotFoundError: If the provided `yara_path` is not a directory.
        ValueError: If `yara_path` is not a string, the action option is invalid,
        or the injection rules contain invalid elements.
    """
    command_injection_config = config.rails.config.injection_detection

    if command_injection_config is None:
        msg = (
            "Injection detection configuration is missing in the provided RailsConfig."
        )
        log.error(msg)
        raise ValueError(msg)
    yara_path = command_injection_config.yara_path
    if not yara_path:
        yara_path = YARA_DIR
    elif isinstance(yara_path, str):
        yara_path = Path(yara_path)
        if not yara_path.exists() or not yara_path.is_dir():
            msg = (
                "Provided `yara_path` value in injection config %s is not a directory."
                % yara_path
            )
            log.error(msg)
            raise FileNotFoundError(msg)
    else:
        msg = "Expected a string value for `yara_path` but got %r instead." % type(
            yara_path
        )

        log.error(msg)
        raise ValueError(msg)
    action_option = command_injection_config.action
    if action_option not in ActionOptions:
        msg = (
            "Expected 'reject', 'omit', or 'sanitize' action in injection config but got %s"
            % action_option
        )
        log.error(msg)
        raise ValueError(msg)
    injection_rules = tuple(command_injection_config.injections)
    if not set(injection_rules) <= Rules:
        # Do the easy check above first. If they provide a custom dir or a custom rules file, check the filesystem
        if not all(
            [
                yara_path.joinpath(f"{module_name}.yara").is_file()
                for module_name in injection_rules
            ]
        ):
            default_rule_names = ", ".join([member.value for member in Rules])
            msg = (
                "Provided set of `injections` in injection config %r contains elements not in available rules. "
                "Provided rules are in %r."
            ) % (injection_rules, default_rule_names)
            log.error(msg)
            raise ValueError(msg)

    return action_option, yara_path, injection_rules


@lru_cache()
def load_rules(yara_path: Path, rule_names: Tuple) -> Union[yara.Rules, None]:
    """
    Loads and compiles YARA rules from the specified path and rule names.

    Args:
        yara_path (Path): The path to the directory containing YARA rule files.
        rule_names (Tuple): A tuple of YARA rule names to load.

    Returns:
        Union[yara.Rules, None]: The compiled YARA rules object if successful,
        or None if no rule names are provided.

    Raises:
        yara.SyntaxError: If there is a syntax error in the YARA rules.
    """
    if len(rule_names) == 0:
        log.warning(
            "Injection config was provided but no modules were specified. Returning None."
        )
        return None
    rules_to_load = {
        rule_name: str(yara_path.joinpath(f"{rule_name}.yara"))
        for rule_name in rule_names
    }
    try:
        rules = yara.compile(filepaths=rules_to_load)
    except yara.SyntaxError as e:
        msg = f"Encountered SyntaxError: {e}"
        log.error(msg)
        raise e
    return rules


def omit_injection(text: str, matches: list[yara.Match]) -> str:
    """
    Attempts to strip the offending injection attempts from the provided text.

    Note:
        This method may not be completely effective and could still result in
        malicious activity.

    Args:
        text (str): The text to check for command injection.
        matches (list[yara.Match]): A list of YARA rule matches.

    Returns:
        str: The text with the detected injections stripped out.
    """
    # Copy the text to a placeholder variable
    modified_text = text
    for match in matches:
        if match.strings:
            for match_string in match.strings:
                for instance in match_string.instances:
                    try:
                        plaintext = instance.plaintext().decode("utf-8")
                        if plaintext in modified_text:
                            modified_text = modified_text.replace(plaintext, "")
                    except (AttributeError, UnicodeDecodeError) as e:
                        log.warning(f"Error processing match: {e}")
    return modified_text


def sanitize_injection(text: str, matches: list[yara.Match]) -> str:
    """
    Attempts to sanitize the offending injection attempts in the provided text.
    This is done by 'de-fanging' the offending content, transforming it into a state that will not execute
    downstream commands.

    Note:
        This method may not be completely effective and could still result in
        malicious activity. Sanitizing malicious input instead of rejecting or
        omitting it is inherently risky and generally not recommended.

    Args:
        text (str): The text to check for command injection.
        matches (list[yara.Match]): A list of YARA rule matches.

    Returns:
        str: The text with the detected injections sanitized.

    Raises:
        NotImplementedError: If the sanitization logic is not implemented.
    """
    raise NotImplementedError(
        "Injection sanitization is not yet implemented. Please use 'reject' or 'omit'"
    )


def reject_injection(text: str, rules: yara.Rules) -> Tuple[bool, str]:
    """
    Detects whether the provided text contains potential injection attempts.

    This function is recommended as an output or execution guardrail. It matches
    the text against the already-compiled YARA rules passed in by the caller.

    Args:
        text (str): The text to check for command injection.
        rules (yara.Rules): The loaded YARA rules.

    Returns:
        Tuple[bool, str]: True and a comma-separated string of matched rule names
        if attempted exploitation is detected, otherwise False and an empty string.
    """
    if rules is None:
        log.warning(
            "reject_injection guardrail was invoked but no rules were specified in the InjectionDetection config."
        )
        return False, ""
    matches = rules.match(data=text)
    if matches:
        matches_string = ", ".join([match_name.rule for match_name in matches])
        log.info(f"Input matched on rule {matches_string}.")
        return True, matches_string
    else:
        return False, ""


@action()
async def injection_detection(text: str, config: RailsConfig) -> str:
    """
    Detects and mitigates potential injection attempts in the provided text.

    Depending on the configuration, this function can omit or sanitize the detected
    injection attempts. If the action is set to "reject", it delegates to the
    `reject_injection` function.

    Args:
        text (str): The text to check for command injection.
        config (RailsConfig): The Rails configuration object containing injection detection settings.

    Returns:
        str: The sanitized or original text, depending on the action specified in the configuration.

    Raises:
        ValueError: If the `action` parameter in the configuration is invalid.
        NotImplementedError: If an unsupported action is encountered.
    """
    action_option, yara_path, rule_names = _validate_unpack_config(config)
    rules = load_rules(yara_path, rule_names)
    if action_option == "reject":
        verdict, detections = reject_injection(text, rules)
        if verdict:
            return f"I'm sorry, the desired output triggered rule(s) designed to mitigate exploitation of {detections}."
        else:
            return text
    if rules is None:
        log.warning(
            "injection detection guardrail was invoked but no rules were specified in the InjectionDetection config."
        )
        return text
    matches = rules.match(data=text)
    if matches:
        matches_string = ", ".join([match_name.rule for match_name in matches])
        log.info(f"Input matched on rule {matches_string}.")
        if action_option == "omit":
            return omit_injection(text, matches)
        elif action_option == "sanitize":
            return sanitize_injection(text, matches)
        else:
            # We should never ever hit this since we inspect the action option above, but putting an error here anyway.
            raise NotImplementedError(
                f"Expected `action` parameter to be 'omit' or 'sanitize' but got {action_option} instead."
            )
    else:
        return text
@@ -0,0 +1,7 @@
# OUTPUT RAILS

flow injection detection
  """
  Reject, omit, or sanitize injection attempts from the bot.
  """
  $bot_message = await InjectionDetectionAction(text=$bot_message)
