docs: Possible update to injection detection #1144

Open: wants to merge 1 commit into `develop`

115 changes: 91 additions & 24 deletions docs/user-guides/guardrails-library.md
@@ -952,29 +952,37 @@ Times reported below in are **averages** and are reported in milliseconds.
| Docker | 2057 | 115 |
| In-Process | 3227 | 157 |


### Injection Detection
NeMo Guardrails offers detection of potential injection attempts (_e.g._ code injection, cross-site scripting, SQL injection, template injection) using [YARA rules](https://yara.readthedocs.io/en/stable/index.html), a technology familiar to many security teams.
NeMo Guardrails ships with some basic rules for the following categories:
* Code injection (Python)
* Cross-site scripting (Markdown and Javascript)
* SQL injection
* Template injection (Jinja)

Additional rules can be added by including them in the `library/injection_detection/yara_rules` folder or specifying a `yara_path` with all the rules.
NeMo Guardrails offers detection of potential injection attempts such as code injection, cross-site scripting, SQL injection, and template injection.
Injection detection is primarily intended to be used in agentic systems to enhance other security controls as part of a defense-in-depth strategy.

The first part of injection detection is a set of [YARA rules](https://yara.readthedocs.io/en/stable/index.html).
A YARA rule specifies a set of strings (text or binary patterns) to match and a Boolean expression that defines the logic of the rule.
YARA is a technology that is familiar to many security teams.
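To make the two-part structure of a rule concrete, the following is a toy, pure-Python stand-in: named patterns (similar to YARA's `$name` string identifiers) plus a Boolean condition over which patterns matched. It is not YARA and does not use the `yara-python` API; the `SketchRule` class and the sample rule are hypothetical and far simpler than the rules that NeMo Guardrails ships.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SketchRule:
    """A toy, pure-Python stand-in for a YARA rule (not YARA itself).

    `strings` maps identifiers (like YARA's $a, $b) to regex patterns, and
    `condition` is a Boolean function over the set of identifiers that matched.
    """
    name: str
    strings: Dict[str, str]
    condition: Callable[[set], bool]

    def matches(self, text: str) -> bool:
        # Collect the identifiers whose pattern appears anywhere in the text.
        hits = {sid for sid, pattern in self.strings.items() if re.search(pattern, text)}
        # The rule fires only if the Boolean condition holds over those hits.
        return self.condition(hits)


# Hypothetical rule: flag text that calls exec/eval AND imports a module.
python_exec = SketchRule(
    name="hypothetical_python_exec",
    strings={"$exec": r"\b(exec|eval)\s*\(", "$import": r"\bimport\s+\w+"},
    condition=lambda hits: "$exec" in hits and "$import" in hits,
)

print(python_exec.matches("import os\neval('2+2')"))  # True
print(python_exec.matches("import requests"))         # False
```

Keeping the patterns and the condition separate mirrors how a YARA rule lets you combine many simple string matches into one decision.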

The second part of injection detection is the action to take when a rule is triggered.
You can configure the rail to *reject* the text and return "I'm sorry, the desired output triggered rule(s) designed to mitigate exploitation of {detections}."
Alternatively, you can configure it to *omit* the triggering text from the response.
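The difference between the two actions can be sketched in a few lines. This is a hypothetical illustration, not the NeMo Guardrails implementation: the `apply_action` helper, the regex-based rules, and the way `{detections}` is rendered (a comma-separated list of rule names) are all assumptions; only the message text comes from this section.

```python
import re
from typing import Dict

# Reject message quoted from the documentation; rendering {detections} as a
# comma-separated list of rule names is an assumption.
REJECT_TEMPLATE = (
    "I'm sorry, the desired output triggered rule(s) designed to mitigate "
    "exploitation of {detections}."
)


def apply_action(text: str, rules: Dict[str, str], action: str) -> str:
    """Hypothetical illustration of the `reject` and `omit` actions."""
    triggered = [name for name, pattern in rules.items() if re.search(pattern, text)]
    if not triggered:
        return text  # nothing detected: pass the output through unchanged
    if action == "reject":
        # Replace the whole output with the rejection message.
        return REJECT_TEMPLATE.format(detections=", ".join(triggered))
    if action == "omit":
        # Keep the output but remove every span matched by a triggered rule.
        for name in triggered:
            text = re.sub(rules[name], "", text)
        return text
    raise ValueError(f"unsupported action: {action!r}")


rules = {"sqli": r"(?i);\s*drop\s+table\s+\w+"}  # hypothetical pattern
output = "SELECT * FROM users; DROP TABLE users"
print(apply_action(output, rules, "reject"))
print(apply_action(output, rules, "omit"))  # SELECT * FROM users
```

`reject` discards the whole response, while `omit` preserves whatever did not match, which is friendlier to users but only as safe as the rules are precise.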

#### About the Default Rules

Injection detection has a number of action options that indicate what to do when potential exploitation is detected.
Two options are currently available: `reject` and `omit`, with `sanitize` planned for a future release.
By default, NeMo Guardrails provides the following rules:

* `reject` will return a message to the user indicating that their query could not be handled and they should try again.
* `omit` will return the model's output, removing the offending detected content.
* `sanitize` attempts to "de-fang" the malicious content, returning the output in a way that is less likely to result in exploitation. This action is generally considered unsuitable for production use.
- Code injection (Python): Recommended if the LLM output is used as an argument to downstream functions or passed to a code interpreter.
- SQL injection: Recommended if the LLM output is used as part of a SQL query to a database.
- Template injection (Jinja): Recommended if the LLM output is rendered using templating languages like Jinja.
  This rule is usually paired with code injection rules.
- Cross-site scripting (Markdown and Javascript): Recommended if the LLM output is rendered directly in HTML or Markdown.
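The recommendations above can be turned into a rule selection for the `injections` field. The following is a minimal sketch; the usage labels (`code_interpreter`, `sql_query`, and so on) and the `recommended_injections` helper are hypothetical, while `code`, `sqli`, `template`, and `xss` are the library's rule names.

```python
from typing import Iterable, List

# Hypothetical usage labels mapped to library rule names, following the
# recommendations listed above.
RECOMMENDATIONS = {
    "code_interpreter": ["code"],            # output passed to functions or an interpreter
    "sql_query": ["sqli"],                   # output used as part of a SQL query
    "jinja_template": ["template", "code"],  # template rules usually paired with code rules
    "html_or_markdown": ["xss"],             # output rendered directly in HTML or Markdown
}


def recommended_injections(uses: Iterable[str]) -> List[str]:
    """Return a sorted, de-duplicated list of rule names for the given uses."""
    selected = set()
    for use in uses:
        selected.update(RECOMMENDATIONS[use])
    return sorted(selected)


print(recommended_injections(["jinja_template", "html_or_markdown"]))
# ['code', 'template', 'xss']
```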

You can view the default rules in the [yara_rules directory](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/nemoguardrails/library/injection_detection/yara_rules) of the GitHub repository.

#### Configuring Injection Detection
To activate injection detection, you must include the `injection detection` output flow.

To activate injection detection, you must specify the rules to apply and the action to take, and include the `injection detection` output flow.
The following is an example configuration:

```colang
```yaml
rails:
  config:
    injection_detection:
@@ -991,14 +999,73 @@ rails:
      - injection detection
```

**SECURITY WARNING:** It is _strongly_ advised that the `sanitize` action not be used in production systems, as there is no guarantee of its efficacy, and it may lead to adverse security outcomes.
Refer to the following table for the `rails.config.injection_detection` field syntax reference:

```{list-table}
:header-rows: 1

* - Field
  - Description
  - Default Value

* - `injections`
  - Specifies the injection detection rules to use.
    The following injections are part of the library:

    - `code` for Python code injection
    - `sqli` for SQL injection
    - `template` for Jinja template injection
    - `xss` for cross-site scripting
  - None (required)

* - `action`
  - Specifies the action to take when injection is detected.
    Refer to the following actions:

    - `reject` returns a message to the user indicating that the query could not be handled and they should try again.
    - `omit` returns the model response, removing the offending detected content.
  - None (required)

* - `yara_path`
  - Specifies the path to a directory that contains custom YARA rules.
  - `library/injection_detection/yara_rules` in the NeMo Guardrails package.
```
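Before loading a configuration, you might check the `injection_detection` mapping against the field reference above. The following is a hypothetical pre-flight check, not the validation that NeMo Guardrails performs; the `check_injection_detection` name is illustrative, and treating unknown rule names as errors only when `yara_path` is unset is an assumption.

```python
from typing import Any, Dict

# Library rule names and actions as listed in the field reference.
LIBRARY_INJECTIONS = {"code", "sqli", "template", "xss"}
ACTIONS = {"reject", "omit"}


def check_injection_detection(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical pre-flight check for a rails.config.injection_detection mapping.

    This mirrors the field table only; it is not NeMo Guardrails' own validation.
    """
    injections = cfg.get("injections")
    if not injections:
        raise ValueError("`injections` is required and must name at least one rule")

    action = cfg.get("action")
    if action not in ACTIONS:
        raise ValueError(f"`action` must be one of {sorted(ACTIONS)}, got {action!r}")

    # None stands for "use the bundled rules directory" (the documented default).
    # Assumption: custom rule names only make sense with a custom yara_path.
    yara_path = cfg.get("yara_path")
    if yara_path is None:
        unknown = set(injections) - LIBRARY_INJECTIONS
        if unknown:
            raise ValueError(f"unknown injections without yara_path: {sorted(unknown)}")

    return {"injections": list(injections), "action": action, "yara_path": yara_path}


print(check_injection_detection({"injections": ["code", "xss"], "action": "reject"}))
# {'injections': ['code', 'xss'], 'action': 'reject', 'yara_path': None}
```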

#### Example

Before you begin, install the `yara-python` package, or install NeMo Guardrails with the `jailbreak` extra by running `pip install nemoguardrails[jailbreak]`.

1. Set your NVIDIA API key as an environment variable:

   ```console
   $ export NVIDIA_API_KEY=<nvapi-...>
   ```

1. Create a configuration directory, such as `config`, and add a `config.yml` file with contents like the following:

   ```{literalinclude} ../../examples/configs/injection_detection/config/config.yml
   :language: yaml
   ```

1. Load the guardrails configuration:

   ```{literalinclude} ../../examples/configs/injection_detection/demo.py
   :language: python
   :start-after: "# start-load-config"
   :end-before: "# end-load-config"
   ```

1. Send a possibly unsafe request:

   ```{literalinclude} ../../examples/configs/injection_detection/demo.py
   :language: python
   :start-after: "# start-unsafe-response"
   :end-before: "# end-unsafe-response"
   ```

This rail is primarily intended to be used in agentic systems to _enhance_ other security controls as part of a defense in depth strategy.
The provided rules are recommended to be used in the following settings:
* `code`: Recommended if the LLM's output will be used as an argument to downstream functions or passed to a code interpreter.
* `sqli`: Recommended if the LLM's output will be used as part of a SQL query to a database.
* `template`: Recommended for use if LLM output is rendered using templating languages like Jinja. This rule should usually be paired with `code` rules.
* `xss`: Recommended if LLM output will be rendered directly in HTML or Markdown.
*Example Output*

The included rules are in no way comprehensive.
They can and should be extended by security teams for use in your application's particular context and paired with additional security controls.
```{literalinclude} ../../examples/configs/injection_detection/demo-out.txt
:start-after: "# start-unsafe-response"
:end-before: "# end-unsafe-response"
```
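The example output shows the shape of the value that `rails.generate(messages=...)` returns in the demo: a dictionary with `role` and `content` keys. When the `reject` action fires, the content should be the rejection message described earlier. The following is a minimal sketch for checking that programmatically, assuming the message prefix is returned verbatim; `was_rejected` is a hypothetical helper, not part of NeMo Guardrails.

```python
# Prefix of the documented reject message; assumed to be returned verbatim.
REJECT_PREFIX = (
    "I'm sorry, the desired output triggered rule(s) designed to mitigate "
    "exploitation of"
)


def was_rejected(response: dict) -> bool:
    """Hypothetical helper: True if the assistant message is the reject message."""
    return (
        response.get("role") == "assistant"
        and response.get("content", "").startswith(REJECT_PREFIX)
    )


# The demo's response contained the generated script, so it was not rejected.
print(was_rejected({"role": "assistant", "content": "**Getting the Weather in Santa Clara using Python**"}))
```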
22 changes: 22 additions & 0 deletions examples/configs/injection_detection/config/config.yml
@@ -0,0 +1,22 @@
models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama-3.3-70b-instruct

rails:
  config:
    injection_detection:
      injections:
        - code
        - sqli
        - template
        - xss
      action: reject

  output:
    streaming:
      enabled: True
      chunk_size: 200
      context_size: 50

streaming: True
3 changes: 3 additions & 0 deletions examples/configs/injection_detection/demo-out.txt
@@ -0,0 +1,3 @@
# start-unsafe-response
{'role': 'assistant', 'content': '**Getting the Weather in Santa Clara using Python**\n=====================================================\n\nTo get the weather in Santa Clara, we can use the OpenWeatherMap API, which provides current and forecasted weather conditions. We will use the `requests` library to make an HTTP request to the API and the `json` library to parse the response.\n\n**Prerequisites**\n---------------\n\n* Python 3.x\n* `requests` library (`pip install requests`)\n* OpenWeatherMap API key (sign up for free at [OpenWeatherMap](https://home.openweathermap.org/users/sign_up))\n\n**Code**\n-----\n\n```python\nimport requests\nimport json\n\ndef get_weather(api_key, city, units=\'metric\'):\n """\n Get the current weather in a city.\n\n Args:\n api_key (str): OpenWeatherMap API key\n city (str): City name\n units (str, optional): Units of measurement (default: \'metric\')\n\n Returns:\n dict: Weather data\n """\n base_url = \'http://api.openweathermap.org/data/2.5/weather\'\n params = {\n \'q\': city,\n \'units\': units,\n \'appid\': api_key\n }\n response = requests.get(base_url, params=params)\n response.raise_for_status()\n return response.json()\n\ndef main():\n api_key = \'YOUR_API_KEY\' # replace with your OpenWeatherMap API key\n city = \'Santa Clara\'\n weather_data = get_weather(api_key, city)\n print(\'Weather in {}:\'.format(city))\n print(\'Temperature: {}°C\'.format(weather_data[\'main\'][\'temp\']))\n print(\'Humidity: {}%\'.format(weather_data[\'main\'][\'humidity\']))\n print(\'Conditions: {}\'.format(weather_data[\'weather\'][0][\'description\']))\n\nif __name__ == \'__main__\':\n main()\n```\n\n**Explanation**\n--------------\n\n1. We import the required libraries: `requests` for making HTTP requests and `json` for parsing the response.\n2. We define a function `get_weather` that takes the API key, city name, and units of measurement as arguments.\n3. We construct the API URL and parameters using the `base_url` and `params` dictionary.\n4. We make a GET request to the API using `requests.get` and raise an exception if the response status code is not 200 (OK).\n5. We parse the response data using `response.json()` and return the result as a dictionary.\n6. In the `main` function, we set the API key and city name, call `get_weather`, and print the weather data.\n\n**Notes**\n-------\n\n* Replace `YOUR_API_KEY` with your actual OpenWeatherMap API key.\n* You can adjust the units of measurement by passing a different value for the `units` parameter (e.g., \'imperial\' for Fahrenheit).\n* This script only retrieves the current weather conditions. For forecasted weather, you can use the [OpenWeatherMap forecast API](https://openweathermap.org/forecast5).\n\nI hope this helps! Let me know if you have any questions or need further assistance.'}
# end-unsafe-response
57 changes: 57 additions & 0 deletions examples/configs/injection_detection/demo.py
@@ -0,0 +1,57 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# fmt: off

import atexit
import os
import sys
from pathlib import Path

curdir = os.getcwd()

@atexit.register
def cleanup():
    os.chdir(curdir)

os.chdir(Path(__file__).parent)

# start-load-config
import os

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)
# end-load-config

# start-unsafe-response
messages = [
    {
        "role": "user",
        "content": "Write a Python script that uses the requests library, and maybe urllib, to get the weather in Santa Clara.",
    },
]

response = rails.generate(messages=messages)
print(response)
# end-unsafe-response

stdout = sys.stdout
with open("demo-out.txt", "w") as sys.stdout:
    print("# start-unsafe-response")
    print(response)
    print("# end-unsafe-response\n")
sys.stdout = stdout
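The script redirects output by using `sys.stdout` itself as the `with` target and then restoring it by hand. The standard library's `contextlib.redirect_stdout` performs the same redirection and restores `sys.stdout` automatically, even if an exception is raised. A sketch of that alternative, where `write_demo_output` is a hypothetical helper name:

```python
import contextlib


def write_demo_output(response: object, path: str = "demo-out.txt") -> None:
    """Write `response` between the markers that the docs' literalinclude uses."""
    # redirect_stdout restores sys.stdout on exit, even if a print raises.
    with open(path, "w") as f, contextlib.redirect_stdout(f):
        print("# start-unsafe-response")
        print(response)
        print("# end-unsafe-response\n")
```

Calling `write_demo_output(response)` after `rails.generate(...)` produces a file with the same markers, so the `start-after` and `end-before` options in the documentation keep working unchanged.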