diff --git a/docs/user-guides/guardrails-library.md b/docs/user-guides/guardrails-library.md index eeed3e18d..363a8ae93 100644 --- a/docs/user-guides/guardrails-library.md +++ b/docs/user-guides/guardrails-library.md @@ -952,29 +952,37 @@ Times reported below in are **averages** and are reported in milliseconds. | Docker | 2057 | 115 | | In-Process | 3227 | 157 | - ### Injection Detection -NeMo Guardrails offers detection of potential injection attempts (_e.g._ code injection, cross-site scripting, SQL injection, template injection) using [YARA rules](https://yara.readthedocs.io/en/stable/index.html), a technology familiar to many security teams. -NeMo Guardrails ships with some basic rules for the following categories: -* Code injection (Python) -* Cross-site scripting (Markdown and Javascript) -* SQL injection -* Template injection (Jinja) -Additional rules can be added by including them in the `library/injection_detection/yara_rules` folder or specifying a `yara_path` with all the rules. +NeMo Guardrails offers detection of potential injection attempts such as code injection, cross-site scripting, SQL injection, and template injection. +Injection detection is primarily intended to be used in agentic systems to enhance other security controls as part of a defense-in-depth strategy. + +The first part of injection detection is [YARA rules](https://yara.readthedocs.io/en/stable/index.html). +A YARA rule specifies a set of strings--text or binary patterns--to match and a Boolean expression that specifies the logic of the rule. +YARA rules are a technology familiar to many security teams. + +The second part of injection detection is specifying the action to take when a rule is triggered. +You can *reject* the text and return "I'm sorry, the desired output triggered rule(s) designed to mitigate exploitation of {detections}." +Alternatively, you can *omit* the triggering text from the response. 
+ +#### About the Default Rules -Injection detection has a number of action options that indicate what to do when potential exploitation is detected. -Two options are currently available: `reject` and `omit`, with `sanitize` planned for a future release. +By default, NeMo Guardrails provides the following rules: -* `reject` will return a message to the user indicating that their query could not be handled and they should try again. -* `omit` will return the model's output, removing the offending detected content. -* `sanitize` attempts to "de-fang" the malicious content, returning the output in a way that is less likely to result exploitation. This action is generally considered unsuitable for production use. +- Code injection (Python): Recommended if the LLM output is used as an argument to downstream functions or passed to a code interpreter. +- SQL injection: Recommended if the LLM output is used as part of a SQL query to a database. +- Template injection (Jinja): Recommended if the LLM output is rendered using a templating language like Jinja. + This rule is usually paired with code injection rules. +- Cross-site scripting (Markdown and JavaScript): Recommended if the LLM output is rendered directly in HTML or Markdown. + +You can view the default rules in the [yara_rules directory](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/nemoguardrails/library/injection_detection/yara_rules) of the GitHub repository. #### Configuring Injection Detection -To activate injection detection, you must include the `injection detection` output flow. + +To activate injection detection, you must specify the rules to apply and the action to take, and include the `injection detection` output flow. 
As an example config: -```colang +```yaml rails: config: injection_detection: @@ -991,14 +999,73 @@ rails: - injection detection ``` -**SECURITY WARNING:** It is _strongly_ advised that the `sanitize` action not be used in production systems, as there is no guarantee of its efficacy, and it may lead to adverse security outcomes. +Refer to the following table for the `rails.config.injection_detection` field syntax reference: + +```{list-table} +:header-rows: 1 + +* - Field + - Description + - Default Value + +* - `injections` + - Specifies the injection detection rules to use. + The following injections are part of the library: + + - `code` for Python code injection + - `sqli` for SQL injection + - `template` for Jinja template injection + - `xss` for cross-site scripting + - None (required) + +* - `action` + - Specifies the action to take when injection is detected. + Refer to the following actions: + + - `reject` returns a message to the user indicating that the query could not be handled and they should try again. + - `omit` returns the model response, removing the offending detected content. + - None (required) + +* - `yara_path` + - Specifies the path to a directory that contains custom YARA rules. + - `library/injection_detection/yara_rules` in the NeMo Guardrails package. +``` + +#### Example + +Before you begin, install the `yara-python` package, or install the NeMo Guardrails package with `pip install nemoguardrails[jailbreak]`. + +1. Set your NVIDIA API key as an environment variable: + + ```console + $ export NVIDIA_API_KEY= + ``` + +1. Create a configuration directory, such as `config`, and add a `config.yml` file with contents like the following: + + ```{literalinclude} ../../examples/configs/injection_detection/config/config.yml + :language: yaml + ``` + +1. 
Load the guardrails configuration: + + ```{literalinclude} ../../examples/configs/injection_detection/demo.py + :language: python + :start-after: "# start-load-config" + :end-before: "# end-load-config" + ``` + +1. Send a possibly unsafe request: + + ```{literalinclude} ../../examples/configs/injection_detection/demo.py + :language: python + :start-after: "# start-unsafe-response" + :end-before: "# end-unsafe-response" + ``` -This rail is primarily intended to be used in agentic systems to _enhance_ other security controls as part of a defense in depth strategy. -The provided rules are recommended to be used in the following settings: -* `code`: Recommended if the LLM's output will be used as an argument to downstream functions or passed to a code interpreter. -* `sqli`: Recommended if the LLM's output will be used as part of a SQL query to a database -* `template`: Recommended for use if LLM output is rendered using templating languages like Jinja. This rule should usually be paired with `code` rules. -* `xss`: Recommended if LLM output will be rendered directly in HTML or Markdown + *Example Output* -The included rules are in no way comprehensive. -They can and should be extended by security teams for use in your application's particular context and paired with additional security controls. 
+ ```{literalinclude} ../../examples/configs/injection_detection/demo-out.txt + :start-after: "# start-unsafe-response" + :end-before: "# end-unsafe-response" + ``` diff --git a/examples/configs/injection_detection/config/config.yml b/examples/configs/injection_detection/config/config.yml new file mode 100644 index 000000000..a1e58e425 --- /dev/null +++ b/examples/configs/injection_detection/config/config.yml @@ -0,0 +1,22 @@ +models: + - type: main + engine: nvidia_ai_endpoints + model: meta/llama-3.3-70b-instruct + +rails: + config: + injection_detection: + injections: + - code + - sqli + - template + - xss + action: reject + + output: + streaming: + enabled: True + chunk_size: 200 + context_size: 50 + +streaming: True diff --git a/examples/configs/injection_detection/demo-out.txt b/examples/configs/injection_detection/demo-out.txt new file mode 100644 index 000000000..66e937aa7 --- /dev/null +++ b/examples/configs/injection_detection/demo-out.txt @@ -0,0 +1,3 @@ +# start-unsafe-response +{'role': 'assistant', 'content': '**Getting the Weather in Santa Clara using Python**\n=====================================================\n\nTo get the weather in Santa Clara, we can use the OpenWeatherMap API, which provides current and forecasted weather conditions. 
We will use the `requests` library to make an HTTP request to the API and the `json` library to parse the response.\n\n**Prerequisites**\n---------------\n\n* Python 3.x\n* `requests` library (`pip install requests`)\n* OpenWeatherMap API key (sign up for free at [OpenWeatherMap](https://home.openweathermap.org/users/sign_up))\n\n**Code**\n-----\n\n```python\nimport requests\nimport json\n\ndef get_weather(api_key, city, units=\'metric\'):\n """\n Get the current weather in a city.\n\n Args:\n api_key (str): OpenWeatherMap API key\n city (str): City name\n units (str, optional): Units of measurement (default: \'metric\')\n\n Returns:\n dict: Weather data\n """\n base_url = \'http://api.openweathermap.org/data/2.5/weather\'\n params = {\n \'q\': city,\n \'units\': units,\n \'appid\': api_key\n }\n response = requests.get(base_url, params=params)\n response.raise_for_status()\n return response.json()\n\ndef main():\n api_key = \'YOUR_API_KEY\' # replace with your OpenWeatherMap API key\n city = \'Santa Clara\'\n weather_data = get_weather(api_key, city)\n print(\'Weather in {}:\'.format(city))\n print(\'Temperature: {}°C\'.format(weather_data[\'main\'][\'temp\']))\n print(\'Humidity: {}%\'.format(weather_data[\'main\'][\'humidity\']))\n print(\'Conditions: {}\'.format(weather_data[\'weather\'][0][\'description\']))\n\nif __name__ == \'__main__\':\n main()\n```\n\n**Explanation**\n--------------\n\n1. We import the required libraries: `requests` for making HTTP requests and `json` for parsing the response.\n2. We define a function `get_weather` that takes the API key, city name, and units of measurement as arguments.\n3. We construct the API URL and parameters using the `base_url` and `params` dictionary.\n4. We make a GET request to the API using `requests.get` and raise an exception if the response status code is not 200 (OK).\n5. We parse the response data using `response.json()` and return the result as a dictionary.\n6. 
In the `main` function, we set the API key and city name, call `get_weather`, and print the weather data.\n\n**Notes**\n-------\n\n* Replace `YOUR_API_KEY` with your actual OpenWeatherMap API key.\n* You can adjust the units of measurement by passing a different value for the `units` parameter (e.g., \'imperial\' for Fahrenheit).\n* This script only retrieves the current weather conditions. For forecasted weather, you can use the [OpenWeatherMap forecast API](https://openweathermap.org/forecast5).\n\nI hope this helps! Let me know if you have any questions or need further assistance.'} +# end-unsafe-response diff --git a/examples/configs/injection_detection/demo.py b/examples/configs/injection_detection/demo.py new file mode 100644 index 000000000..1526a939d --- /dev/null +++ b/examples/configs/injection_detection/demo.py @@ -0,0 +1,57 @@ +# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# fmt: off + +import atexit +import os +import sys +from pathlib import Path + +curdir = os.getcwd() + +@atexit.register +def cleanup(): + os.chdir(curdir) + +os.chdir(Path(__file__).parent) + +# start-load-config +import os + +from nemoguardrails import LLMRails, RailsConfig + +config = RailsConfig.from_path("./config") +rails = LLMRails(config) +# end-load-config + +# start-unsafe-response +messages = [ + { + "role": "user", + "content": "Write a Python script that uses the requests library, and maybe urllib, to get the weather in Santa Clara.", + }, +] + +response = rails.generate(messages=messages) +print(response) +# end-unsafe-response + +stdout = sys.stdout +with open("demo-out.txt", "w") as sys.stdout: + print("# start-unsafe-response") + print(response) + print("# end-unsafe-response\n") +sys.stdout = stdout
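
The difference between the `reject` and `omit` actions described in the guide can be illustrated with a small stand-alone sketch. This is not the NeMo Guardrails implementation: the rail matches YARA rules, whereas the sketch below substitutes two simplified regular expressions so that it runs with the standard library only. The names `ILLUSTRATIVE_RULES` and `apply_action`, the patterns themselves, and the comma-separated formatting of `{detections}` are all assumptions made for illustration. Only the rejection message text is taken from the guide.

```python
import re

# Hypothetical stand-in patterns. The real rail uses YARA rules, not these
# regexes, and the shipped rules are considerably more thorough.
ILLUSTRATIVE_RULES = {
    "sqli": re.compile(r"(?i)\bunion\s+select\b|\bdrop\s+table\b"),
    "xss": re.compile(r"(?is)<script\b.*?</script>"),
}

# Message text quoted from the guide; joining detections with ", " is an assumption.
REJECT_TEMPLATE = (
    "I'm sorry, the desired output triggered rule(s) designed to "
    "mitigate exploitation of {detections}."
)


def apply_action(text, action):
    """Return (possibly modified text, sorted list of triggered rule names)."""
    detections = sorted(
        name for name, pattern in ILLUSTRATIVE_RULES.items() if pattern.search(text)
    )
    if not detections:
        # Nothing matched, so the text passes through unchanged.
        return text, detections
    if action == "reject":
        # Replace the whole response with the rejection message.
        return REJECT_TEMPLATE.format(detections=", ".join(detections)), detections
    if action == "omit":
        # Keep the response but strip every span that triggered a rule.
        for name in detections:
            text = ILLUSTRATIVE_RULES[name].sub("", text)
        return text, detections
    raise ValueError(f"unsupported action: {action!r}")


if __name__ == "__main__":
    print(apply_action("Hello <script>alert(1)</script>world", "omit"))
    print(apply_action("x'; DROP TABLE users; --", "reject"))
```

With `reject`, the caller never sees any part of the flagged response. With `omit`, the surrounding text survives, which may be preferable when most of the response is useful but leaves the caller responsible for checking that the remainder still makes sense.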