docs: Revise GS example #1146

Open · wants to merge 1 commit into base `develop`
42 changes: 21 additions & 21 deletions docs/getting-started.md
````diff
@@ -76,30 +76,30 @@ The sample code uses the [Llama 3.3 70B Instruct model](https://build.nvidia.com
 :end-before: "# end-generate-response"
 ```
 
-## Timing and Token Information
+1. Send a safe request and generate a response:
 
-The following modification of the sample code shows the timing and token information for the guardrail.
-
-- Generate a response and print the timing and token information:
-
-```{literalinclude} ../examples/configs/gs_content_safety/demo.py
-:language: python
-:start-after: "# start-get-duration"
-:end-before: "# end-get-duration"
-```
+```{literalinclude} ../examples/configs/gs_content_safety/demo.py
+:language: python
+:start-after: "# start-safe-response"
+:end-before: "# end-safe-response"
+```
 
-_Example Output_
+_Example Output_
 
-```{literalinclude} ../examples/configs/gs_content_safety/demo-out.txt
-:language: text
-:start-after: "# start-get-duration"
-:end-before: "# end-get-duration"
-```
+```{literalinclude} ../examples/configs/gs_content_safety/demo-out.txt
+:language: text
+:start-after: "# start-safe-response"
+:end-before: "# end-safe-response"
+```
 
-The timing and token information is available with the `print_llm_calls_summary()` method.
-
-```{literalinclude} ../examples/configs/gs_content_safety/demo-out.txt
-:language: text
-:start-after: "# start-explain-info"
-:end-before: "# end-explain-info"
-```
+## Next Steps
+
+- Run the `content_safety_tutorial.ipynb` notebook from the
+  [example notebooks](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/examples/notebooks)
+  directory of the GitHub repository.
+  The notebook compares LLM responses with and without safety checks and classifies responses
+  to sample prompts as _safe_ or _unsafe_.
+  The notebook shows how to measure the performance of the checks, focusing on how many unsafe
+  responses are blocked and how many safe responses are incorrectly blocked.
+- Refer to [](user-guides/configuration-guide.md) for information about the `config.yml` file.
````
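The section removed above surfaced per-call timing and token counts through `print_llm_calls_summary()`. The gist of that summary, totals across calls plus a per-task breakdown, can be sketched in plain Python. This is a standalone illustration using made-up `LLMCall` records and the figures from `demo-out.txt`, not the actual NeMo Guardrails implementation:

```python
from dataclasses import dataclass

@dataclass
class LLMCall:
    task: str          # e.g. "content_safety_check_input $model=content_safety"
    duration: float    # wall-clock seconds for this LLM call
    total_tokens: int  # prompt + completion tokens

def summarize(calls):
    """Build a summary in the style of print_llm_calls_summary()."""
    total_s = sum(c.duration for c in calls)
    total_tok = sum(c.total_tokens for c in calls)
    lines = [f"Summary: {len(calls)} LLM call(s) took {total_s:.2f} seconds "
             f"and used {total_tok} tokens."]
    for i, c in enumerate(calls, start=1):
        lines.append(f"{i}. Task `{c.task}` took {c.duration:.2f} seconds "
                     f"and used {c.total_tokens} tokens.")
    return "\n".join(lines)

calls = [
    LLMCall("content_safety_check_input $model=content_safety", 0.35, 7764),
    LLMCall("general", 0.67, 164),
    LLMCall("content_safety_check_output $model=content_safety", 0.48, 14466),
]
print(summarize(calls))
# First line: Summary: 3 LLM call(s) took 1.50 seconds and used 22394 tokens.
```

The totals reproduce the example output: two safety checks dominate the token count because each check re-sends the conversation to the content-safety model.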
3 changes: 2 additions & 1 deletion examples/configs/gs_content_safety/config/config.yml
````diff
@@ -1,7 +1,7 @@
 models:
   - type: main
     engine: nvidia_ai_endpoints
-    model_name: meta/llama-3.3-70b-instruct
+    model: meta/llama-3.3-70b-instruct
 
   - type: content_safety
     engine: nvidia_ai_endpoints
@@ -15,6 +15,7 @@ rails:
   flows:
     - content safety check output $model=content_safety
 streaming:
   enabled: True
   chunk_size: 200
   context_size: 50
````
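The `streaming` options in `config.yml` govern how the streamed output is checked in pieces: with `chunk_size: 200` and `context_size: 50`, each chunk of up to 200 tokens appears to be checked together with the 50 tokens that immediately preceded it. The standalone function below illustrates that overlap; it is a sketch of the general idea under that assumption, not Guardrails' internal buffering:

```python
def chunks_with_context(tokens, chunk_size=200, context_size=50):
    """Yield (context, chunk) pairs: each chunk of up to `chunk_size`
    tokens is paired with the `context_size` tokens that preceded it,
    so a per-chunk safety check sees some trailing context."""
    for start in range(0, len(tokens), chunk_size):
        context = tokens[max(0, start - context_size):start]
        yield context, tokens[start:start + chunk_size]

# 450 tokens -> 3 chunks (200, 200, 50); chunks after the first
# each carry 50 tokens of context from the previous chunk.
tokens = [f"t{i}" for i in range(450)]
pairs = list(chunks_with_context(tokens))
```

A larger `context_size` catches unsafe content that straddles a chunk boundary at the cost of re-checking more tokens per chunk.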
16 changes: 3 additions & 13 deletions examples/configs/gs_content_safety/demo-out.txt
````diff
@@ -3,16 +3,6 @@ I'm sorry, I can't respond to that.
 # end-generate-response
 
 
-# start-get-duration
-Cape Hatteras National Seashore! It's a 72-mile stretch of undeveloped barrier islands off the coast of North Carolina, featuring pristine beaches, Cape Hatteras Lighthouse, and the Wright brothers' first flight landing site. Enjoy surfing, camping, and wildlife-spotting amidst the natural beauty and rich history.
-# end-get-duration
-
-
-# start-explain-info
-Summary: 3 LLM call(s) took 1.50 seconds and used 22394 tokens.
-
-1. Task `content_safety_check_input $model=content_safety` took 0.35 seconds and used 7764 tokens.
-2. Task `general` took 0.67 seconds and used 164 tokens.
-3. Task `content_safety_check_output $model=content_safety` took 0.48 seconds and used 14466 tokens.
-
-# end-explain-info
+# start-safe-response
+Cape Hatteras National Seashore: 72 miles of pristine Outer Banks coastline in North Carolina, featuring natural beaches, lighthouses, and wildlife refuges.
+# end-safe-response
````
23 changes: 4 additions & 19 deletions examples/configs/gs_content_safety/demo.py
````diff
@@ -58,33 +58,18 @@ async def stream_response(messages):
     print("# end-generate-response\n")
 sys.stdout = stdout
 
-# start-get-duration
-explain_info = None
-
-async def stream_response(messages):
-    async for chunk in rails.stream_async(messages=messages):
-        global explain_info
-        if explain_info is None:
-            explain_info = rails.explain_info
-        print(chunk, end="")
-    print()
-
+# start-safe-response
 messages=[{
     "role": "user",
     "content": "Tell me about Cape Hatteras National Seashore in 50 words or less."
 }]
 
 asyncio.run(stream_response(messages))
-
-explain_info.print_llm_calls_summary()
-# end-get-duration
+# end-safe-response
 
 stdout = sys.stdout
 with open("demo-out.txt", "a") as sys.stdout:
-    print("\n# start-get-duration")
+    print("\n# start-safe-response")
     asyncio.run(stream_response(messages))
-    print("# end-get-duration\n")
-    print("\n# start-explain-info")
-    explain_info.print_llm_calls_summary()
-    print("# end-explain-info\n")
+    print("# end-safe-response\n")
 sys.stdout = stdout
````
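The core pattern in `demo.py` is consuming `rails.stream_async(...)` with an `async for` loop driven by `asyncio.run`. That consumption pattern can be exercised without Guardrails by substituting a stub async generator; `fake_stream` below is a stand-in for the real call, not part of the library:

```python
import asyncio

async def fake_stream():
    # Stand-in for rails.stream_async(messages=...): yields response chunks.
    for chunk in ["Cape ", "Hatteras ", "National ", "Seashore."]:
        yield chunk

async def stream_response(collected):
    # Same shape as the demo: iterate the async generator chunk by chunk.
    async for chunk in fake_stream():
        collected.append(chunk)

chunks = []
asyncio.run(stream_response(chunks))
print("".join(chunks))  # prints the reassembled response
```

Collecting chunks into a list (rather than printing) makes the pattern easy to unit-test; swapping `fake_stream()` back to `rails.stream_async(messages=messages)` recovers the demo's behavior.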