docs: Revise GS example #1146

Open · wants to merge 1 commit into base `develop`
42 changes: 21 additions & 21 deletions docs/getting-started.md
````diff
@@ -76,30 +76,30 @@ The sample code uses the [Llama 3.3 70B Instruct model](https://build.nvidia.com
 :end-before: "# end-generate-response"
 ```
 
-## Timing and Token Information
+1. Send a safe request and generate a response:
 
-The following modification of the sample code shows the timing and token information for the guardrail.
-
-- Generate a response and print the timing and token information:
-
-```{literalinclude} ../examples/configs/gs_content_safety/demo.py
-:language: python
-:start-after: "# start-get-duration"
-:end-before: "# end-get-duration"
-```
+```{literalinclude} ../examples/configs/gs_content_safety/demo.py
+:language: python
+:start-after: "# start-safe-response"
+:end-before: "# end-safe-response"
+```
 
-_Example Output_
+_Example Output_
 
-```{literalinclude} ../examples/configs/gs_content_safety/demo-out.txt
-:language: text
-:start-after: "# start-get-duration"
-:end-before: "# end-get-duration"
-```
+```{literalinclude} ../examples/configs/gs_content_safety/demo-out.txt
+:language: text
+:start-after: "# start-safe-response"
+:end-before: "# end-safe-response"
+```
 
-The timing and token information is available with the `print_llm_calls_summary()` method.
-
-```{literalinclude} ../examples/configs/gs_content_safety/demo-out.txt
-:language: text
-:start-after: "# start-explain-info"
-:end-before: "# end-explain-info"
-```
+## Next Steps
+
+- Run the `content_safety_tutorial.ipynb` notebook from the
+  [example notebooks](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/examples/notebooks)
+  directory of the GitHub repository.
+  The notebook compares LLM responses with and without safety checks and classifies responses
+  to sample prompts as _safe_ or _unsafe_.
+  The notebook shows how to measure the performance of the checks, focusing on how many unsafe
+  responses are blocked and how many safe responses are incorrectly blocked.
+- Refer to [](user-guides/configuration-guide.md) for information about the `config.yml` file.
````
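The section removed above surfaced per-call timing and token counts through `print_llm_calls_summary()`. The gist of that summary, totals across calls plus a per-task breakdown, can be sketched in plain Python. This is a standalone illustration using made-up `LLMCall` records and the figures from `demo-out.txt`, not the actual NeMo Guardrails implementation:

```python
from dataclasses import dataclass

@dataclass
class LLMCall:
    task: str          # e.g. "content_safety_check_input $model=content_safety"
    duration: float    # wall-clock seconds for this LLM call
    total_tokens: int  # prompt + completion tokens

def summarize(calls):
    """Build a summary in the style of print_llm_calls_summary()."""
    total_s = sum(c.duration for c in calls)
    total_tok = sum(c.total_tokens for c in calls)
    lines = [f"Summary: {len(calls)} LLM call(s) took {total_s:.2f} seconds "
             f"and used {total_tok} tokens."]
    for i, c in enumerate(calls, start=1):
        lines.append(f"{i}. Task `{c.task}` took {c.duration:.2f} seconds "
                     f"and used {c.total_tokens} tokens.")
    return "\n".join(lines)

calls = [
    LLMCall("content_safety_check_input $model=content_safety", 0.35, 7764),
    LLMCall("general", 0.67, 164),
    LLMCall("content_safety_check_output $model=content_safety", 0.48, 14466),
]
print(summarize(calls))
# First line: Summary: 3 LLM call(s) took 1.50 seconds and used 22394 tokens.
```

The totals reproduce the example output: two safety checks dominate the token count because each check re-sends the conversation to the content-safety model.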
3 changes: 2 additions & 1 deletion examples/configs/gs_content_safety/config/config.yml
````diff
@@ -1,7 +1,7 @@
 models:
   - type: main
     engine: nvidia_ai_endpoints
-    model_name: meta/llama-3.3-70b-instruct
+    model: meta/llama-3.3-70b-instruct
 
   - type: content_safety
     engine: nvidia_ai_endpoints
@@ -15,6 +15,7 @@ rails:
   flows:
     - content safety check output $model=content_safety
 streaming:
   enabled: True
   chunk_size: 200
   context_size: 50
````
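The `streaming` options in `config.yml` govern how the streamed output is checked in pieces: with `chunk_size: 200` and `context_size: 50`, each chunk of up to 200 tokens appears to be checked together with the 50 tokens that immediately preceded it. The standalone function below illustrates that overlap; it is a sketch of the general idea under that assumption, not Guardrails' internal buffering:

```python
def chunks_with_context(tokens, chunk_size=200, context_size=50):
    """Yield (context, chunk) pairs: each chunk of up to `chunk_size`
    tokens is paired with the `context_size` tokens that preceded it,
    so a per-chunk safety check sees some trailing context."""
    for start in range(0, len(tokens), chunk_size):
        context = tokens[max(0, start - context_size):start]
        yield context, tokens[start:start + chunk_size]

# 450 tokens -> 3 chunks (200, 200, 50); chunks after the first
# each carry 50 tokens of context from the previous chunk.
tokens = [f"t{i}" for i in range(450)]
pairs = list(chunks_with_context(tokens))
```

A larger `context_size` catches unsafe content that straddles a chunk boundary at the cost of re-checking more tokens per chunk.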
16 changes: 3 additions & 13 deletions examples/configs/gs_content_safety/demo-out.txt
````diff
@@ -3,16 +3,6 @@ I'm sorry, I can't respond to that.
 # end-generate-response
 
 
-# start-get-duration
-Cape Hatteras National Seashore! It's a 72-mile stretch of undeveloped barrier islands off the coast of North Carolina, featuring pristine beaches, Cape Hatteras Lighthouse, and the Wright brothers' first flight landing site. Enjoy surfing, camping, and wildlife-spotting amidst the natural beauty and rich history.
-# end-get-duration
-
-
-# start-explain-info
-Summary: 3 LLM call(s) took 1.50 seconds and used 22394 tokens.
-
-1. Task `content_safety_check_input $model=content_safety` took 0.35 seconds and used 7764 tokens.
-2. Task `general` took 0.67 seconds and used 164 tokens.
-3. Task `content_safety_check_output $model=content_safety` took 0.48 seconds and used 14466 tokens.
-
-# end-explain-info
+# start-safe-response
+Cape Hatteras National Seashore: 72 miles of pristine Outer Banks coastline in North Carolina, featuring natural beaches, lighthouses, and wildlife refuges.
+# end-safe-response
````
23 changes: 4 additions & 19 deletions examples/configs/gs_content_safety/demo.py
````diff
@@ -58,33 +58,18 @@ async def stream_response(messages):
     print("# end-generate-response\n")
 sys.stdout = stdout
 
-# start-get-duration
-explain_info = None
-
-async def stream_response(messages):
-    async for chunk in rails.stream_async(messages=messages):
-        global explain_info
-        if explain_info is None:
-            explain_info = rails.explain_info
-        print(chunk, end="")
-    print()
-
+# start-safe-response
 messages=[{
     "role": "user",
     "content": "Tell me about Cape Hatteras National Seashore in 50 words or less."
 }]
 
 asyncio.run(stream_response(messages))
-
-explain_info.print_llm_calls_summary()
-# end-get-duration
+# end-safe-response
 
 stdout = sys.stdout
 with open("demo-out.txt", "a") as sys.stdout:
-    print("\n# start-get-duration")
+    print("\n# start-safe-response")
     asyncio.run(stream_response(messages))
-    print("# end-get-duration\n")
-    print("\n# start-explain-info")
-    explain_info.print_llm_calls_summary()
-    print("# end-explain-info\n")
+    print("# end-safe-response\n")
 sys.stdout = stdout
````
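The core pattern in `demo.py` is consuming `rails.stream_async(...)` with an `async for` loop driven by `asyncio.run`. That consumption pattern can be exercised without Guardrails by substituting a stub async generator; `fake_stream` below is a stand-in for the real call, not part of the library:

```python
import asyncio

async def fake_stream():
    # Stand-in for rails.stream_async(messages=...): yields response chunks.
    for chunk in ["Cape ", "Hatteras ", "National ", "Seashore."]:
        yield chunk

async def stream_response(collected):
    # Same shape as the demo: iterate the async generator chunk by chunk.
    async for chunk in fake_stream():
        collected.append(chunk)

chunks = []
asyncio.run(stream_response(chunks))
print("".join(chunks))  # prints the reassembled response
```

Collecting chunks into a list (rather than printing) makes the pattern easy to unit-test; swapping `fake_stream()` back to `rails.stream_async(messages=messages)` recovers the demo's behavior.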