diff --git a/docs/getting-started/2-core-colang-concepts/README.md b/docs/getting-started/2-core-colang-concepts/README.md
index b38bfe105..33688acdb 100644
--- a/docs/getting-started/2-core-colang-concepts/README.md
+++ b/docs/getting-started/2-core-colang-concepts/README.md
@@ -273,9 +273,11 @@ In our "Hello World" example, the predefined messages "Hello world!" and "How ar
In the previous example, the LLM is prompted once. The following figure summarizes the sequence of steps outlined above:
-<div align="center">
-<img src="../../_static/puml/core_colang_concepts_fig_1.png" width="486"/>
-</div>

+```{image} ../../_static/puml/core_colang_concepts_fig_1.png
+:alt: "Sequence diagram showing the three main steps of processing a user greeting: 1) Computing the canonical form of the user message, 2) Determining the next step using flows, and 3) Generating the bot's response message"
+:width: 486px
+:align: center
+```
Let's examine the same process for the follow-up question "What is the capital of France?".
@@ -321,9 +323,11 @@ Summary: 3 LLM call(s) took 1.79 seconds and used 1374 tokens.
Based on these steps, we can see that the `ask general question` canonical form is predicted for the user utterance "What is the capital of France?". Since there is no flow that matches it, the LLM is asked to predict the next step, which in this case is `bot response for general question`. Finally, since there is no predefined response, the LLM is asked a third time to generate the final message.
-<div align="center">
-<img src="../../_static/puml/core_colang_concepts_fig_2.png" width="586"/>
-</div>

+```{image} ../../_static/puml/core_colang_concepts_fig_2.png
+:alt: "Sequence diagram showing the three main steps of processing a follow-up question in NeMo Guardrails: 1) Computing the canonical form of the user message, such as 'ask general question' for 'What is the capital of France?', 2) Determining the next step using the LLM, such as 'bot response for general question', and 3) Generating the bot's response message. These are the steps to handle a question that doesn't have a predefined flow."
+:width: 586px
+:align: center
+```
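+
+To connect the figure to code, you can list the task behind each of the three LLM calls. The following is a minimal sketch that assumes the `config` directory and `rails` instance from the earlier steps of this guide; the printed task names (e.g., `generate_user_intent`) are internal prompt task identifiers:
+
+```python
+from nemoguardrails import LLMRails, RailsConfig
+
+config = RailsConfig.from_path("./config")
+rails = LLMRails(config)
+
+response = rails.generate(messages=[
+    {"role": "user", "content": "What is the capital of France?"}
+])
+print(response["content"])
+
+# One LLM call per step: canonical form, next step, bot message.
+info = rails.explain()
+for llm_call in info.llm_calls:
+    print(llm_call.task)
+```
+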
## Wrapping up
diff --git a/docs/getting-started/4-input-rails/README.md b/docs/getting-started/4-input-rails/README.md
index e59812202..7d97d3fed 100644
--- a/docs/getting-started/4-input-rails/README.md
+++ b/docs/getting-started/4-input-rails/README.md
@@ -283,9 +283,11 @@ print(info.llm_calls[0].completion)
The following figure depicts in more detail how the self-check input rail works:
-<div align="center">
-<img src="../../_static/puml/input_rails_fig_1.png" width="815"/>
-</div>

+```{image} ../../_static/puml/input_rails_fig_1.png
+:alt: "Sequence diagram showing how the self-check input rail works in NeMo Guardrails: 1) Application code sends a user message to the Programmable Guardrails system, 2) The message is passed to the Input Rails component, 3) Input Rails calls the self_check_input action, 4) The action uses an LLM to evaluate the message, 5) If the LLM returns 'Yes' indicating inappropriate content, the input is blocked and the bot responds with 'I am not able to respond to this.'"
+:width: 815px
+:align: center
+```
The `self check input` rail calls the `self_check_input` action, which in turn calls the LLM using the `self_check_input` task prompt.
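+
+As a rough sketch of how these pieces fit together, the rail and its task prompt can be assembled inline from Python. The prompt wording below is illustrative rather than the shipped template (the guide keeps the real one in `prompts.yml`), and the model settings are assumptions to adapt:
+
+```python
+from nemoguardrails import LLMRails, RailsConfig
+
+yaml_content = """
+models:
+  - type: main
+    engine: openai
+    model: gpt-3.5-turbo-instruct
+
+rails:
+  input:
+    flows:
+      - self check input
+
+prompts:
+  - task: self_check_input
+    content: |
+      Your task is to check if the user message below complies with the policy.
+      User message: "{{ user_input }}"
+      Question: Should the user message be blocked (Yes or No)?
+      Answer:
+"""
+
+config = RailsConfig.from_content(yaml_content=yaml_content)
+rails = LLMRails(config)
+```
+
+If the LLM answers "Yes" to this prompt, the input is blocked and the bot replies with the refusal message shown in the figure above.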
@@ -327,9 +329,11 @@ print(info.llm_calls[0].completion)
Because the input rail was not triggered, the flow continued as usual.
-<div align="center">
-<img src="../../_static/puml/input_rails_fig_2.png" width="740"/>
-</div>

+```{image} ../../_static/puml/input_rails_fig_2.png
+:alt: "Sequence diagram showing how the self-check input rail works in NeMo Guardrails when processing a valid user message: 1) Application code sends a user message to the Programmable Guardrails system, 2) The message is passed to the Input Rails component, 3) Input Rails calls the self_check_input action, 4) The action uses an LLM to evaluate the message, 5) If the LLM returns 'No' (indicating appropriate content), the input is allowed to continue, 6) The system then proceeds to generate a bot response using the general task prompt"
+:width: 740px
+:align: center
+```
Note that the final answer is not correct.
diff --git a/docs/user-guides/guardrails-process.md b/docs/user-guides/guardrails-process.md
index a9a279d60..226c0cf3e 100644
--- a/docs/user-guides/guardrails-process.md
+++ b/docs/user-guides/guardrails-process.md
@@ -6,9 +6,10 @@ This guide provides an overview of the main types of rails supported in NeMo Gua
NeMo Guardrails supports five main categories of rails: input, dialog, output, retrieval, and execution. The diagram below provides an overview of the high-level flow through these categories.
-<div align="center">
-<img src="../_static/images/programmable_guardrails_flow.png"/>
-</div>

+```{image} ../_static/images/programmable_guardrails_flow.png
+:alt: "High-level flow through the five main categories of guardrails in NeMo Guardrails: input rails for validating user input, dialog rails for controlling conversation flow, output rails for validating bot responses, retrieval rails for handling retrieved information, and execution rails for managing custom actions."
+:align: center
+```
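+
+In configuration terms, input, output, and retrieval rails are activated by listing flows under the corresponding `rails` sections of `config.yml`, while dialog and execution rails come from the Colang flows and actions themselves. A hedged sketch (`self check input` and `self check output` are built-in rails; the retrieval flow name is a placeholder):
+
+```python
+from nemoguardrails import RailsConfig
+
+yaml_content = """
+rails:
+  input:
+    flows:
+      - self check input
+  output:
+    flows:
+      - self check output
+  retrieval:
+    flows:
+      - check retrieved chunks  # placeholder name for a custom retrieval rail
+"""
+
+config = RailsConfig.from_content(yaml_content=yaml_content)
+```
+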
## Categories of Rails
@@ -28,9 +29,11 @@ There are five types of rails supported in NeMo Guardrails:
The diagram below depicts the guardrails process in detail:
-<div align="center">
-<img src="../_static/puml/master_rails_flow.png" width="720"/>
-</div>

+```{image} ../_static/puml/master_rails_flow.png
+:alt: "Sequence diagram showing the complete guardrails process in NeMo Guardrails: 1) Input Validation stage where user messages are processed by input rails that can use actions and LLM to validate or alter input, 2) Dialog stage where messages are processed by dialog rails that can interact with a knowledge base, use retrieval rails to filter retrieved information, and use execution rails to perform custom actions, 3) Output Validation stage where bot responses are processed by output rails that can use actions and LLM to validate or alter output. The diagram shows all optional components and their interactions, including knowledge base queries, custom actions, and LLM calls at various stages."
+:width: 720px
+:align: center
+```
The guardrails process has multiple stages that a user message goes through:
@@ -38,14 +41,15 @@ The guardrails process has multiple stages that a user message goes through:
2. **Dialog stage**: If the input is allowed and the configuration contains dialog rails (i.e., at least one user message is defined), the user message is processed by the dialog flows, which ultimately produce a bot message.
3. **Output Validation stage**: After the dialog rails generate a bot message, it is processed by the output rails, which decide whether the output is allowed, altered, or rejected (see the sketch below).
-
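+
+To observe these stages at runtime, the generation options can request a detailed log of the activated rails. A sketch, assuming a configuration that defines both input and output rails and that the `log.activated_rails` option behaves as in the detailed-logging guide:
+
+```python
+from nemoguardrails import LLMRails, RailsConfig
+
+config = RailsConfig.from_path("./config")
+rails = LLMRails(config)
+
+response = rails.generate(
+    messages=[{"role": "user", "content": "Hello!"}],
+    options={"log": {"activated_rails": True}},
+)
+
+# Each entry reports the rail's category (input, dialog, output, ...) and name.
+for rail in response.log.activated_rails:
+    print(rail.type, rail.name)
+```
+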
## The Dialog Rails Flow
The diagram below depicts the dialog rails flow in detail:
-<div align="center">
-<img src="../_static/puml/dialog_rails_flow.png" width="500"/>
-</div>

+```{image} ../_static/puml/dialog_rails_flow.png
+:alt: "Sequence diagram showing the detailed dialog rails flow in NeMo Guardrails: 1) User Intent Generation stage where the system first searches for similar canonical form examples in a vector database, then either uses the closest match if embeddings_only is enabled, or asks the LLM to generate the user's intent. 2) Next Step Prediction stage where the system either uses a matching flow if one exists, or searches for similar flow examples and asks the LLM to generate the next step. 3) Bot Message Generation stage where the system either uses a predefined message if one exists, or searches for similar bot message examples and asks the LLM to generate an appropriate response. The diagram shows all the interactions between the application code, LLM Rails system, vector database, and LLM, with clear branching paths based on configuration options and available predefined content."
+:width: 500px
+:align: center
+```
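+
+One branch in the figure above depends on the `embeddings_only` option, which skips the intent-generation LLM call and directly uses the closest matching canonical form. A minimal sketch of enabling it; the key path follows the configuration guide, so verify it against your version:
+
+```python
+from nemoguardrails import RailsConfig
+
+yaml_content = """
+rails:
+  dialog:
+    user_messages:
+      embeddings_only: True
+"""
+
+config = RailsConfig.from_content(yaml_content=yaml_content)
+```
+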
The dialog rails flow has multiple stages that a user message goes through:
@@ -59,6 +63,8 @@ The dialog rails flow has multiple stages that a user message goes through:
When `single_llm_call.enabled` is set to `True`, the dialog rails flow is simplified to a single LLM call that predicts all the steps at once. The diagram below depicts the simplified dialog rails flow:
-<div align="center">
-<img src="../_static/puml/single_llm_call_flow.png" width="600"/>
-</div>

+```{image} ../_static/puml/single_llm_call_flow.png
+:alt: "Sequence diagram showing the simplified dialog rails flow in NeMo Guardrails when single LLM call is enabled: 1) The system first searches for similar examples in the vector database for canonical forms, flows, and bot messages. 2) A single LLM call is made using the generate_intent_steps_message task prompt to predict the user's canonical form, next step, and bot message all at once. 3) The system then either uses the next step from a matching flow if one exists, or uses the LLM-generated next step. 4) Finally, the system either uses a predefined bot message if available, uses the LLM-generated message if the next step came from the LLM, or makes one additional LLM call to generate the bot message. This simplified flow reduces the number of LLM calls needed to process a user message."
+:width: 600px
+:align: center
+```
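+
+A sketch of enabling this mode follows. The configuration guide spells the option as `rails.dialog.single_call` with an `enabled` flag; treat the exact key path and the fallback flag as assumptions to verify:
+
+```python
+from nemoguardrails import RailsConfig
+
+yaml_content = """
+rails:
+  dialog:
+    single_call:
+      enabled: True
+      # Assumed flag: fall back to separate LLM calls if the single call fails.
+      fallback_to_multiple_calls: True
+"""
+
+config = RailsConfig.from_content(yaml_content=yaml_content)
+```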
diff --git a/docs/versions1.json b/docs/versions1.json
index 0f6f37abe..66a80e256 100644
--- a/docs/versions1.json
+++ b/docs/versions1.json
@@ -2,10 +2,10 @@
{
"preferred": true,
"version": "0.13.0",
- "url": "../0.13.0"
+ "url": "../0.13.0/"
},
{
"version": "0.12.0",
- "url": "../0.12.0"
+ "url": "../0.12.0/"
}
]