Skip to content

LLM - Custom Step Generator & Document Studio Flows using Azure OpenAI #204

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions LLM - Custom Step Generator/.env.sample.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
AZURE_OAI_ENDPOINT='https://my_endpoint.openai.azure.com/' # change my_prefix
AZURE_OAI_KEY='my_api_key' # change my_key
AZURE_OAI_DEPLOYMENT='gpt-4o'
15 changes: 15 additions & 0 deletions LLM - Custom Step Generator/LLM - Custom Step Generator.step
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
{
"type": "code",
"name": "LLM - Custom Step Generator.step",
"displayName": "LLM - Custom Step Generator.step",
"description": "",
"templates": {
"SAS": "/* SAS templated code goes here */\n/* Run the Python code within PROC PYTHON */\nproc python;\n submit;\n\n# Import required libraries\nimport os\nfrom dotenv import load_dotenv\nimport requests\n\n# Get variables from SAS\nstep_description = SAS.symget('step_description')\nenv_file_folder = SAS.symget('env_file_folder')\noutput_file = SAS.symget('output_file')\nmessages = SAS.symget('messages')\n\n# Remove prefixes like 'sasserver:' from the output file path\nenv_file_folder = env_file_folder.replace('sasserver:', '')\nmessages = messages.replace('sasserver:', '')\noutput_file = output_file.replace('sasserver:', '')\n\n# Change to the directory where the .env file is stored\n# Replace this with the correct path if needed\nos.chdir(env_file_folder)\n\n# Load Azure OpenAI credentials from the .env file\nload_dotenv()\nazure_oai_endpoint = os.getenv(\"AZURE_OAI_ENDPOINT\")\nazure_oai_key = os.getenv(\"AZURE_OAI_KEY\")\nazure_oai_deployment = os.getenv(\"AZURE_OAI_DEPLOYMENT\")\nazure_oai_model = azure_oai_deployment\napi_version = '2024-05-01-preview'\n\n# Read LLM system message\nwith open(messages, 'r', encoding=\"utf8\") as file:\n system_message = file.read()\n\n# Define the payload for the OpenAI API\npayload = {\n \"messages\": [\n {\"role\": \"system\", \"content\": system_message},\n {\"role\": \"user\", \"content\": step_description}\n ],\n \"temperature\": 0.5,\n \"top_p\": 0.9,\n \"max_tokens\": 3500\n}\n\n# Set the API endpoint\nENDPOINT = f\"{azure_oai_endpoint}openai/deployments/{azure_oai_model}/chat/completions?api-version={api_version}\"\n\n# Define headers for the request\nheaders = {\n \"Content-Type\": \"application/json\",\n \"api-key\": azure_oai_key\n}\n\n# Send the request to Azure OpenAI\ntry:\n response = requests.post(ENDPOINT, headers=headers, json=payload)\n response.raise_for_status()\n response_data = response.json()\n\n # Extract the generated step file content\n step_file_content = response_data['choices'][0]['message']['content']\n\n # Write the content to the output file\n with open(output_file, 'w', encoding='utf-8') as f:\n f.write(step_file_content)\n\n print(f\"Custom step file successfully written to: {output_file}\")\n\nexcept requests.RequestException as e:\n print(f\"Error communicating with Azure OpenAI: {e}\")\nexcept Exception as e:\n print(f\"An error occurred: {e}\")\n SAS.submit(f'data _null_; put \"{error_message}\"; run;')\n\nendsubmit;\nrun;"
},
"properties": {},
"ui": "{\n\t\"showPageContentOnly\": true,\n\t\"pages\": [\n\t\t{\n\t\t\t\"id\": \"pageOptions\",\n\t\t\t\"type\": \"page\",\n\t\t\t\"label\": \"Options\",\n\t\t\t\"children\": [\n\t\t\t\t{\n\t\t\t\t\t\"id\": \"text1\",\n\t\t\t\t\t\"type\": \"text\",\n\t\t\t\t\t\"text\": \"Generate a SAS Studio custom step with Azure OpenAI.\",\n\t\t\t\t\t\"visible\": \"\"\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"id\": \"step_description\",\n\t\t\t\t\t\"type\": \"textarea\",\n\t\t\t\t\t\"label\": \"Describe the custom step you want to generate:\",\n\t\t\t\t\t\"placeholder\": \"Enter a detailed description of the custom step, including inputs, outputs, logic, and any specific requirements.\",\n\t\t\t\t\t\"required\": true\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"id\": \"env_file_folder\",\n\t\t\t\t\t\"type\": \"path\",\n\t\t\t\t\t\"label\": \"Folder where the .env file is stored:\",\n\t\t\t\t\t\"pathtype\": \"folder\",\n\t\t\t\t\t\"placeholder\": \"/azuredm/code\",\n\t\t\t\t\t\"required\": true,\n\t\t\t\t\t\"visible\": \"\"\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"id\": \"messages\",\n\t\t\t\t\t\"type\": \"path\",\n\t\t\t\t\t\"label\": \"Specify the LLM system message file:\",\n\t\t\t\t\t\"pathtype\": \"file\",\n\t\t\t\t\t\"placeholder\": \"cs2_system_message.txt\",\n\t\t\t\t\t\"required\": true,\n\t\t\t\t\t\"visible\": \"\"\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"id\": \"output_file\",\n\t\t\t\t\t\"type\": \"path\",\n\t\t\t\t\t\"label\": \"Specify the output .step file:\",\n\t\t\t\t\t\"pathtype\": \"file\",\n\t\t\t\t\t\"placeholder\": \"gen_.step\",\n\t\t\t\t\t\"required\": true\n\t\t\t\t}\n\t\t\t]\n\t\t},\n\t\t{\n\t\t\t\"id\": \"pageAbout\",\n\t\t\t\"type\": \"page\",\n\t\t\t\"label\": \"About\",\n\t\t\t\"children\": [\n\t\t\t\t{\n\t\t\t\t\t\"id\": \"text2\",\n\t\t\t\t\t\"type\": \"text\",\n\t\t\t\t\t\"text\": \"LLM Custom Step Generator\\n====================\\nThis custom step uses Azure OpenAI to generate a complete SAS Studio custom step file (.step).\",\n\t\t\t\t\t\"visible\": \"\"\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"id\": \"prerequisites\",\n\t\t\t\t\t\"type\": \"section\",\n\t\t\t\t\t\"label\": \"Pre-requisites\",\n\t\t\t\t\t\"open\": true,\n\t\t\t\t\t\"visible\": \"\",\n\t\t\t\t\t\"children\": []\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"id\": \"text3\",\n\t\t\t\t\t\"type\": \"text\",\n\t\t\t\t\t\"text\": \"Prepare the Environment:\\n - Ensure you have access to an Azure OpenAI resource and the necessary `.env` file for configuration.\\n - Install required Python dependencies (`python-dotenv`, `requests`).\",\n\t\t\t\t\t\"visible\": \"\"\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"id\": \"documentation\",\n\t\t\t\t\t\"type\": \"section\",\n\t\t\t\t\t\"label\": \"Documentation\",\n\t\t\t\t\t\"open\": true,\n\t\t\t\t\t\"visible\": \"\",\n\t\t\t\t\t\"children\": []\n\t\t\t\t},\n\t\t\t\t{\n\t\t\t\t\t\"id\": \"helpText\",\n\t\t\t\t\t\"type\": \"text\",\n\t\t\t\t\t\"text\": \"Define the Prompt:\\n - Write a detailed description of the custom step logic, including:\\n - Inputs (e.g., data files, table names).\\n - Outputs (e.g., result files, tables).\\n - The specific functionality or logic to be implemented (e.g., anonymization, merging, summarization).\\n - Example Prompt:\\n \\\"Create a custom step that reads an input CSV file, anonymizes personal data, and outputs the result to another CSV file. Provide the Prompt UI, the Python program, and the full `.step` file.\\\"\\n\\nSpecify the folder where the .env file is stored.\\n\\nSpecify the LLM system message file.\\n\\nSpecify the output file path where the generated .step file will be saved.\\n\\nRun the step to generate the .step file.\\n\\nUse the generated .step file in your SAS Studio environment.\"\n\t\t\t\t}\n\t\t\t]\n\t\t}\n\t],\n\t\"syntaxversion\": \"1.3.0\",\n\t\"values\": {\n\t\t\"step_description\": \"\",\n\t\t\"messages\": \"\",\n\t\t\"output_file\": \"\"\n\t}\n}",
"flowMetadata": {
"inputPorts": [],
"outputPorts": []
}
}
161 changes: 161 additions & 0 deletions LLM - Custom Step Generator/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,161 @@
# LLM - Custom Step Generator

## Description

The LLM Custom Step Generator is a tool that leverages Azure OpenAI’s GPT-4o to automatically create fully functional custom steps for SAS Studio flows. By providing a detailed prompt describing the desired logic, along with configuration files for accessing the Azure OpenAI API, the generator produces a tailored custom step file.

This file includes the necessary code (in Python or SAS), input/output configurations, and a user interface for integration into flows. The generator simplifies and accelerates the creation of custom steps, enabling users to automate complex tasks like data anonymization, table merging, or advanced data summarization with minimal effort.

## Video and Blog Post

Watch the video and read the post [LLM Custom Step Generator in SAS Studios](https://communities.sas.com/t5/SAS-Communities-Library/LLM-Custom-Step-Generator-in-SAS-Studio/ta-p/961986) to find out more.


## Pre-requisites

To successfully use the LLM Custom Step Generator, ensure the following prerequisites are in place:

### 1. Azure OpenAI Resource

- **Azure Subscription**: An active Azure subscription is required to create and manage OpenAI resources.
- **Deployed GPT-4o Model**: Set up an Azure OpenAI resource and deploy a GPT-4o model. Note the following details for integration:
- Endpoint URL
- API Key
- Deployment Name

## 2. Environment Configuration (.env File)

- Create a `.env` file to store environment variables needed for API access. This file should include:

```plaintext
AZURE_OAI_ENDPOINT='https://my_endpoint.openai.azure.com/'
AZURE_OAI_KEY='your_api_key'
AZURE_OAI_DEPLOYMENT='gpt-4'
```

You can find an example in [.env.sample.txt](/LLM%20-%20Custom%20Step%20Generator/.env.sample.txt).

## 3. System Message File

- A system message file provides context to the LLM for generating custom steps. It should include:
- A description of what a custom step is.
- Examples of custom step logic written in Python and SAS.
- Guidelines for structuring the output (e.g., prompt UI, program, and `.step` file).
- This file acts as an "instruction manual" for the LLM, ensuring accurate and relevant output.

You can find an example in [cs2_system_message_new.md](/LLM%20-%20Custom%20Step%20Generator/cs2_system_message_new.md).

## 4. Python Dependencies

- Install the following Python libraries in the Python instance accessed in SAS Studio, to enable interaction with the Azure OpenAI API:
- `python-dotenv`: To load environment variables from the `.env` file.
- `requests`: To send API requests to the Azure OpenAI service.

## 5. Output Location

- Specify a directory or file path where the generated custom step code will be saved. Ensure the location has appropriate write permissions.

## 6. SAS Studio Environment

- A working SAS Studio environment is required to upload and test the generated custom step file (`.step`). The custom step was tested with SAS Viya LTS 2024.09.

## 7. Detailed Prompt

- Provide a clear and comprehensive description of the custom step logic, including:
- **Inputs**: Specify the data or files the custom step will use.
- **Outputs**: Define the expected output, such as a file or data set.
- **Desired Functionality**: Clearly describe the logic or process the custom step should perform.
- This prompt serves as the key input for the LLM to generate the custom step.

---

## User Interface

* ### Options tab ###

![Options](img/LLM%20-%20Custom%20Step%20Generator%20-%20Options.png)

* ### About tab ###

![About](img/LLM%20-%20Custom%20Step%20Generator%20-%20About.png)

## Requirements

Tested on Viya version Stable 2024.09.

## Usage

The **LLM Custom Step Generator** is a tool that leverages Azure OpenAI's GPT-4o to create custom steps for SAS Studio workflows. These custom steps can automate tasks such as data processing, transformation, or documentation generation.

This tab provides instructions on how to use the generator, along with details about the expected inputs and outputs.

---

Steps:

1. **Prepare the Environment**:
- Ensure you have access to an Azure OpenAI resource and the necessary `.env` file for configuration.
- Install required Python dependencies (`python-dotenv`, `requests`).

2. **Define the Prompt**:
- Write a detailed description of the custom step logic, including:
- Inputs (e.g., data files, table names).
- Outputs (e.g., result files, tables).
- The specific functionality or logic to be implemented (e.g., anonymization, merging, summarization).
- Example Prompt:
*"Create a custom step that reads an input CSV file, anonymizes personal data, and outputs the result to another CSV file. Provide the Prompt UI, the Python program, and the full `.step` file."*
- Example Prompt:
*"Create a custom step using SAS logic.
The step has two table inputs, for example SASDM.PRDSAL2 and SASDM.PRDSAL3.
The logic will merge the two tables. Then it will summarize the product sales by YEAR, MONTH, PRODUCT and sum up the ACTUAL sales. It will then create another data set NATIONAL_SALES in SASDM listing by YEAR, MONTH create a new column CHAMPION_PRODUCT equal with the top selling product."*

3. **Run the Generator**:
- Execute the generator with your prompt and configuration files.
- The generator will produce the custom step code in approximately 15–30 seconds.

4. **Save the Output**:
- Save the generated code as a `.step` file.
- Upload the `.step` file to your SAS Studio environment.

5. **Test the Custom Step**:
- Add the custom step to a workflow.
- Configure the step by selecting the appropriate inputs and outputs.
- Run the workflow to verify the results.

### Usage Example

Custom Step options filled:

![](img/LLM%20-%20Custom%20Step%20Generator%20-%20Python%20example.png)

---

## Outputs

- **Generated Custom Step File**:
- A `.step` file containing:
- The program logic (e.g., Python or SAS code).
- The user interface (Prompt UI) for configuring inputs and outputs.
- **Expected Results**:
- A functional custom step that can be integrated into SAS Studio flows to perform the specified task.

---

## Notes

- Ensure that the `.step` file is tested in a controlled environment before deploying it to production workflows.
- For troubleshooting, review the generated code and the input prompt for any inaccuracies or missing details.
- The quality of the output depends on the clarity and specificity of the provided prompt.

---

## Change Log

* Version 1.0 (14FEB2025)
* Initial version

<!-- DCO Remediation Commit for Bogdan Teleuca <[email protected]>

I, Bogdan Teleuca <[email protected]>, hereby add my Signed-off-by to this commit: eccafa3b97a067447bb8ba9d2935d444a99a1c0d

Signed-off-by: Bogdan Teleuca <[email protected]> -->
Loading