A robust service for implementing safety and quality controls on Large Language Model (LLM) outputs. This service provides multiple guardrails to ensure LLM responses are safe, appropriate, and meet specified quality standards.
- Multiple Guardrail Types:
- Content Safety Filters (
safety
)- Violence detection (low/medium/high)
- Hate speech detection (low/medium/high)
- Insults detection (low/medium/high)
- Misconduct detection (low/medium/high)
- Sexual content detection (low/medium/high)
- Topic Control (
topics
)- Denied topics list
- Word Filtering (
words
)- Built-in profanity detection
- Custom word list filtering
- PII Detection and Protection (
pii
)- Standard PII types (email, phone, SSN, credit card, address, name, date of birth, IP address, passport, drivers license)
- Custom regex patterns for additional entity detection
- Content Safety Filters (
GET /api/v1/guardrails
Returns a list of all available guardrails and their capabilities.
Response:
{
"guardrails": [
{
"id": "pii",
"name": "PII Detection",
"description": "Detects and handles personally identifiable information",
"supports_validation": true,
"supports_transformation": true,
"options": {
"entity_types": [
"email",
"phone",
"ssn",
"credit_card",
"address",
"name",
"date_of_birth",
"ip_address",
"passport",
"drivers_license"
],
"custom_regexes": [
{
"pattern": "regex_pattern",
"label": "custom_entity_label"
}
]
}
},
{
"id": "profanity",
"name": "Profanity Filter",
"description": "Detects and filters offensive language",
"supports_validation": true,
"supports_transformation": true,
"options": {
"severity_levels": ["mild", "moderate", "severe"]
}
}
]
}
POST /api/v1/validate
Validates an LLM response against specified guardrails and returns whether it passes or fails.
Request Body:
{
"content": "LLM response text",
"guardrails": ["pii", "profanity", "relevancy"],
"options": {
"pii": {
"entity_types": ["email", "phone", "ssn"],
"custom_regexes": [
{
"pattern": "\\b[A-Z0-9._%+-]+@example\\.com\\b",
"label": "company_email"
}
]
},
"profanity": {},
"relevancy": {}
}
}
Response:
{
"is_valid": boolean,
"failed_guardrails": ["string"],
"details": {
"guardrail_id": {
"passed": boolean,
"violations": ["string"]
}
}
}
POST /api/v1/transform
Applies specified transformations to make the content compliant with guardrails.
Request Body:
{
"content": "LLM response text",
"guardrails": ["pii", "profanity"],
"options": {
"pii": {
"entity_types": ["email", "phone", "ssn"],
"custom_regexes": [
{
"pattern": "\\b[A-Z0-9._%+-]+@example\\.com\\b",
"label": "company_email"
}
]
},
"profanity": {}
}
}
Response:
{
"transformed_content": "string",
"applied_transformations": ["string"],
"details": {
"guardrail_id": {
"details": {}
}
}
}
GET /health
Verifies the service is running and all guardrails are properly initialized.
Response:
{
"status": "healthy",
"guardrails": {
"pii": true,
"profanity": true
}
}
The project is organized into the following structure:
guardrails-service/
├── config/ # Configuration files
│ └── .env # Environment variables
├── src/ # Source code
│ ├── api/ # API endpoints and application server
│ │ └── app.py # FastAPI application
│ ├── guardrails/ # Guardrail implementations
│ │ ├── base.py # Base guardrail classes and interfaces
│ │ ├── pii.py # PII detection guardrail
│ │ ├── pii_types.py # PII entity types
│ │ └── topic.py # Topic control guardrail
│ └── models/ # ML model services
│ └── service.py # Model service implementations
├── .env # Environment variables (root copy for compatibility)
├── Dockerfile # Container definition
├── requirements.txt # Python dependencies
└── run.py # Entry point script
- Set up your environment variables
- Install dependencies
- Run the service:
python run.py
# Run in development mode
uvicorn app:app --reload
# Run tests
pytest
# Run linting
flake8
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature
) - Commit your changes (
git commit -m 'Add some amazing feature'
) - Push to the branch (
git push origin feature/amazing-feature
) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.