This ADR documents the design and decisions for extending the detector API with additional fields, to accommodate the various outputs produced by models in the guardrails ecosystem.
This serves as an extension to ADR 003 - Detector API design. The detector API can be found at this GitHub page.
Libraries like the vllm-detector-adapter provide the detector API and serve LLMs like Granite Guardian and Llama Guard as detectors through vLLM.
As these models undergo further development, more information is being provided in model output. Llama Guard provides "unsafe" categories when input has been categorized as "unsafe", e.g. `unsafe\nS1` (ref), and Granite Guardian 3.2 began to provide additional information such as confidence, e.g. `No\n<confidence> High </confidence>` to indicate `No` risk and `High` confidence in that decision (ref).
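To make this concrete, below is a minimal, hypothetical Python sketch of how a serving adapter (such as the vllm-detector-adapter) might split these raw completions into a decision plus extra fields. The function names and parsing rules here are illustrative assumptions, not the adapter's actual implementation.

```python
import re
from typing import Any, Dict


def parse_granite_guardian_output(raw: str) -> Dict[str, Any]:
    """Illustrative only: split a raw Granite Guardian 3.2 completion,
    e.g. a "No" decision followed by a <confidence> High </confidence> tag."""
    lines = raw.strip().splitlines()
    decision = lines[0].strip()  # "Yes" / "No" risk decision
    extras: Dict[str, Any] = {}
    match = re.search(r"<confidence>\s*(\w+)\s*</confidence>", raw)
    if match:
        extras["confidence"] = match.group(1)  # e.g. "High"
    return {"decision": decision, "extras": extras}


def parse_llama_guard_output(raw: str) -> Dict[str, Any]:
    """Illustrative only: split a raw Llama Guard completion,
    e.g. "unsafe" followed by violated categories like "S1"."""
    lines = raw.strip().splitlines()
    decision = lines[0].strip()  # "safe" / "unsafe"
    extras: Dict[str, Any] = {}
    if decision == "unsafe" and len(lines) > 1:
        extras["categories"] = [c.strip() for c in lines[1].split(",")]
    return {"decision": decision, "extras": extras}
```

The question this ADR addresses is where such extra fields (confidence, categories) should live in the detector API response.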
At the time of writing, the detector API endpoints return two types of responses:
- `/text/contents` returns a list of lists of detections with spans. Each list of detections corresponds to the respective content in `contents` provided in the user request, so `contents` with 2 texts would return a list of 2 lists.
- Other endpoints return a list of detections, without spans.
Any placement of additional detector fields should account for these two types of responses, illustrated in the sketch below.
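For illustration, a minimal sketch of the two response shapes, using the example field names from this ADR; the span fields (`start`, `end`) and example offsets are assumptions for illustration and may not match the API exactly.

```python
# Shape of /text/contents for a request with contents=["Have you seen a goose?", "Hello"]:
# one inner list of span-bearing detections per input text.
contents_response = [
    [
        {
            "start": 16,  # span fields shown for illustration only
            "end": 21,
            "detection_type": "animal",
            "detection": "goose",
            "score": 0.2,
        }
    ],
    [],  # no detections for the second text
]

# Shape of the other endpoints: a flat list of detections, without spans.
other_response = [
    {"detection_type": "animal", "detection": "goose", "score": 0.2},
]
```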
- This ADR considers fields that apply to each particular detection or decision made by the detector model.
- Knowledge of future model plans is intentionally restricted here, so only a few examples are given based on already-released model functionality.
- We will add a new high-level field, `metadata`, to account for additional information from detector models. This field will provide a dictionary with string keys and arbitrary values, so that values are not constrained to particular types like strings or floats. This enables flexibility and is how APIs like Llama Stack provide additional information, whether on datasets or models. A schema sketch is given after this list.
Example
```json
{
  "detection_type": "animal",
  "detection": "goose",
  "score": 0.2,
  "evidence": [],
  "metadata": {
    "confidence": "High",
    "key": 0.3,
    "categories": ["bird"]
  }
}
```
- To distinguish `metadata` from the existing `evidence` field: attributes under `evidence` are meant to help answer "Why was this decision made?", whereas `metadata` will just present information, and the orchestrator will not alter workflow directions based on any information within `metadata`. The orchestrator is currently not designed to take any action or decision based on model outputs, as the API is designed to present information to the orchestrator API user or consuming application. The user or application can then decide what to do with the information, whether doing another generation call, masking text, or further presenting the info to that consuming application's users.
- The updates will affect what the detector API endpoints return, and changes will be reflected in the orchestrator API as well.
- To keep the experience consistent among the various detector API endpoints, i.e. `/text/contents` vs. others, any added fields will be at the same level, e.g. at the same level as `detection`.
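Below is a minimal, Pydantic-style sketch of how a detection with the proposed `metadata` field could be modeled. The class names and the simplified `Evidence` model (nested evidence omitted) are assumptions for illustration, not the actual API schema.

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class Evidence(BaseModel):
    """Simplified sketch of an existing evidence entry (nested evidence omitted).

    Evidence is meant to help answer: "Why was this decision made?"
    """
    name: str
    value: Optional[str] = None
    score: Optional[float] = None


class Detection(BaseModel):
    """Sketch of a single detection with the proposed free-form metadata field."""
    detection_type: str
    detection: str
    score: float
    evidence: List[Evidence] = Field(default_factory=list)
    # Proposed field: informational only; the orchestrator takes no action based on it.
    metadata: Optional[Dict[str, Any]] = None


# Mirrors the JSON example above (pydantic v2 API).
detection = Detection(
    detection_type="animal",
    detection="goose",
    score=0.2,
    metadata={"confidence": "High", "key": 0.3, "categories": ["bird"]},
)
print(detection.model_dump(exclude_none=True))
```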
A few alternate strategies were considered with pros and cons documented.
a. features
Example
```json
{
  "detection_type": "animal",
  "detection": "goose",
  "score": 0.2,
  "evidence": [],
  "features": {
    "confidence": "High",
    "key": 0.3,
    "categories": ["bird"]
  }
}
```
Pros:
- Slightly more descriptive than just `metadata`

Cons:
- `features` might not be appropriate for all attributes
- May be confused with model 'features' in the machine-learning sense, i.e. features of the data
- Similar to the `metadata` case, the addition of `features` could create potential confusion with `evidence`
- Similar to the `metadata` case, arbitrary keys and values will be difficult to validate, though implementations also do not have to validate them
b. attributes
- This is a more general term than `features` but can be considered more restrictive than `metadata`. Not all fields may be considered `attributes` of the decision.
c. controls
- This concept may be too Granite-specific (ref). Fields like `confidence` are also not "controlled" or requested by the user.
Another alternative is to use the existing `evidence` field. Currently, a list of `evidence` can be provided, with an arbitrary string attribute as `name`, a corresponding string `value`, and a float `score`, with nested `evidence` as necessary. `value` and `score` may not be appropriate for every field or attribute case and can be optional.
Example
```json
{
  "detection_type": "animal",
  "detection": "goose",
  "score": 0.2,
  "evidence": [
    {
      "name": "confidence",
      "value": "High"
    },
    {
      "name": "categories",
      "value": "bird"
    }
  ]
}
```
Pros:
- The current API can remain the same
- Generally flexible to various attributes and values that are strings or floats

Cons:
- Not all fields or attributes are necessarily appropriate as `evidence` or explanatory toward the particular detection; some may simply provide more information
- `value` is currently constrained to string and `score` to float. For some fields, `value` may be more appropriate in another data format.
Another alternative is to add arbitrary top-level fields, such as `confidence`, directly on each detection.

Example
```json
{
  "detection_type": "animal",
  "detection": "goose",
  "score": 0.2,
  "evidence": [],
  "confidence": "High"
}
```
Pros:
- Similar to the `metadata` case, enables a lot of flexibility
- Potentially slightly less confusion than introducing a completely alternate field alongside `evidence`, but would still require delineating what does not belong under `evidence`

Cons:
- More so than the `metadata` case, this will be more difficult to document in the API and will make it harder for users to know which fields to expect on responses, especially if different detectors provide different fields
- Arbitrary keys and values will be difficult to validate
- Still raises the question of what goes under the existing `evidence` field versus at the higher level
- Both detector API users and orchestrator API users will see additional fields reflected in detection results.
- The APIs will handle additional model outputs as model versions are released.
- API users will be able to parse the `metadata` field to receive additional model information (a consumer-side sketch follows this list).
- Implementers of the detector API can use the `metadata` field to provide additional model information.
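As a usage sketch, a consumer might read the `metadata` field defensively, since it is optional and free-form; the field values below are hypothetical.

```python
# Hypothetical detection returned by a detector or orchestrator endpoint.
detection = {
    "detection_type": "risk",
    "detection": "Yes",
    "score": 0.97,
    "evidence": [],
    "metadata": {"confidence": "High"},
}

# metadata is optional and free-form, so read it defensively.
confidence = (detection.get("metadata") or {}).get("confidence")
if confidence is not None:
    print(f"Detector reported confidence: {confidence}")
```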
Proposed