Enable passing column type to SHAPConfig in combination with ClarifyCheckStep #5131

Open
oskarklang-private opened this issue Apr 16, 2025 · 0 comments
Labels: component: pipelines (Relates to the SageMaker Pipeline Platform), type: bug

oskarklang-private commented Apr 16, 2025

Describe the feature you'd like
Add a parameter to SHAPConfig from sagemaker.workflow.clarify_check_step that lets the user specify the column types of the dataset used to create a baseline for the SHAP analysis (e.g. float, int, category, etc.).
Alternatively, make it possible to run ClarifyCheckStep when an S3 URI has been passed as the baseline to SHAPConfig.

How would this feature be used? Please describe.
When using ClarifyCheckStep and SHAPConfig from sagemaker.workflow.clarify_check_step, I am currently unable to specify my dataset's column types (e.g. some columns should be numerical while others should be categorical).

When ClarifyCheckStep runs as part of a SageMaker pipeline, Clarify calculates a baseline that is erroneous because it does not take the column types into account. For example, some columns that should be categorical get the column mean as their baseline value, where preferably they should get the column mode or something else more appropriate.

I know that I can pass my own baseline to SHAPConfig, but I don't want it hard-coded in my SageMaker pipeline definition. I want it to be computed at runtime, based on previous steps in my SageMaker pipeline.
An alternative solution would be to pass SHAPConfig the S3 URI of a baseline dataset that I create in a previous step; however, this doesn't seem to work with how ClarifyCheckStep is currently implemented.
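
To illustrate the kind of type-aware baseline I mean, here is a minimal sketch using only the Python standard library. The helper name `compute_baseline`, the column names, and the sample rows are all hypothetical and not part of the SageMaker SDK; it only shows the intended rule (mode for categorical columns, mean for numerical ones) that a previous pipeline step could apply.

```python
from statistics import mean, mode


def compute_baseline(rows, categorical_columns):
    """Build a one-row baseline: mode for categorical columns, mean for the rest.

    `rows` is a list of dicts sharing the same keys; `categorical_columns`
    is a set of column names to treat as categorical. Both are assumptions
    made for this illustration.
    """
    baseline_row = []
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        if column in categorical_columns:
            # Categorical: use the most frequent value.
            baseline_row.append(mode(values))
        else:
            # Numerical: use the arithmetic mean.
            baseline_row.append(mean(values))
    # Return a list of rows containing a single baseline row.
    return [baseline_row]


# Hypothetical sample data: "color" is an integer-encoded category.
rows = [
    {"age": 30, "color": 2},
    {"age": 40, "color": 2},
    {"age": 50, "color": 7},
]
baseline = compute_baseline(rows, categorical_columns={"color"})
print(baseline)
```

With a plain mean, the "color" column would get a baseline value that is not a valid category at all, which is the problem described above. A baseline in this list-of-rows shape is what I can already pass to SHAPConfig by hand; the request is to have it derived at runtime instead.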

Describe alternatives you've considered
Make it possible to run ClarifyCheckStep when an S3 URI has been passed as baseline to SHAPConfig.

Additional context

from sagemaker.workflow.clarify_check_step import (
    ClarifyCheckStep,
    ModelExplainabilityCheckConfig,
    SHAPConfig,
)

# No baseline is passed here, so Clarify computes one itself at runtime.
# The data/model configs, check_job_config_clarify, and the pipeline
# parameters referenced below are defined elsewhere in the pipeline.
shap_config = SHAPConfig(seed=123, num_samples=100, num_clusters=5)

model_explainability_check_config = ModelExplainabilityCheckConfig(
    data_config=model_explainability_data_config,
    model_config=model_config,
    explainability_config=shap_config,
)

step_model_explainability_check = ClarifyCheckStep(
    name="ModelExplainabilityCheckStep",
    display_name="Model Explainability Check",
    clarify_check_config=model_explainability_check_config,
    check_job_config=check_job_config_clarify,
    skip_check=skipCheckModelExplainabilityParam,
    register_new_baseline=registerNewBaselineModelExplainabilityParam,
    supplied_baseline_constraints=suppliedBaselineConstraintsModelExplainabilityParam,
    model_package_group_name=model_package_group_name,
)
@nargokul nargokul added type: bug component: pipelines Relates to the SageMaker Pipeline Platform labels Apr 17, 2025