Describe the feature you'd like
Add a parameter to SHAPConfig (used with ClarifyCheckStep from sagemaker.workflow.clarify_check_step) that lets the user specify the column types of the dataset used to create a baseline for the SHAP analysis (e.g. float, int, category).
Alternatively, make it possible to run ClarifyCheckStep when an S3 URI is passed as the baseline to SHAPConfig.
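As a rough illustration of the first request, a minimal sketch of what the new parameter could look like (the `column_types` argument below is hypothetical and does not exist in SHAPConfig today):

```python
# Hypothetical illustration only: SHAPConfig does not currently accept any
# column-type information; `column_types` below is the proposed addition.
from sagemaker.clarify import SHAPConfig

shap_config = SHAPConfig(
    num_samples=100,
    agg_method="mean_abs",
    # Proposed: tell Clarify how to treat each column when it computes the
    # baseline itself (e.g. mode for categorical columns, mean/median for
    # numerical ones).
    column_types={
        "age": "int",
        "income": "float",
        "country": "category",
    },
)
```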
How would this feature be used? Please describe.
When using ClarifyCheckStep and SHAPConfig from sagemaker.workflow.clarify_check_step, I am currently unable to specify my dataset's column types (e.g. some columns should be treated as numerical and others as categorical).
When ClarifyCheckStep runs as part of a SageMaker pipeline, Clarify computes a baseline that is erroneous because it does not take the column types into account: columns that should be categorical get the mean of the column as their baseline value, whereas they should preferably get the mode of the column or something else more appropriate.
I know that I can pass my own baseline to SHAPConfig, but I don't want it hard-coded in my SageMaker pipeline definition; I want it computed at runtime, based on the outputs of previous steps in the pipeline.
An alternative would be to pass SHAPConfig the S3 URI of a baseline dataset created in a previous step; however, this does not seem to work with how ClarifyCheckStep is currently implemented.
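For reference, a minimal sketch of how the step is typically wired up today (the role ARN, S3 paths, and model name below are placeholders, not taken from this issue). Because SHAPConfig is given neither a baseline nor any column-type information, the baseline Clarify derives treats every column as numerical:

```python
from sagemaker.clarify import DataConfig, ModelConfig, SHAPConfig
from sagemaker.workflow.check_job_config import CheckJobConfig
from sagemaker.workflow.clarify_check_step import (
    ClarifyCheckStep,
    ModelExplainabilityCheckConfig,
)

# Placeholder values: the role ARN, bucket paths, and model name are assumptions.
role = "arn:aws:iam::123456789012:role/MySageMakerRole"

check_job_config = CheckJobConfig(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

data_config = DataConfig(
    s3_data_input_path="s3://my-bucket/train/train.csv",
    s3_output_path="s3://my-bucket/clarify-output/",
    label="target",
    dataset_type="text/csv",
)

model_config = ModelConfig(
    model_name="my-model",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# No baseline is supplied, so Clarify derives one itself. Without column-type
# information it treats every feature numerically, e.g. taking the mean of a
# categorical column instead of its mode.
shap_config = SHAPConfig(num_samples=100, agg_method="mean_abs")

explainability_check_step = ClarifyCheckStep(
    name="ExplainabilityCheck",
    clarify_check_config=ModelExplainabilityCheckConfig(
        data_config=data_config,
        model_config=model_config,
        explainability_config=shap_config,
    ),
    check_job_config=check_job_config,
    skip_check=False,
    register_new_baseline=True,
)
```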
Describe alternatives you've considered
Make it possible to run ClarifyCheckStep when an S3 URI has been passed as the baseline to SHAPConfig.
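A sketch of what that alternative could look like, assuming the baseline dataset is written by an earlier ProcessingStep in the pipeline (the step variable and output name below are placeholders). SHAPConfig's `baseline` argument accepts an S3 object URI according to the Clarify documentation, but per this report ClarifyCheckStep does not currently handle that case:

```python
from sagemaker.clarify import SHAPConfig

# `preprocess_step` is assumed to be a ProcessingStep defined earlier in the
# pipeline that writes the baseline dataset to an output named "shap_baseline".
baseline_uri = preprocess_step.properties.ProcessingOutputConfig.Outputs[
    "shap_baseline"
].S3Output.S3Uri

# Pointing baseline at a step output keeps the pipeline definition free of
# hard-coded values; this is what reportedly fails with ClarifyCheckStep today.
shap_config = SHAPConfig(
    baseline=baseline_uri,
    num_samples=100,
    agg_method="mean_abs",
)
```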
Additional context