Fix missing and fragile scikit-learn imports in Keras sklearn wrappers #21387

Merged: fchollet merged 6 commits into keras-team:master on Jun 18, 2025

Conversation

timovdk (Contributor) commented on Jun 16, 2025

This PR addresses a bug where the SKLearnClassifier and SKLearnRegressor wrappers raise an AttributeError when used without other scikit-learn utilities (e.g. make_classification).

Fixes

  • Explicitly imports sklearn.utils.multiclass.type_of_target instead of relying on indirect access via sklearn.utils
  • Cleans up imports in related files:
    • Replaces several indirect accesses with explicit imports (make_pipeline, OneHotEncoder, MetadataRequest, check_is_fitted)
    • Avoids the private API sklearn.utils._array_api by calling np.squeeze directly, which is consistent with the docstring expectations

Related Issue

Fixes #21386

codecov-commenter commented on Jun 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.67%. Comparing base (764ed95) to head (4df855e).
Report is 3 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #21387   +/-   ##
=======================================
  Coverage   82.67%   82.67%           
=======================================
  Files         565      565           
  Lines       55064    55068    +4     
  Branches     8569     8569           
=======================================
+ Hits        45525    45529    +4     
  Misses       7441     7441           
  Partials     2098     2098           
Flag Coverage Δ
keras 82.48% <100.00%> (+<0.01%) ⬆️
keras-jax 63.51% <100.00%> (+<0.01%) ⬆️
keras-numpy 58.68% <66.66%> (+<0.01%) ⬆️
keras-openvino 33.46% <66.66%> (+<0.01%) ⬆️
keras-tensorflow 63.91% <100.00%> (+<0.01%) ⬆️
keras-torch 63.54% <100.00%> (+<0.01%) ⬆️


fchollet (Collaborator) left a comment

Thanks for the PR.

> where the SKLearnClassifier and SKLearnRegressor wrappers raise an AttributeError when used without other scikit-learn utilities (e.g. make_classification).

Why does this happen exactly?

It's generally much cleaner to only try to import sklearn.

timovdk (Contributor, Author) commented on Jun 17, 2025

> Thanks for the PR.
>
> > where the SKLearnClassifier and SKLearnRegressor wrappers raise an AttributeError when used without other scikit-learn utilities (e.g. make_classification).
>
> Why does this happen exactly?
>
> It's generally much cleaner to only try to import sklearn.

Thanks for the review!

The issue is that import sklearn alone does not load submodules like sklearn.utils.multiclass. So when code later accesses sklearn.utils.multiclass.type_of_target it can raise an AttributeError because utils or multiclass hasn't been imported yet. This doesn't show up in the docstring examples, since those import make_classification, which indirectly imports the multiclass (and other .utils) submodules.

Switching to directly importing type_of_target ensures that missing dependencies raise an ImportError at import time. However, to raise the ImportError consistently, I agree that import sklearn should be re-added to these files, thanks!
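
The mechanism described above can be reproduced without scikit-learn. The following is a minimal sketch, assuming a throwaway package named demo_pkg (hypothetical, built in a temporary directory) whose __init__.py files import nothing, mirroring the sklearn / sklearn.utils / sklearn.utils.multiclass layout. The stand-in type_of_target here is a toy function, not scikit-learn's implementation.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a tiny package whose __init__.py files import no submodules."""
    utils = root / "demo_pkg" / "utils"
    utils.mkdir(parents=True)
    (root / "demo_pkg" / "__init__.py").write_text("")
    (utils / "__init__.py").write_text("")
    # Toy stand-in for sklearn.utils.multiclass.type_of_target.
    (utils / "multiclass.py").write_text(
        "def type_of_target(y):\n"
        "    return 'binary' if len(set(y)) == 2 else 'other'\n"
    )


def run_demo() -> dict:
    """Show attribute access failing before, and working after, a submodule import."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            import demo_pkg  # loads only demo_pkg/__init__.py

            try:
                demo_pkg.utils.multiclass.type_of_target([0, 1])
                results["before"] = "ok"
            except AttributeError:
                # `utils` was never imported, so it is not an attribute yet.
                results["before"] = "AttributeError"

            # Importing the submodule binds `utils` and `multiclass` as attributes.
            importlib.import_module("demo_pkg.utils.multiclass")
            results["after"] = demo_pkg.utils.multiclass.type_of_target([0, 1, 1])
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.split(".")[0] == "demo_pkg"]:
                del sys.modules[name]
    return results


if __name__ == "__main__":
    print(run_demo())
```

Any unrelated import that happens to load the submodule first (as make_classification does in the docstring examples, per the explanation above) hides the failure, which is why it only appears when the wrappers are used on their own.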

fchollet (Collaborator) commented

> The issue is that import sklearn alone does not load submodules like sklearn.utils.multiclass. So when code later accesses sklearn.utils.multiclass.type_of_target it can raise an AttributeError because utils or multiclass hasn't been imported yet

This is an issue with the design of sklearn; if the package was well-designed then you could access any member of the API from sklearn.xyz... (as is the case for Keras).

To route around it I would suggest leaving the top-of-file import structure unchanged, and then doing e.g. from sklearn.utils.validation import check_is_fitted right before you need it, in-line.
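
A rough sketch of the suggested pattern, using the standard-library xml package as a stand-in so that it runs without scikit-learn installed. The function root_tag is hypothetical and only illustrates where the import goes; in the wrappers, the inline line would be the from sklearn.utils.validation import check_is_fitted statement mentioned above.

```python
import xml  # top-of-file import left unchanged (stand-in for `import sklearn`)


def root_tag(document: str) -> str:
    """Return the tag name of the root element of an XML string."""
    # Inline import, placed right before first use. This is the stand-in for
    # `from sklearn.utils.validation import check_is_fitted`. It does not rely
    # on `xml.dom` already being loaded as an attribute of `xml`.
    from xml.dom.minidom import parseString

    return parseString(document).documentElement.tagName


if __name__ == "__main__":
    print(root_tag("<model><layer/></model>"))
```

The trade-off is that a missing submodule surfaces only when the function is first called, not when the file is imported.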

timovdk (Contributor, Author) commented on Jun 17, 2025

> This is an issue with the design of sklearn; if the package was well-designed then you could access any member of the API from sklearn.xyz... (as is the case for Keras).

Yes I agree, and this is only the case for multiclass, which is not exposed in sklearn/utils/__init__.py. That's why type_of_target cannot be accessed through sklearn.utils.multiclass.type_of_target.

After fixing that import, I adjusted the other sklearn imports for consistency by using the same from sklearn.xyz import foo style and grouping them at the top of the file for readability.

But, if the preference is to use inline imports, I propose to revert all non-essential import changes and leave only the inline import for type_of_target, because I think that the readability argument no longer applies with inline imports.

(Edit): I also found that check_is_fitted is not exposed through sklearn/utils/__init__.py, so it too would require an inline import. The others (metadata_routing, preprocessing, and pipeline) are exposed through their respective __init__.py files, so I can safely revert those changes.
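
One way to check which names a package exposes after a plain import is to ask a fresh interpreter, so that modules already loaded in the current session cannot mask the answer. This is a sketch under that assumption; the helper name exposed_after_plain_import is hypothetical, and the examples use standard-library packages so the snippet runs without scikit-learn.

```python
import subprocess
import sys


def exposed_after_plain_import(package: str, attribute: str) -> bool:
    """Report whether `package.attribute` exists right after importing `package`.

    The check runs in a child interpreter so that imports made earlier in the
    current process do not influence the result.
    """
    code = (
        "import importlib, sys\n"
        "pkg = importlib.import_module(sys.argv[1])\n"
        "print(hasattr(pkg, sys.argv[2]))\n"
    )
    completed = subprocess.run(
        [sys.executable, "-c", code, package, attribute],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip() == "True"


if __name__ == "__main__":
    # `os` binds `path` itself; `json` does not import `json.tool` on its own.
    print(exposed_after_plain_import("os", "path"))
    print(exposed_after_plain_import("json", "tool"))
```

With scikit-learn installed, the same helper could be pointed at sklearn.utils and multiclass or validation to test the claims in this comment; the result may differ between scikit-learn versions.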

fchollet (Collaborator) left a comment

LGTM, thanks for the update!

google-ml-butler (bot) added the kokoro:force-run and ready to pull labels on Jun 18, 2025
@fchollet fchollet merged commit e99164e into keras-team:master Jun 18, 2025
7 of 10 checks passed
Labels
ready to pull Ready to be merged into the codebase size:S
Development

Successfully merging this pull request may close these issues.

AttributeError in SKLearnClassifier and SKLearnRegressor wrappers due to missing sklearn.utils.multiclass
5 participants