Implement the MLJ model API without needing to depend on external dependencies such as CSV.jl, CategoricalArrays.jl, etc. #19
Comments
This (long) discussion helps explain where DataAPI came from: https://discourse.julialang.org/t/proposal-for-sharedfunctions-jl-package-for-optional-dependency-management/23526
cstjean's comment in that discussion thread basically sums up what I want to do:
I want to define methods for Since there is currently no support in base Julia for this, I think that the next best approach would be to create an MLJapi.jl package with no dependencies, thus allowing me to define methods for Basically I want to be able to implement the MLJ model API without needing to depend on any non-standard library dependencies. Is there another way that this would be possible?
As another example, https://github.com/cstjean/ScikitLearnBase.jl has no dependencies, which allows package developers to implement the ScikitLearn.jl API without needing to depend on any additional dependencies.
@DilumAluthge Thanks for your post and the offer of help. Happy to see how we
I don't think a separate MLJapi helps. MLJBase is supposed to do
- CSV
- CategoricalArrays - Lightweight and any model dealing with
- ColorTypes - FixedPointNumbers is the only dependency, which itself has none
- Distributions - Unfortunately, a large package but dependencies
- StatsBase - We extend the fit, predict and fit! from here. Hard
- Tables - Very lightweight and essential because the "X" in the
We can certainly remove CSV. It is only there to provide a few
- load_boston()
- load_crabs()
- load_iris()
- load_reduced_ames()
- load_ames() (don't think this is actually used)
- datanow()
Note that all of these except the last return the data wrapped as
We could replace the above with datasets from RDatasets, which
@DilumAluthge How keen would you be on making a PR refactoring the test code along these lines?
@ablaom Instead of removing CSV entirely, how about making it an optional dependency, and loading the code in
I've made the following pull requests:
Actually, I think that's a good idea. I have a slight misgiving, as Mike Innes (the author of Requires) once told me "Requires is really a hack" and advised against using it as a permanent solution. But you have already done the work, and the refactoring option could be done later. Obviously the Travis tests on MLJ and MLJModels do not see the changes proposed on MLJBase. Have you tested locally that the three updated repos work together?
Yep, I just checked out the
For what it's worth, I think that Requires isn't considered hacky anymore. Thanks to refactoring by Mike Innes and Tim Holy, and the addition of
Great! So I will:
Edited: It is not necessary to introduce a new [compat] entry in MLJ and MLJModels. They will work fine with the old version of MLJBase after their respective PRs are merged.
As far as I can tell, in order for me to implement the MLJ model API, I need to import MLJBase.jl.
While MLJBase.jl is a more lightweight dependency than MLJ.jl, it still does have quite a few dependencies. I would rather not have to depend on CSV.jl, CategoricalArrays.jl, Tables.jl, etc. in order to be able to implement the MLJ model API.
Take this modified version of the simple deterministic regressor example:
I can define this entire model without using any dependencies. Unfortunately, because I need to import MLJBase.jl, I still end up depending on all of MLJBase.jl's dependencies.
The JuliaData people have solved this problem by creating the DataAPI.jl package. DataAPI.jl is a tiny package that has no dependencies and provides the namespace for the JuliaData API.
Would you be open to creating a similar MLJapi.jl package? The package would be very simple. It would have no dependencies, and its only content would consist of type definitions and function stubs, for example:
MLJ.jl, MLJBase.jl, MLJModels.jl, etc. would import MLJapi.jl and extend its functions.
This would allow other package authors to implement the MLJ model API without needing to depend on all of MLJBase's dependencies.
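As a rough sketch of the idea, an interface-only package and a model built against it might look like the following. The module name `MLJapi` comes from the proposal above; the abstract type names, the simplified `fit`/`predict` signatures, and the `MyModels` and `MeanRegressor` names are illustrative assumptions, not MLJBase's actual definitions.

```julia
# Hypothetical interface-only package: no dependencies,
# only abstract types and function stubs.
module MLJapi

abstract type Model end
abstract type Supervised <: Model end
abstract type Deterministic <: Supervised end

# Stubs with no methods; other packages attach methods to these names.
function fit end
function predict end

end # module MLJapi

# Hypothetical third-party package that depends only on MLJapi.
module MyModels

import ..MLJapi

# Predicts the mean of the training targets for every new row.
struct MeanRegressor <: MLJapi.Deterministic end

# Simplified signatures, assumed for illustration only.
MLJapi.fit(::MeanRegressor, X, y) = sum(y) / length(y)
MLJapi.predict(::MeanRegressor, fitresult, Xnew) = fill(fitresult, size(Xnew, 1))

end # module MyModels

# Usage: fit on three rows, predict for two new rows.
model = MyModels.MeanRegressor()
fitresult = MLJapi.fit(model, zeros(3, 2), [1.0, 2.0, 3.0])
predictions = MLJapi.predict(model, fitresult, zeros(2, 2))
```

Because `MyModels` only adds methods to functions owned by `MLJapi`, it would not need to load CSV.jl, CategoricalArrays.jl, or any of MLJBase's other dependencies.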
Would you be willing to adopt this approach? If so, I'd be more than happy to help create the MLJapi.jl package.