-
Notifications
You must be signed in to change notification settings - Fork 33
Get tests pass again, fix handling of string columns #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #13 +/- ##
==========================================
- Coverage 93.82% 93.11% -0.72%
==========================================
Files 5 5
Lines 324 334 +10
==========================================
+ Hits 304 311 +7
- Misses 20 23 +3
Continue to review full report at Codecov.
|
src/modelframe.jl
Outdated
@@ -102,34 +104,48 @@ end | |||
|
|||
const DEFAULT_CONTRASTS = DummyCoding | |||
|
|||
_levels(x::Union{CategoricalArray, NullableCategoricalArray}) = levels(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: need to change this to unique
, cf. JuliaData/DataFrames.jl#1147 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that was quite easy. What is going to be harder is to handle DataArrays the same way, since unique(::PooledDataArray)
does not return values in the order specified by levels. Since unique
does not guarantee any particular order for levels, I suggest we change DataArrays to follow the ordering of levels (just like CategoricalArray
). That way, we don't need any common API other than unique
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
README.md
Outdated
@@ -9,4 +9,4 @@ in Julia. | |||
|
|||
**Documentation:** [](https://juliastats.github.io/StatsModels.jl/latest) | |||
|
|||
Originally part of [DataFrames.jl](http://github.com/JuliaStats/DataFrames.jl). | |||
Originally part of [DataTables.jl](http://github.com/JuliaStats/DataTables.jl). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the old one is a bit more accurate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically both are true, but indeed that's not the most informative choice...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth expanding on this a little in the readme; something along the lines of "this was once part of dataframes, now it depends on datatables, but eventually will support generic tabular data stores including both dataframes and datatables"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Just to avoid confusion when people come here looking for help/information)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I would wait until things have stabilized a bit more, and then explain how the package is supposed to be used. Anyway we need generic interfaces very soon to support DataFrames.
5a3550d
to
0c2536d
Compare
Julia 0.6 failure is fixed by JuliaLang/julia#20665. |
All non-real columns are now considered as categorical, since conversion to a float column will likely fail. Types which should be converted to float will have to define is_categorical(::T) = false. Before this, Vector{String} and NullableVector{String} columns triggered an error. Replace ContrastsMatrix(c::ContrastsMatrix, x::CategoricalArray) with ContrastsMatrix(c::ContrastsMatrix, levels::AbstractVector), offering equivalent functionality: it's more consistent with the ContrastsMatrix() constructor, and more general since only the levels are actually needed.
The code is currently written to work with Nullable.
0c2536d
to
739d478
Compare
I've squashed some commits and removed the hacks now that DataTables no longer export conflicting function definitions. So I'd say it's good to go. |
Move to DataTables temporarily since the code is written to work with Nullable for now. This allows porting JuliaData/DataFrames.jl@781bbbd to StatsModels (first commit), which is a first step towards supporting any kind of categorical input (i.e. both
DataArray
andCategoricalArray
).If this works, we could remove the conflicting API from DataTables so that we no longer need the
using
hacks. Then we could start relying only on the AbstractTable interface.