Skip to content

Get tests pass again, fix handling of string columns #13

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 20, 2017

Conversation

nalimilan
Copy link
Member

Move to DataTables temporarily since the code is written to work with Nullable for now. This allows porting JuliaData/DataFrames.jl@781bbbd to StatsModels (first commit), which is a first step towards supporting any kind of categorical input (i.e. both DataArray and CategoricalArray).

If this works, we could remove the conflicting API from DataTables so that we no longer need the using hacks. Then we could start relying only on the AbstractTable interface.

@codecov-io
Copy link

codecov-io commented Feb 18, 2017

Codecov Report

Merging #13 into master will decrease coverage by -0.72%.
The diff coverage is 58.62%.

@@            Coverage Diff             @@
##           master      #13      +/-   ##
==========================================
- Coverage   93.82%   93.11%   -0.72%     
==========================================
  Files           5        5              
  Lines         324      334      +10     
==========================================
+ Hits          304      311       +7     
- Misses         20       23       +3
Impacted Files Coverage Δ
src/statsmodel.jl 84.84% <ø> (ø)
src/contrasts.jl 93.87% <100%> (-0.13%)
src/modelmatrix.jl 89.39% <50%> (-1.24%)
src/modelframe.jl 90.9% <66.66%> (-1.74%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd1c9a5...739d478. Read the comment docs.

@@ -102,34 +104,48 @@ end

const DEFAULT_CONTRASTS = DummyCoding

_levels(x::Union{CategoricalArray, NullableCategoricalArray}) = levels(x)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: need to change this to unique, cf. JuliaData/DataFrames.jl#1147 (comment).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually that was quite easy. What is going to be harder is to handle DataArrays the same way, since unique(::PooledDataArray) does not return values in the order specified by levels. Since unique does not guarantee any particular order for levels, I suggest we change DataArrays to follow the ordering of levels (just like CategoricalArray). That way, we don't need any common API other than unique.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

README.md Outdated
@@ -9,4 +9,4 @@ in Julia.

**Documentation:** [![Latest](https://img.shields.io/badge/docs-latest-blue.svg)](https://juliastats.github.io/StatsModels.jl/latest)

Originally part of [DataFrames.jl](http://github.com/JuliaStats/DataFrames.jl).
Originally part of [DataTables.jl](http://github.com/JuliaStats/DataTables.jl).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the old one is a bit more accurate

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically both are true, but indeed that's not the most informative choice...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth expanding on this a little in the readme; something along the lines of "this was once part of dataframes, now it depends on datatables, but eventually will support generic tabular data stores including both dataframes and datatables"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just to avoid confusion when people come here looking for help/information)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I would wait until things have stabilized a bit more, and then explain how the package is supposed to be used. Anyway we need generic interfaces very soon to support DataFrames.

@nalimilan nalimilan force-pushed the nl/modelmat branch 3 times, most recently from 5a3550d to 0c2536d Compare February 18, 2017 18:59
@nalimilan
Copy link
Member Author

Julia 0.6 failure is fixed by JuliaLang/julia#20665.

All non-real columns are now considered as categorical, since conversion
to a float column will likely fail. Types which should be converted to float
will have to define is_categorical(::T) = false. Before this, Vector{String}
and NullableVector{String} columns triggered an error.

Replace ContrastsMatrix(c::ContrastsMatrix, x::CategoricalArray) with
ContrastsMatrix(c::ContrastsMatrix, levels::AbstractVector), offering
equivalent functionality: it's more consistent with the ContrastsMatrix()
constructor, and more general since only the levels are actually needed.
The code is currently written to work with Nullable.
@nalimilan nalimilan changed the title WIP: Get tests pass again, fix handling of string columns Get tests pass again, fix handling of string columns Feb 19, 2017
@nalimilan
Copy link
Member Author

I've squashed some commits and removed the hacks now that DataTables no longer export conflicting function definitions. So I'd say it's good to go.

@kleinschmidt kleinschmidt merged commit 1c8a187 into master Feb 20, 2017
@nalimilan nalimilan deleted the nl/modelmat branch February 20, 2017 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants