Skip to content

CategoricalArray constructor sometimes does not copy all data #107

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bkamins opened this issue Dec 10, 2017 · 6 comments
Closed

CategoricalArray constructor sometimes does not copy all data #107

bkamins opened this issue Dec 10, 2017 · 6 comments

Comments

@bkamins
Copy link
Member

bkamins commented Dec 10, 2017

See
JuliaData/DataFrames.jl#1309 (comment)
for an example

The problem is that lines:
https://github.com/JuliaData/CategoricalArrays.jl/blob/master/src/array.jl#L269
and
https://github.com/JuliaData/CategoricalArrays.jl/blob/master/src/array.jl#L270
do not guarantee that a copy is performed (sometimes convert returns the reference to the original object).

CC @nalimilan

@bkamins
Copy link
Member Author

bkamins commented Dec 10, 2017

Just to add. The problem is that after:

x = categorical(1:10)
y = categorical(x)

we have that x and y are different objects (if they were the same this would be a normal default behavior like convert).

So the decision is to make: should categegorical always create a new object.
If yes - then it should guarantee to copy everything.
If no - then we should have a function that performs copy of pool and refs (but probably not a deepcopy of everything).

@nalimilan
Copy link
Member

Good catch. The problem is not in convert, though, as it shouldn't make a copy unless necessary. It's the constructor which should copy the refs just like we currently do for the pool here. Comments should also be adapted to say that constructors should always make copies (for reference, design decision happened here: JuliaLang/julia#23107).

@bkamins
Copy link
Member Author

bkamins commented Dec 11, 2017

I support making a copy.
Do I understand correctly that you would make an appropriate PR?
(I could try this PR, but as you can see from the above comment I do not know the details of design of CategoricalArrays so I do not want to mess up something).

@nalimilan
Copy link
Member

I'm quite busy right now with other PRs, so I'd appreciate if you could do it. Most of the work should be to write a test I think.

@bkamins
Copy link
Member Author

bkamins commented Dec 11, 2017

I have made a PR.

@bkamins
Copy link
Member Author

bkamins commented Dec 25, 2017

closing as this is fixed.

@bkamins bkamins closed this as completed Dec 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants