Skip to content

Commit 93c0fe4

Browse files
committed
Fix constructors and conversion between categorical arrays
Previously only the conversions between CategoricalArrays and between NullableCategoricalArrays were using specific methods. This means conversion between these two types did not preserve levels, ordering or reference type. Use broader signatures to cover this, and throw an error if nulls are found when converting to CategoricalArray. Make tests more systematic to cover all possible combinations; remove old tests which are now redundant. Also drop the guaranty that constructors always return a copy. This is inconsistent with e.g. Array(), would have made the code more complex, and would trap users who thought the behavior was the same as for convert(). By the way, this an inconsistency in the default value for the 'ordered' argument to constructors, and fix docstrings: the default is now always to preserve the corresponding property of the input.
1 parent b88769b commit 93c0fe4

File tree

4 files changed

+140
-227
lines changed

4 files changed

+140
-227
lines changed

src/array.jl

Lines changed: 43 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -54,12 +54,8 @@ for (A, V, M) in ((:CategoricalArray, :CategoricalVector, :CategoricalMatrix),
5454
$As(A::NullableCategoricalArray; ordered::Bool=false)
5555
5656
If `A` is already a `CategoricalArray` or a `NullableCategoricalArray`, its levels
57-
and their order are preserved. The reference type is also preserved unless `compress`
58-
is provided. On the contrary, the `ordered` keyword argument takes precedence over
59-
the corresponding property of the input array, even when not provided.
60-
61-
In all cases, a copy of `A` is made: use `convert` to avoid making copies when
62-
unnecessary.
57+
are preserved; the same applies to the ordered property and the reference type unless
58+
explicitly overriden.
6359
""" ->
6460
function $A end
6561

@@ -88,12 +84,8 @@ for (A, V, M) in ((:CategoricalArray, :CategoricalVector, :CategoricalMatrix),
8884
$Vs(A::NullableCategoricalVector; ordered::Bool=false)
8985
9086
If `A` is already a `CategoricalVector` or a `NullableCategoricalVector`, its levels
91-
and their order are preserved. The reference type is also preserved unless `compress`
92-
is provided. On the contrary, the `ordered` keyword argument takes precedence over
93-
the corresponding property of the input array, even when not provided.
94-
95-
In all cases, a copy of `A` is made: use `convert` to avoid making copies when
96-
unnecessary.
87+
are preserved; the same applies to the ordered property and the reference type unless
88+
explicitly overriden.
9789
""" ->
9890
function $V end
9991

@@ -118,16 +110,12 @@ for (A, V, M) in ((:CategoricalArray, :CategoricalVector, :CategoricalMatrix),
118110
argument determines whether the array values can be compared according to the
119111
ordering of levels or not (see [`isordered`](@ref)).
120112
121-
$Ms(A::CategoricalMatrix; ordered::Bool=false)
122-
$Ms(A::NullableCategoricalMatrix; ordered::Bool=false)
113+
$Ms(A::CategoricalMatrix; ordered::Bool=isordered(A))
114+
$Ms(A::NullableCategoricalMatrix; ordered::Bool=isordered(A))
123115
124116
If `A` is already a `CategoricalMatrix` or a `NullableCategoricalMatrix`, its levels
125-
and their order are preserved. The reference type is also preserved unless `compress`
126-
is provided. On the contrary, the `ordered` keyword argument takes precedence over
127-
the corresponding property of the input array, even when not provided.
128-
129-
In all cases, a copy of `A` is made: use `convert` to avoid making copies when
130-
unnecessary.
117+
are preserved; the same applies to the ordered property and the reference type unless
118+
explicitly overriden.
131119
""" ->
132120
function $M end
133121

@@ -187,11 +175,16 @@ for (A, V, M) in ((:CategoricalArray, :CategoricalVector, :CategoricalMatrix),
187175

188176
## Constructors from arrays
189177

190-
# This method is needed to ensure ordered!() only mutates a copy of A
191-
@compat (::Type{$A{T, N, R}}){T, N, R}(A::$A{T, N, R}; ordered=_isordered(A)) =
192-
ordered!(copy(A), ordered)
178+
# This method is needed to ensure that a copy of the pool is always made
179+
# so that ordered!() does not affect the original array
180+
@compat function (::Type{$A{T, N, R}}){S, T, N, Q, R}(A::CatArray{S, N, Q}; ordered=_isordered(A))
181+
res = convert($A{T, N, R}, A)
182+
if res.pool === A.pool # convert() only makes a copy when necessary
183+
res = $A{T, N, R}(res.refs, deepcopy(res.pool))
184+
end
185+
ordered!(res, ordered)
186+
end
193187

194-
# Note this method is also used for CategoricalArrays when T, N or R don't match
195188
@compat (::Type{$A{T, N, R}}){T, N, R}(A::AbstractArray; ordered=_isordered(A)) =
196189
ordered!(convert($A{T, N, R}, A), ordered)
197190

@@ -218,21 +211,21 @@ for (A, V, M) in ((:CategoricalArray, :CategoricalVector, :CategoricalMatrix),
218211
$A{T, 2}(A, ordered=ordered)
219212

220213
# From CategoricalArray (preserve R)
221-
@compat (::Type{$A{T, N}}){S, T, N, R}(A::$A{S, N, R}; ordered=_isordered(A)) =
214+
@compat (::Type{$A{T, N}}){S, T, N, R}(A::CatArray{S, N, R}; ordered=_isordered(A)) =
222215
$A{T, N, R}(A, ordered=ordered)
223-
@compat (::Type{$A{T}}){S, T, N, R}(A::$A{S, N, R}; ordered=_isordered(A)) =
216+
@compat (::Type{$A{T}}){S, T, N, R}(A::CatArray{S, N, R}; ordered=_isordered(A)) =
224217
$A{T, N, R}(A, ordered=ordered)
225-
@compat (::Type{$A}){T, N, R}(A::$A{T, N, R}; ordered=_isordered(A)) =
218+
@compat (::Type{$A}){T, N, R}(A::CatArray{T, N, R}; ordered=_isordered(A)) =
226219
$A{T, N, R}(A, ordered=ordered)
227220

228-
@compat (::Type{$V{T}}){S, T, R}(A::$V{S, R}; ordered=_isordered(A)) =
221+
@compat (::Type{$V{T}}){S, T, R}(A::CatArray{S, 1, R}; ordered=_isordered(A)) =
229222
$A{T, 1, R}(A, ordered=ordered)
230-
@compat (::Type{$V}){T, R}(A::$V{T, R}; ordered=_isordered(A)) =
223+
@compat (::Type{$V}){T, R}(A::CatArray{T, 1, R}; ordered=_isordered(A)) =
231224
$A{T, 1, R}(A, ordered=ordered)
232225

233-
@compat (::Type{$M{T}}){S, T, R}(A::$M{S, R}; ordered=_isordered(A)) =
226+
@compat (::Type{$M{T}}){S, T, R}(A::CatArray{S, 2, R}; ordered=_isordered(A)) =
234227
$A{T, 2, R}(A, ordered=ordered)
235-
@compat (::Type{$M}){T, R}(A::$M{T, R}; ordered=_isordered(A)) =
228+
@compat (::Type{$M}){T, R}(A::CatArray{T, 2, R}; ordered=_isordered(A)) =
236229
$A{T, 2, R}(A, ordered=ordered)
237230

238231

@@ -270,21 +263,25 @@ for (A, V, M) in ((:CategoricalArray, :CategoricalVector, :CategoricalMatrix),
270263
res
271264
end
272265

273-
# From CategoricalArray (preserve R)
274-
function convert{S, T, N, R}(::Type{$A{T, N, R}}, A::$A{S, N})
266+
# From CategoricalArray (preserve levels, ordering and R)
267+
function convert{S, T, N, R}(::Type{$A{T, N, R}}, A::CatArray{S, N})
275268
if length(A.pool) > typemax(R)
276269
throw(LevelsException{T, R}(levels(A)[typemax(R)+1:end]))
277270
end
278271

272+
if $A <: CategoricalArray && isa(A, NullableCategoricalArray)
273+
any(x -> x == 0, A.refs) && throw(NullException())
274+
end
275+
279276
refs = convert(Array{R, N}, A.refs)
280277
pool = convert(CategoricalPool{T, R}, A.pool)
281278
ordered!($A(refs, pool), isordered(A))
282279
end
283-
convert{S, T, N, R}(::Type{$A{T, N}}, A::$A{S, N, R}) =
280+
convert{S, T, N, R}(::Type{$A{T, N}}, A::CatArray{S, N, R}) =
284281
convert($A{T, N, R}, A)
285-
convert{S, T, N, R}(::Type{$A{T}}, A::$A{S, N, R}) =
282+
convert{S, T, N, R}(::Type{$A{T}}, A::CatArray{S, N, R}) =
286283
convert($A{T, N, R}, A)
287-
convert{T, N, R}(::Type{$A}, A::$A{T, N, R}) =
284+
convert{T, N, R}(::Type{$A}, A::CatArray{T, N, R}) =
288285
convert($A{T, N, R}, A)
289286

290287
# R<:Integer is needed for this method to be considered more specific
@@ -658,37 +655,33 @@ this parameter will also introduce a type instability which can affect performan
658655
the function where the call is made. Therefore, use this option with caution (the
659656
one-argument version does not suffer from this problem).
660657
661-
categorical{T}(A::CategoricalArray{T}[, compress::Bool]; ordered::Bool=false)
662-
categorical{T}(A::NullableCategoricalArray{T}[, compress::Bool]; ordered::Bool=false)
658+
categorical{T}(A::CategoricalArray{T}[, compress::Bool]; ordered::Bool=isordered(A))
659+
categorical{T}(A::NullableCategoricalArray{T}[, compress::Bool]; ordered::Bool=isordered(A))
663660
664661
If `A` is already a `CategoricalArray` or a `NullableCategoricalArray`, its levels
665-
are preserved. The reference type is also preserved unless `compress` is provided.
666-
On the contrary, the `ordered` keyword argument takes precedence over the
667-
corresponding property of the input array, even when not provided.
668-
669-
In all cases, a copy of `A` is made: use `convert` to avoid making copies when
670-
unnecessary.
662+
are preserved; the same applies to the ordered property, and to the reference type
663+
unless `compress` is passed.
671664
"""
672665
function categorical end
673666

674-
categorical(A::AbstractArray; ordered=false) = CategoricalArray(A, ordered=ordered)
675-
categorical{T<:Nullable}(A::AbstractArray{T}; ordered=false) =
667+
categorical(A::AbstractArray; ordered=_isordered(A)) = CategoricalArray(A, ordered=ordered)
668+
categorical{T<:Nullable}(A::AbstractArray{T}; ordered=_isordered(A)) =
676669
NullableCategoricalArray(A, ordered=ordered)
677670

678671
# Type-unstable methods
679-
function categorical{T, N}(A::AbstractArray{T, N}, compress; ordered=false)
672+
function categorical{T, N}(A::AbstractArray{T, N}, compress; ordered=_isordered(A))
680673
RefType = compress ? reftype(length(unique(A))) : DefaultRefType
681674
CategoricalArray{T, N, RefType}(A, ordered=ordered)
682675
end
683-
function categorical{T<:Nullable, N}(A::AbstractArray{T, N}, compress; ordered=false)
676+
function categorical{T<:Nullable, N}(A::AbstractArray{T, N}, compress; ordered=_isordered(A))
684677
RefType = compress ? reftype(length(unique(A))) : DefaultRefType
685678
NullableCategoricalArray{T, N, RefType}(A, ordered=ordered)
686679
end
687-
function categorical{T, N, R}(A::CategoricalArray{T, N, R}, compress; ordered=false)
680+
function categorical{T, N, R}(A::CategoricalArray{T, N, R}, compress; ordered=_isordered(A))
688681
RefType = compress ? reftype(length(levels(A))) : R
689682
CategoricalArray{T, N, RefType}(A, ordered=ordered)
690683
end
691-
function categorical{T, N, R}(A::NullableCategoricalArray{T, N, R}, compress; ordered=false)
684+
function categorical{T, N, R}(A::NullableCategoricalArray{T, N, R}, compress; ordered=_isordered(A))
692685
RefType = compress ? reftype(length(levels(A))) : R
693686
NullableCategoricalArray{T, N, RefType}(A, ordered=ordered)
694687
end

test/11_array.jl

Lines changed: 0 additions & 66 deletions
Original file line numberDiff line numberDiff line change
@@ -30,28 +30,6 @@ for ordered in (false, true)
3030
@test convert(CategoricalVector{String, DefaultRefType}, x) == x
3131
@test convert(CategoricalVector{String, UInt8}, x) == x
3232

33-
for y in (CategoricalArray(x, ordered=ordered),
34-
CategoricalArray{String}(x, ordered=ordered),
35-
CategoricalArray{String, 1}(x, ordered=ordered),
36-
CategoricalArray{String, 1, R}(x, ordered=ordered),
37-
CategoricalArray{String, 1, DefaultRefType}(x, ordered=ordered),
38-
CategoricalArray{String, 1, UInt8}(x, ordered=ordered),
39-
CategoricalVector(x, ordered=ordered),
40-
CategoricalVector{String}(x, ordered=ordered),
41-
CategoricalVector{String, R}(x, ordered=ordered),
42-
CategoricalVector{String, DefaultRefType}(x, ordered=ordered),
43-
CategoricalVector{String, UInt8}(x, ordered=ordered),
44-
categorical(x, ordered=ordered),
45-
categorical(x, false, ordered=ordered),
46-
categorical(x, true, ordered=ordered))
47-
@test isa(y, CategoricalVector{String})
48-
@test isordered(y) === ordered
49-
@test y == x
50-
@test y !== x
51-
@test y.refs !== x.refs
52-
@test y.pool !== x.pool
53-
end
54-
5533
@test convert(CategoricalArray, a) == x
5634
@test convert(CategoricalArray{String}, a) == x
5735
@test convert(CategoricalArray{String, 1}, a) == x
@@ -254,28 +232,6 @@ for ordered in (false, true)
254232
@test convert(CategoricalVector{Float64, DefaultRefType}, x) == x
255233
@test convert(CategoricalVector{Float64, UInt8}, x) == x
256234

257-
for y in (CategoricalArray(x, ordered=ordered),
258-
CategoricalArray{Float64}(x, ordered=ordered),
259-
CategoricalArray{Float64, 1}(x, ordered=ordered),
260-
CategoricalArray{Float64, 1, R}(x, ordered=ordered),
261-
CategoricalArray{Float64, 1, DefaultRefType}(x, ordered=ordered),
262-
CategoricalArray{Float64, 1, UInt8}(x, ordered=ordered),
263-
CategoricalVector(x, ordered=ordered),
264-
CategoricalVector{Float64}(x, ordered=ordered),
265-
CategoricalVector{Float64, R}(x, ordered=ordered),
266-
CategoricalVector{Float64, DefaultRefType}(x, ordered=ordered),
267-
CategoricalVector{Float64, UInt8}(x, ordered=ordered),
268-
categorical(x, ordered=ordered),
269-
categorical(x, false, ordered=ordered),
270-
categorical(x, true, ordered=ordered))
271-
@test isa(y, CategoricalVector{Float64})
272-
@test isordered(y) === ordered
273-
@test y == x
274-
@test y !== x
275-
@test y.refs !== x.refs
276-
@test y.pool !== x.pool
277-
end
278-
279235
@test convert(CategoricalArray, a) == x
280236
@test convert(CategoricalArray{Float64}, a) == x
281237
@test convert(CategoricalArray{Float32}, a) == x
@@ -438,28 +394,6 @@ for ordered in (false, true)
438394
@test convert(CategoricalMatrix{String, DefaultRefType}, x) == x
439395
@test convert(CategoricalMatrix{String, UInt8}, x) == x
440396

441-
for y in (CategoricalArray(x, ordered=ordered),
442-
CategoricalArray{String}(x, ordered=ordered),
443-
CategoricalArray{String, 2}(x, ordered=ordered),
444-
CategoricalArray{String, 2, R}(x, ordered=ordered),
445-
CategoricalArray{String, 2, DefaultRefType}(x, ordered=ordered),
446-
CategoricalArray{String, 2, UInt8}(x, ordered=ordered),
447-
CategoricalMatrix(x, ordered=ordered),
448-
CategoricalMatrix{String}(x, ordered=ordered),
449-
CategoricalMatrix{String, R}(x, ordered=ordered),
450-
CategoricalMatrix{String, DefaultRefType}(x, ordered=ordered),
451-
CategoricalMatrix{String, UInt8}(x, ordered=ordered),
452-
categorical(x, ordered=ordered),
453-
categorical(x, false, ordered=ordered),
454-
categorical(x, true, ordered=ordered))
455-
@test isa(y, CategoricalMatrix{String})
456-
@test isordered(y) === ordered
457-
@test y == x
458-
@test y !== x
459-
@test y.refs !== x.refs
460-
@test y.pool !== x.pool
461-
end
462-
463397
@test convert(CategoricalArray, a) == x
464398
@test convert(CategoricalArray{String}, a) == x
465399
@test convert(CategoricalArray{String, 2, R}, a) == x

0 commit comments

Comments
 (0)