Implement reductions with optional skipna argument #101
It seems like some of the nightlies have not been updated in 11 days, so this is going to have to wait.
Now with reductions across dimensions! (although not yet for
What should reductions do if there are no non-NA values and
Should the same rules apply when reducing a non-empty vector that contains NAs? R does the same things with However, generalization to reductions across dimensions suggests that maybe |
Okay, this should be ready to merge. With few NAs, performance is quite good, typically within 50% of Base for both
Implement reductions with optional skipna argument
This implements reductions with an optional `skipna` argument, fixing #3, fixing JuliaData/DataFrames.jl#259, and fixing JuliaData/DataFrames.jl#354, and mostly superseding #32 (except for `skewness` and `kurtosis`). The following reductions are implemented, all in terms of `mapreduce`:

- `sum`
- `prod`
- `maximum`
- `minimum`
- `Base.sumabs`
- `Base.sumabs2`
- `var`
- `varm`
- `std`
- `stdm`
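To make the "all in terms of `mapreduce`" point concrete, here is a minimal sketch of how several of these reductions could share one masked `mapreduce`. It is not the code from this PR: `MaskedVector`, `masked_mapreduce`, and the `masked_*` wrappers are hypothetical names, the struct only imitates a DataArray's `data` array and NA mask, and `missing` stands in for NA so the sketch runs on current Julia.

```julia
# Hypothetical stand-in for a DataArray: raw values plus a mask of NA positions.
struct MaskedVector{T}
    data::Vector{T}
    na::BitVector   # na[i] == true means element i is NA
end

# One generic reduction that the named reductions below are built on.
function masked_mapreduce(f, op, v::MaskedVector; skipna::Bool=false)
    # No NAs at all: defer to Base on the raw data array.
    any(v.na) || return mapreduce(f, op, v.data)
    # At least one NA and we are not skipping: the result is NA
    # (`missing` is used here as a stand-in for NA).
    skipna || return missing
    # Skip NA positions. If every element is NA, the result follows
    # Base's rules for reducing an empty collection.
    return mapreduce(f, op, v.data[.!v.na])
end

masked_sum(v; skipna=false)     = masked_mapreduce(identity, +, v; skipna=skipna)
masked_maximum(v; skipna=false) = masked_mapreduce(identity, max, v; skipna=skipna)
masked_sumabs2(v; skipna=false) = masked_mapreduce(abs2, +, v; skipna=skipna)

v = MaskedVector([1.0, 2.0, 3.0, 4.0], BitVector([false, true, false, false]))
masked_sum(v)                  # NA stand-in, because one element is NA
masked_sum(v; skipna=true)     # sums 1.0, 3.0 and 4.0
```

Copying the non-NA values with `v.data[.!v.na]` keeps the sketch short; it allocates, so it only illustrates the semantics and says nothing about the performance of the actual implementation.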
With `skipna=false`, for some reductions that are guaranteed to return NA when any input is NA, we first check for NAs and then call `_mapreduce` from Base on the `data` Array if there are none. This has basically no overhead. For other reductions, we simply call the implementations in Base on the DataArray, which is slow due to type instability in indexing, but is the most obvious way to guarantee correctness.

With `skipna=true`, we first check if there are NA values. If not, we call `_mapreduce` from Base on the `data` Array. If there are, we use either an algorithm that branches on NA or an algorithm that does not branch on NA, depending on the types and functors involved. For summation, we use a pairwise algorithm that divides blocks along BitArray chunk boundaries based on the number of non-NA elements. For blocks that contain no NAs, we call the implementation in Base. This has little overhead when there are few NA elements.

The tests currently fail on Travis because I based the implementation of `var` off of JuliaLang/julia#7502, so it has slightly different semantics than the current Julia master. The tests here are pretty comprehensive, though, and as soon as that is merged this should be good to go.
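The pairwise summation described for `skipna=true` can be sketched roughly as follows. This is an illustrative simplification, not the PR's implementation: `sum_skipna`, `CHUNK`, and the `blocksize` keyword are hypothetical names, and the sketch splits on index counts rather than on the number of non-NA elements. It keeps the two ideas from the description: splits are nudged onto 64-bit BitArray chunk boundaries, and NA-free blocks are handed to Base's `sum`.

```julia
const CHUNK = 64  # a BitArray stores its bits in 64-bit chunks

# Pairwise sum of data[lo:hi], ignoring positions where na[i] is true.
# Restricted to floating-point element types to keep the sketch simple.
function sum_skipna(data::Vector{T}, na::BitVector,
                    lo::Int=1, hi::Int=length(data);
                    blocksize::Int=1024) where {T<:AbstractFloat}
    n = hi - lo + 1
    if n <= max(blocksize, 1)
        r = lo:hi
        # NA-free block: use the implementation in Base directly.
        any(view(na, r)) || return sum(view(data, r))
        # Block contains NAs: accumulate while branching on the mask.
        s = zero(T)
        @inbounds for i in r
            na[i] || (s += data[i])
        end
        return s
    end
    mid = lo + (hi - lo) ÷ 2
    # Move the split down to a chunk boundary when that still leaves
    # a non-empty left half; otherwise keep the plain midpoint.
    aligned = (mid ÷ CHUNK) * CHUNK
    split = aligned >= lo ? aligned : mid
    return sum_skipna(data, na, lo, split; blocksize=blocksize) +
           sum_skipna(data, na, split + 1, hi; blocksize=blocksize)
end

data = collect(1.0:10_000.0)
na = falses(length(data))
na[3] = true
na[5000] = true
sum_skipna(data, na)   # equals sum(data) - 3.0 - 5000.0
```

With few NAs, most blocks take the NA-free path, which is consistent with the claim above that the overhead is small in that case.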