-
Notifications
You must be signed in to change notification settings - Fork 44
Improve quantile performance v2 #91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #91 +/- ##
==========================================
+ Coverage 96.89% 96.91% +0.02%
==========================================
Files 1 1
Lines 419 422 +3
==========================================
+ Hits 406 409 +3
Misses 13 13
Continue to review full report at Codecov.
|
The only drawback of this approach is the case when very many quantiles are requested as we sort |
Thanks. Is there any reason to think that a series of partial sorts of nested subsets of the data would be significantly slower than a single full sort? Have you tried benchmarking this? |
We have to sort |
Here are the benchmarks:
so as you can see it starts to deteriorate much faster. (as usual - it would not hurt if you double checked this if you had time as I might have made some error here) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize I had two comments pending...
sort!(v, 1, lv, Base.Sort.PartialQuickSort(lo:hi), Base.Sort.Forward) | ||
start = 1 | ||
for pv in sort(p) | ||
lv = length(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this out of the loop? BTW, better use lastindex
even if we call require_one_based_indexing(v)
.
lo = floor(Int,pv*(lv)) | ||
hi = ceil(Int,1+pv*(lv)) | ||
sort!(v, start, lv, Base.Sort.PartialQuickSort(lo:hi), Base.Sort.Forward) | ||
start = hi + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you completely sure of the +1
? Is that still correct if p
contains duplicates? That would be worth testing.
I'm not sure it's worth worrying about performance when the number of quantiles is large compared to the data. Quantiles don't make a lot of sense in that case. Maybe a simple optimization is to do |
Actually with the (semi-)recent improvements to sorting performance, it doesn't seem that partial sort is a good idea, as it uses quick sort, while e.g. radix sort is much faster for standard numeric types. Even for Maybe we should just switch to a full sort? I'm a bit surprised that the implementation from this PR is so slow in my benchmark. I would be good to double-check the result. (I think yours had a bug because it reused julia> function f1(n)
@btime Statistics._quantilesort!($(rand(10^6)), false, extrema($(rand(n)))...)
@btime new_quantilesort!($(rand(10^6)), false, $(rand(n)))
@btime sort!($(rand(10^6)))
nothing
end
f1 (generic function with 1 method)
julia> f1(5)
8.901 ms (0 allocations: 0 bytes)
6.912 ms (2 allocations: 96 bytes)
2.568 ms (0 allocations: 0 bytes)
julia> f1(50)
18.857 ms (0 allocations: 0 bytes)
49.713 ms (2 allocations: 480 bytes)
1.840 ms (0 allocations: 0 bytes)
julia> f1(500)
19.029 ms (0 allocations: 0 bytes)
520.530 ms (8 allocations: 9.09 KiB)
2.607 ms (0 allocations: 0 bytes)
julia> function f2(n)
@btime Statistics._quantilesort!($(big.(rand(10^6))), false, extrema($(rand(n)))...)
@btime new_quantilesort!($(big.(rand(10^6))), false, $(rand(n)))
@btime sort!($(big.(rand(10^6))))
nothing
end
f2 (generic function with 1 method)
julia> f2(5)
538.816 ms (0 allocations: 0 bytes)
351.436 ms (2 allocations: 96 bytes)
77.425 ms (0 allocations: 0 bytes)
julia> f2(50)
861.641 ms (0 allocations: 0 bytes)
8.368 s (2 allocations: 480 bytes)
68.657 ms (0 allocations: 0 bytes)
julia> f2(500)
776.346 ms (0 allocations: 0 bytes)
51.798 s (8 allocations: 9.09 KiB)
84.172 ms (0 allocations: 0 bytes)
|
This is an alternative implementation to #86.
Here following #86 (comment) I perform partial sorting incrementally.
I make this a separate PR to allow an easy comparison of both. Either one or the other should be merged.