-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Approximate Median-of-Medians in Select #92348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(rust-highfive has picked a reviewer for you, use r? to override) |
This comment has been minimized.
This comment has been minimized.
Previously, there was a cheap approximation to median of medians, but it lead to O(n^2) worst case behaviour. Instead, use an approximate median of medians algorithm. This is linear time, but is much cheaper than normal median of medians since it selects random chunks to sort and only finds the medians of those. It only uses constant space, and in theory could be configured better to handle small inputs to allocate less.
6d5b898
to
a8253d2
Compare
I don't think we should use something random and non-deterministic here; that would make behavior and runtime performance itself non-deterministic, and that's something we've had a lot of trouble with, especially when trying to benchmark. |
The algorithm provided is fully deterministic, since it uses an LCG seeded with the size of the array. In some sense, the old version was also "random" in that it selected arbitrary indices to select, but it was just more clear which indices it would select since they were at fixed distances based on the length of the array. |
@JulianKnodt Ah, I see. Isn't that very nearly as easy to craft worst-cases for, then, if you know the length in advance? |
If you're willing to put the effort in, I think it would still be possible, but it would require more work. I assume in practice it would hit the worst case less often, but as long as it's deterministic an adversarial case could be made |
@JulianKnodt @rustbot label: +S-inactive |
@JohnCSimon I'm not sure what more there is to do for this PR, as I wasn't able to understand exactly what would make this feature more ready for being merged. |
Previously, there was a cheap constant time approximation to median of medians which sampled a very small amount from fixed indices in the array, but it had
O(n^2)
worst case behaviour which was simple to craft worst cases for.Instead, use a more general approximation median of medians algorithm. This is also constant time,
but is much cheaper than normal median of medians since it selects random chunks to sort and
only finds the medians of those. It only uses constant space, and in theory could be configured
better to handle small inputs to allocate less.
While it uses a weird LCG like thing, if there was a
rand
implementation incore
, it could be used instead. While this is currently deterministic and it could be argued that that would lead to being able to craft adversarial cases, this is also true for the prior implementation.Addresses #91644
I didn't include full benchmarks yet, because I can't get the third one to complete on my machine for the previous implementation, possibly because I'm running compute intensive jobs at the same time.It probably makes sense to add more benchmarks for the average case, but here's the worst case provided in the original issue:
New:
Old: