Approximate Median-of-Medians in Select #92348

JulianKnodt · 2021-12-28T10:31:10Z

Previously, there was a cheap constant time approximation to median of medians which sampled a very small amount from fixed indices in the array, but it had O(n^2) worst case behaviour which was simple to craft worst cases for.
Instead, use a more general approximation median of medians algorithm. This is also constant time,
but is much cheaper than normal median of medians since it selects random chunks to sort and
only finds the medians of those. It only uses constant space, and in theory could be configured
better to handle small inputs to allocate less.

While it uses a weird LCG like thing, if there was a rand implementation in core, it could be used instead. While this is currently deterministic and it could be argued that that would lead to being able to craft adversarial cases, this is also true for the prior implementation.

Addresses #91644

I didn't include full benchmarks yet, because I can't get the third one to complete on my machine for the previous implementation, possibly because I'm running compute intensive jobs at the same time.

It probably makes sense to add more benchmarks for the average case, but here's the worst case provided in the original issue:

New:

test slice::quickselect_l1                                      ... bench:       1,754 ns/iter (+/- 2)
test slice::quickselect_l2                                      ... bench:       5,829 ns/iter (+/- 39)
test slice::quickselect_l3                                      ... bench:     530,171 ns/iter (+/- 10,829)

Old:

test slice::quickselect_l1                                      ... bench:         950 ns/iter (+/- 15)
test slice::quickselect_l2                                      ... bench:       7,375 ns/iter (+/- 44)
test slice::quickselect_l3                                      ... bench:     728,402 ns/iter (+/- 6,931)

rust-highfive · 2021-12-28T10:31:12Z

r? @joshtriplett

(rust-highfive has picked a reviewer for you, use r? to override)

Previously, there was a cheap approximation to median of medians, but it lead to O(n^2) worst case behaviour. Instead, use an approximate median of medians algorithm. This is linear time, but is much cheaper than normal median of medians since it selects random chunks to sort and only finds the medians of those. It only uses constant space, and in theory could be configured better to handle small inputs to allocate less.

joshtriplett · 2022-01-15T20:24:11Z

I don't think we should use something random and non-deterministic here; that would make behavior and runtime performance itself non-deterministic, and that's something we've had a lot of trouble with, especially when trying to benchmark.

JulianKnodt · 2022-01-15T20:38:53Z

The algorithm provided is fully deterministic, since it uses an LCG seeded with the size of the array. In some sense, the old version was also "random" in that it selected arbitrary indices to select, but it was just more clear which indices it would select since they were at fixed distances based on the length of the array.

joshtriplett · 2022-01-15T20:41:45Z

@JulianKnodt Ah, I see. Isn't that very nearly as easy to craft worst-cases for, then, if you know the length in advance?

JulianKnodt · 2022-01-15T20:44:36Z

If you're willing to put the effort in, I think it would still be possible, but it would require more work. I assume in practice it would hit the worst case less often, but as long as it's deterministic an adversarial case could be made

JohnCSimon · 2022-04-11T02:39:43Z

@JulianKnodt
Ping from triage: I'm closing this due to inactivity, Please reopen when you are ready to continue with this. Thanks for your contribution.

@rustbot label: +S-inactive

JulianKnodt · 2022-04-11T22:11:22Z

@JohnCSimon I'm not sure what more there is to do for this PR, as I wasn't able to understand exactly what would make this feature more ready for being merged.

rust-highfive assigned joshtriplett Dec 28, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 28, 2021

This comment has been minimized.

Sign in to view

JulianKnodt force-pushed the select_nth_unstable branch from 6d5b898 to a8253d2 Compare December 28, 2021 10:44

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 27, 2022

JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 20, 2022

JohnCSimon closed this Apr 11, 2022

rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Apr 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Approximate Median-of-Medians in Select #92348

Approximate Median-of-Medians in Select #92348

JulianKnodt commented Dec 28, 2021 •

edited

Loading

rust-highfive commented Dec 28, 2021

This comment has been minimized.

joshtriplett commented Jan 15, 2022

JulianKnodt commented Jan 15, 2022

joshtriplett commented Jan 15, 2022

JulianKnodt commented Jan 15, 2022 •

edited

Loading

JohnCSimon commented Apr 11, 2022

JulianKnodt commented Apr 11, 2022

Approximate Median-of-Medians in Select #92348

Approximate Median-of-Medians in Select #92348

Conversation

JulianKnodt commented Dec 28, 2021 • edited Loading

rust-highfive commented Dec 28, 2021

This comment has been minimized.

joshtriplett commented Jan 15, 2022

JulianKnodt commented Jan 15, 2022

joshtriplett commented Jan 15, 2022

JulianKnodt commented Jan 15, 2022 • edited Loading

JohnCSimon commented Apr 11, 2022

JulianKnodt commented Apr 11, 2022

JulianKnodt commented Dec 28, 2021 •

edited

Loading

JulianKnodt commented Jan 15, 2022 •

edited

Loading