Optimize: core slice binary_search_by
#141097
base: master
512dda9 to d2c7ac8
There have been many changes to, and attempts at optimizing, binary_search.
Some versions of this look quite like yours, so you'll have to justify this change with a more detailed analysis and benchmark results. std already has some benchmarks that you can run.
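For reference when comparing against earlier attempts, here is a minimal sketch of the size-halving style of search, in which the loop's trip count depends only on the slice length (each iteration subtracts `half` from `size`), so the loop branch itself is easy to predict. The function name `binary_search_by_len_only` is hypothetical, and this illustrates the technique under that assumption; it is not a copy of std's source.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: the number of loop iterations depends only on
/// `slice.len()`, never on the comparison results.
fn binary_search_by_len_only<T, F>(slice: &[T], mut f: F) -> Result<usize, usize>
where
    F: FnMut(&T) -> Ordering,
{
    let mut size = slice.len();
    if size == 0 {
        return Err(0);
    }
    // Invariant: the answer lies in base..base + size.
    let mut base = 0usize;
    while size > 1 {
        let half = size / 2;
        let mid = base + half;
        // A select rather than a branch on the data; the compiler may lower
        // this to a conditional move.
        base = if f(&slice[mid]) == Ordering::Greater { base } else { mid };
        // Shrinking by subtraction keeps the trip count a function of len only.
        size -= half;
    }
    match f(&slice[base]) {
        Ordering::Equal => Ok(base),
        Ordering::Less => Err(base + 1),
        Ordering::Greater => Err(base),
    }
}
```

Only the final comparison decides between `Ok` and `Err`, so a found element costs the same number of probes as a missing one.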
That's possible, but only worthwhile in a few cases where good results can be achieved with the baseline SIMD instructions available on tier1 platforms, e.g. SSE2 on x86-64-linux, since runtime detection isn't available in |
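As an illustration of the baseline-SIMD idea, here is a sketch that counts how many of four `i32` values are less than a needle using only SSE2 intrinsics, which need no runtime feature detection when SSE2 is enabled at compile time. The helper name `count_less_than_4` and its use (for example, to resolve a small final window of a search) are hypothetical. The sketch uses `std::arch`; the same intrinsics are also available through `core::arch`.

```rust
/// Hypothetical helper: counts lanes of `chunk` that are less than `needle`.
/// Compiled only when SSE2 is enabled at compile time on x86_64.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
fn count_less_than_4(chunk: &[i32; 4], needle: i32) -> usize {
    use std::arch::x86_64::{
        __m128i, _mm_cmplt_epi32, _mm_loadu_si128, _mm_movemask_epi8, _mm_set1_epi32,
    };
    // SAFETY: this function is only compiled with SSE2 enabled, and
    // `_mm_loadu_si128` allows an unaligned 16-byte load, which `chunk` provides.
    unsafe {
        let v = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
        let n = _mm_set1_epi32(needle);
        // Signed compare: each lane becomes all ones where v < needle.
        let lt = _mm_cmplt_epi32(v, n);
        // One mask bit per byte, so each matching 32-bit lane sets 4 bits.
        let mask = _mm_movemask_epi8(lt) as u32;
        (mask.count_ones() / 4) as usize
    }
}

/// Scalar fallback for targets without compile-time SSE2.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse2")))]
fn count_less_than_4(chunk: &[i32; 4], needle: i32) -> usize {
    chunk.iter().filter(|&&x| x < needle).count()
}
```

The `cfg` split keeps the code building on every target; whether such a kernel actually beats a scalar loop inside `binary_search_by` would still need benchmarks.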
Which benchmarks? Do they happen to look up the same value in the same slice repeatedly? Or otherwise have predictable patterns in the outcome of the |
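One way to address this question is to drive lookups with pseudo-random keys so that the outcome of each comparison is hard to predict across iterations. The sketch below assumes a hypothetical xorshift64 generator (`XorShift64`) and hypothetical helpers `random_lookups` and `time_it`; it is not std's benchmark harness. It searches a haystack of even numbers with keys drawn from twice the value range, so roughly half the lookups miss.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical xorshift64 generator (13/7/17 shifts). The seed must be nonzero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Searches for `lookups` pseudo-random keys in `0..2 * haystack.len()` and
/// returns how many were found.
fn random_lookups(haystack: &[u32], lookups: usize, seed: u64) -> usize {
    let mut rng = XorShift64(seed);
    let modulus = (haystack.len() as u64 * 2).max(1);
    let mut hits = 0;
    for _ in 0..lookups {
        let key = (rng.next_u64() % modulus) as u32;
        // black_box keeps the compiler from hoisting or folding the search.
        if black_box(haystack).binary_search(&black_box(key)).is_ok() {
            hits += 1;
        }
    }
    hits
}

/// Times one run of `random_lookups`.
fn time_it(haystack: &[u32], lookups: usize, seed: u64) -> (usize, Duration) {
    let start = Instant::now();
    let hits = random_lookups(haystack, lookups, seed);
    (hits, start.elapsed())
}
```

Comparing this against a loop that repeats a single key would show how much of a measured speedup comes from the branch predictor learning a fixed pattern.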
d2c7ac8 to def517a
I agree that you need a more detailed analysis and benchmark results. I have found benchmarks in |
d27f18f to 08464b5
08464b5 to b005d21
I have found a faster algorithm for the binary search. I have run benchmarks, and it is faster than the previous one. I have also looked at the compiled assembly with and without an early return, and it looks good either way; the benchmarks show that it is faster with the early return. It seems strange to me that "the CPU can reliably predict the loop count" based on the arithmetic used before (subtraction), but I don't know much about branch prediction, so feel free to improve it if possible.
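For contrast, an early-return variant might look like the sketch below. It is a hypothetical illustration of the idea described above, not the PR's actual diff, and the name `binary_search_by_early_exit` is an assumption. The loop exits as soon as a probe compares `Equal`, so its trip count depends on the data, and each iteration branches three ways on the comparison result.

```rust
use std::cmp::Ordering;

/// Hypothetical early-exit binary search over the half-open range lo..hi.
fn binary_search_by_early_exit<T, F>(slice: &[T], mut f: F) -> Result<usize, usize>
where
    F: FnMut(&T) -> Ordering,
{
    let (mut lo, mut hi) = (0usize, slice.len());
    while lo < hi {
        // Written this way to avoid overflow in lo + hi.
        let mid = lo + (hi - lo) / 2;
        match f(&slice[mid]) {
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
            // Early return: the trip count now depends on where the key is.
            Ordering::Equal => return Ok(mid),
        }
    }
    Err(lo)
}
```

Whether the early exit wins likely depends on the key distribution: if a benchmark repeatedly searches for the same key, the predictor can learn the sequence of `Less`/`Greater` outcomes, which favors this shape; with unpredictable keys, the per-iteration branch may mispredict often. Measuring both shapes under random lookups would help separate these effects.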
Also, what is the stance on optimizations using SIMD? Can I write platform-specific SIMD intrinsics, or what should I use if I would like to contribute SIMD optimizations? Are there already any SIMD optimizations in core/std that I could look at as an example?
P.S. It's my first contribution here.