-
Notifications
You must be signed in to change notification settings - Fork 930
feat: take kernel for RunArray #3622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The current iteration of Take kernel benchmarks
|
I added a different approach to find physical indices for give logical indices. This approach sorts the input logical indices and then loops through run_ends array instead of doing a binary search. I ran benchmarks comparing the two approaches. When the length of physical array and take indices are higher, the loop based approach is the clear winner with over 30% improvement compared to binary search approach. However for inputs of smaller size, the binary search approach seems to have better performance. I have furnished the results below. In the below benchmark results, the feature
Benchmark result
|
This is in continuation to previous comment #3622 (comment) I looked into the current Based on this result, I am removing the binary search based approach and using the loop based approach for Benchmark result
|
/// Returns index to the physical array for the given index to the logical array. | ||
/// Performs a binary search on the run_ends array for the input index. | ||
#[inline] | ||
pub fn get_physical_index(&self, logical_index: usize) -> Option<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is moved from TypedRunArray
to RunArray
as the function has nothing to do with values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, thank you for this, I also confirmed it does not appear to meaningfully impact compile times which is 👌
The CI failure seems unrelated. |
Benchmark runs are scheduled for baseline = 4835659 and contender = 9131c30. 9131c30 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Part of #3520
Rationale for this change
See issue description.
What changes are included in this PR?
RunArrayIter
.Are there any user-facing changes?