std.ascii: make toLower toUpper branchless #21369
Looks like the wrong link? |
Updated. |
Looks great! What do you think about:
fn toLower(c: u8) u8 {
    return c | (if (std.ascii.isUpper(c)) 32 else c);
}
It's 2 bytes smaller and has a bit smaller block rtp. |
What is the aspect that is compared? |
I take back what I said :). The tool I used to determine the size of the instructions seems to be incorrect for the |
Sweet implementation. I can't think of a better one that generates such dead simple codegen.
Block RTP,
Previous:
         | Skylake | Tigerlake
uiCA     | 2.33    | 2.00
llvm-mca | 2.10    | 2.10
This one:
         | Skylake | Tigerlake
uiCA     | 2.41    | 2.00
llvm-mca | 1.88    | 1.88
uiCA thinks it's a tad slower on Skylake, but it isn't from any practical point of view. The cmov
takes a bit of space and latency in the previous implementation and it's nice to get rid of.
Do note that Debug generates slightly worse code due to the overflow check with the multiply. |
Sounds reasonable. Any other optimization mode figures out the width of the integer and elides this. |
(Not a Zig contributor, just my 2 cents.) |
I believe you misunderstand something.
There 100% is. We, as the user, can guarantee that this safety check will never be triggered. The largest value this will ever be is 32, which fits into a |
I don't think I'm misunderstanding something, I think this just isn't how my brain works. If you give me a programming language with a debug mode that has runtime checks for integer overflows I expect there to be runtime checks for integer overflows in debug mode. But I'm not really a systems programmer, and I'm not really cut out for it I guess, because these functions were already branchless. This PR hurts readability, maintainability, destroys the semantic meaning of the code, doesn't remove any branches, and you can't even say it improves performance because you haven't done a benchmark. The only metric that seems to have been considered here is "number of instructions generated in godbolt." So yes, I don't understand. Is this what systems programming is? (My tone may have been out of hand here, I apologize. I mean no disrespect to any person.) |
To clarify,
Integer overflow cannot happen here. It's guaranteed. There's no point in using
We're removing the large and relatively expensive
Doing machine code analysis is the only way to get an understanding of the performance of such a small function. CPU variance would just be too large, and the use cases too vague, to create a real-world benchmark for this. Hence we look at the exact instructions used, how much space they take up, how much padding was required in the function, what the instruction dependency bottlenecks are, etc.
It certainly isn't. The two metrics we care about here the most are how large the function is in bytes and its block rtp. |
If it helps resolve conflict, you can maybe try this instead?
fn toLower(c: u8) u8 {
    return c | @as(u8, @intFromBool(isUpper(c))) << 5;
}
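For anyone following along, here is a minimal sketch comparing the shift form above with a multiply form and a plain branching version. The names toLowerReference, toLowerShift and toLowerMul are made up for this illustration, and the multiply form is my reading of what the discussion refers to, not a copy of the PR's code. It can be run with zig test.

```zig
const std = @import("std");

// Branching reference, used only for comparison in this sketch (hypothetical name).
fn toLowerReference(c: u8) u8 {
    return if (std.ascii.isUpper(c)) c + 32 else c;
}

// Shift form from the comment above: bit 5 (0x20) is OR-ed in only for 'A'...'Z'.
// `<<` binds tighter than `|` in Zig, so no extra parentheses are needed.
fn toLowerShift(c: u8) u8 {
    return c | @as(u8, @intFromBool(std.ascii.isUpper(c))) << 5;
}

// Multiply form (assumed shape): the factor is 0 or 1, so the product is 0 or 32.
fn toLowerMul(c: u8) u8 {
    return c | @as(u8, @intFromBool(std.ascii.isUpper(c))) * 32;
}

test "all three forms agree on every byte" {
    var i: usize = 0;
    while (i < 256) : (i += 1) {
        const c: u8 = @intCast(i);
        try std.testing.expectEqual(toLowerReference(c), toLowerShift(c));
        try std.testing.expectEqual(toLowerReference(c), toLowerMul(c));
    }
}
```

The exhaustive loop is cheap because the input domain is only 256 values, which makes it a reasonable way to check that a rewrite like this keeps the same behavior.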
I think it's a good idea to use the left shift operator here, too. It's
To address concerns about readability, I'd propose making the two functions a little more symmetric, and have them both use the XOR operator, rather than having one use an OR while the other uses an XOR. That way it's clearer both
*MIPS should use a |
If it is guaranteed integer overflow cannot occur, then * is the correct operator, not *%. The latter is only appropriate if wrapping integer arithmetic behavior is required.
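A small sketch of the distinction, assuming nothing beyond the standard library; the values 200 and 2 are arbitrary and chosen only to force a wraparound. It can be run with zig test.

```zig
const std = @import("std");

test "* versus *% on u8" {
    // The flag is 0 or 1, so flag * 32 is at most 32 and always fits in a u8.
    // The checked operator is therefore safe here in every build mode.
    const flag: u8 = @intFromBool(std.ascii.isUpper('A'));
    try std.testing.expectEqual(@as(u8, 32), flag * 32);

    // *% is for cases where wraparound is the intended result:
    // 200 * 2 = 400, and 400 mod 256 = 144.
    var x: u8 = 200;
    x *%= 2;
    try std.testing.expectEqual(@as(u8, 144), x);
}
```

Using * when overflow is impossible keeps the intent visible: if the assumption is ever broken, safe build modes report it instead of silently wrapping.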
Fair enough. Not trying to die on a hill here :). I like the shifting version, I think it makes the most sense. |
    return c;
}
const mask = @as(u8, @intFromBool(isLower(c))) << 5;
return c ^ mask;
Shouldn't this be &, rather than ^ (xor)? As is it would toggle the case, which isn't what the function is supposed to do (both by convention and according to doc comment)
xor looks correct -- you're right that it toggles the case, but only when isLower(c) is true
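To make that concrete, here is a minimal sketch built around the two lines under review. toUpperXor is a hypothetical name, and the lines of the diff that precede the snippet are not reproduced. It can be run with zig test.

```zig
const std = @import("std");

// Hypothetical wrapper around the reviewed lines; not the PR's full function.
fn toUpperXor(c: u8) u8 {
    // mask is 0x20 for 'a'...'z' and 0 for every other byte.
    const mask = @as(u8, @intFromBool(std.ascii.isLower(c))) << 5;
    return c ^ mask;
}

test "xor flips bit 5 only for lowercase letters" {
    try std.testing.expectEqual(@as(u8, 'A'), toUpperXor('a'));
    // Already uppercase: the mask is 0, so nothing is toggled.
    try std.testing.expectEqual(@as(u8, 'A'), toUpperXor('A'));
    // '{' has bit 5 set but is not a letter, so it is left untouched.
    try std.testing.expectEqual(@as(u8, '{'), toUpperXor('{'));
}
```

Because the mask is zero unless isLower(c) holds, the xor can only ever clear bit 5 here; it never sets it, so the function does not behave like a case toggle.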
Co-authored-by: WX\shixi <[email protected]>
Godbolt link
Remove some extended instructions.