`std.ascii`: make `toLower` `toUpper` branchless #21369

CrazyboyQCD · 2024-09-10T01:56:33Z

Godbolt link
Remove some extended instructions.

alexrp · 2024-09-10T02:00:55Z

> Godbolt link

Looks like the wrong link?

CrazyboyQCD · 2024-09-10T02:05:18Z

> > Godbolt link
>
> Looks like the wrong link?

Updated.

Rexicon226 · 2024-09-10T03:17:28Z

Looks great! What do you think about:

fn toLower(c: u8) u8 {
    return c | (if (std.ascii.isUpper(c)) 32 else c);
}

It's 2 bytes smaller and has a slightly lower block rtp (reciprocal throughput).

CrazyboyQCD · 2024-09-10T03:39:13Z

> It's 2 bytes smaller

What is the aspect that is compared?
If it's instruction size, the current implementation is 2 bytes smaller and yours is 1 byte smaller (godbolt link).

Rexicon226 · 2024-09-10T03:51:36Z

> What is the aspect that is compared?

I take back what I said :). The tool I used to determine the size of the instructions seems to be incorrect for the lea. Well, it's just an idea, your implementation is great as is.

Rexicon226

Sweet implementation. I can't think of a better one that generates such dead simple codegen.

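Block RTP:

Previous:

| Tool     | Skylake | Tigerlake |
|----------|---------|-----------|
| uiCA     | 2.33    | 2.00      |
| llvm-mca | 2.10    | 2.10      |

This one:

| Tool     | Skylake | Tigerlake |
|----------|---------|-----------|
| uiCA     | 2.41    | 2.00      |
| llvm-mca | 1.88    | 1.88      |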

uiCA thinks it's a tad slower on Skylake, but it isn't from any practical point of view. The cmov takes a bit of space and latency in the previous implementation and it's nice to get rid of.

zeroZshadow · 2024-09-10T11:17:07Z

Do note that Debug generates slightly worse code due to the overflow check with the multiply.
Would it be worth using the *% operator to avoid this check?
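
To make the question concrete, here is a minimal sketch of a multiply-based `toLower` written both ways. The exact expression used in the PR isn't quoted in this thread, so the function bodies and the names `toLowerMul` and `toLowerMulWrap` are assumptions for illustration only. The flag is 0 or 1, so the product is 0 or 32 and cannot overflow a `u8`; the two operators therefore give identical results and differ only in whether an overflow check is emitted.

```zig
const std = @import("std");

// Hypothetical multiply-based variant using the checked `*` operator.
// Per the comment above, Debug builds emit an overflow check for this multiply.
fn toLowerMul(c: u8) u8 {
    const flag: u8 = @intFromBool(std.ascii.isUpper(c)); // 0 or 1
    return c | (flag * 32);
}

// Same logic with wrapping multiplication `*%`, which carries no overflow check.
fn toLowerMulWrap(c: u8) u8 {
    const flag: u8 = @intFromBool(std.ascii.isUpper(c)); // 0 or 1
    return c | (flag *% 32);
}

test "checked and wrapping multiply agree for every byte" {
    var i: usize = 0;
    while (i < 256) : (i += 1) {
        const c: u8 = @intCast(i);
        try std.testing.expectEqual(toLowerMul(c), toLowerMulWrap(c));
    }
}
```

Saved to a file, this can be run with `zig test`.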

Rexicon226 · 2024-09-10T13:50:10Z

> Do note that Debug generates slightly worse code due to the overflow check with the multiply. Would it be worth using the *% operator to avoid this check?

Sounds reasonable. Any other optimization mode figures out the width of the integer and elides this.

MatthiasPortzel · 2024-09-10T19:26:27Z

> Would it be worth using the *% operator to avoid this check?

(Not a Zig contributor, just my 2 cents.)
This is not reasonable. A feature of the language is debug safety checks. There’s no reason to work around them here. Even if there was, Zig provides a builtin called @setRuntimeSafety that allows you to disable runtime safety checks for a block.
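
For reference, a minimal sketch of how `@setRuntimeSafety` applies to the scope it appears in. The function `scaleFlag` is hypothetical and exists only to show the builtin; it is not code from this PR.

```zig
const std = @import("std");

// Hypothetical helper: turns a 0/1 flag into 0 or 32.
// Runtime safety checks are disabled for this function body only.
fn scaleFlag(flag: u1) u8 {
    @setRuntimeSafety(false);
    return @as(u8, flag) * 32;
}

test "scaleFlag yields 0 or 32" {
    try std.testing.expectEqual(@as(u8, 0), scaleFlag(0));
    try std.testing.expectEqual(@as(u8, 32), scaleFlag(1));
}
```

Note that this turns off every safety check in the scope, not just the one on the multiply.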

Rexicon226 · 2024-09-10T19:38:20Z

I believe you misunderstand something.

> There’s no reason to work around them here.

There 100% is. We, as the user, can guarantee that this safety check will never be triggered. The largest value this will ever be is 32, which fits into a u8. Wrapping multiplication is the proper way to disable this check.

MatthiasPortzel · 2024-09-11T01:51:23Z

I don't think I'm misunderstanding something, I think this just isn't how my brain works.

If you give me a programming language with a debug mode that has runtime checks for integer overflows I expect there to be runtime checks for integer overflows in debug mode.
If you use a wrapping multiplication operator I expect there to be wrapping multiplication happening.
If you want to disable a runtime safety check for performance reasons I expect you to use the builtin that is designed for disabling runtime safety checks.

But I'm not really a systems programmer, and I'm not really cut out for it I guess, because these functions were already branchless. cmovae isn't a branching instruction. What are we doing here!?

This PR hurts readability, maintainability, destroys the semantic meaning of the code, doesn't remove any branches, and you can't even say it improves performance because you haven't done a benchmark.

The only metric that seems to have been considered here is "number of instructions generated in godbolt." So yes, I don't understand. Is this what systems programming is?

(My tone may have been out of hand here, I apologize. I mean no disrespect to any person.)

Rexicon226 · 2024-09-11T02:01:41Z

To clarify,

> If you give me a programming language with a debug mode that has runtime checks for integer overflows I expect there to be runtime checks for integer overflows in debug mode.

Integer overflow cannot happen here. It's guaranteed. There's no point in using `@setRuntimeSafety(false)` here since we do want runtime safety. We're just telling it that that specific multiply will not overflow, or more accurately to not care if it overflows, since it won't happen.

> But I'm not really a systems programmer, and I'm not really cut out for it I guess, because these functions were already branchless. cmovae isn't a branching instruction. What are we doing here!?

We're removing the large and relatively expensive cmov. It's bottlenecking the block rtp of this function.

> you can't even say it improves performance because you haven't done a benchmark.

Doing machine code analysis is the only way to get an understanding of the performance of such a small function. CPU variance would just be too large, and the use cases too vague, to create a real-world benchmark for this. Hence we look at the exact instructions used, how much space they take up, how much padding was required in the function, what the instruction dependency bottlenecks are, etc.

> The only metric that seems to have been considered here is "number of instructions generated in godbolt."

It certainly isn't. The two metrics we care about here the most are how large the function is in bytes and its block rtp.

T-727 · 2024-09-12T08:05:56Z

If it helps resolve conflict, you can maybe try this instead?

fn toLower(c: u8) u8 {
    return c | @as(u8, @intFromBool(isUpper(c))) << 5;
}

SilasLock · 2024-09-12T19:40:53Z

I think it's a good idea to use the left shift operator here, too. It's

1. a little more readable, since it more directly describes what the code is trying to do (i.e. create a bitmask using the output of isLower or isUpper and use it to swap the 6th bit of c), and
2. exactly what the optimizer ends up outputting anyway, which is to use a shl instruction on x86. This can be seen in the Godbolt link at the top of this PR conversation. And for reference, I'm pretty sure this is the optimal thing to do in MIPS, as well*.

To address concerns about readability, I'd propose making the two functions a little more symmetric, and have them both use the XOR operator, rather than having one use an OR while the other uses an XOR. That way it's clearer both toUpper and toLower are doing the same sort of thing (swapping the value of a single bit). I'd also split the mask definition onto a separate line.

pub fn toUpper(c: u8) u8 {
    const mask = @as(u8, @intFromBool(std.ascii.isLower(c))) << 5;
    return c ^ mask;
}

pub fn toLower(c: u8) u8 {
    const mask = @as(u8, @intFromBool(std.ascii.isUpper(c))) << 5;
    return c ^ mask;
}

*MIPS should use a sltu instruction to compute the 0 or 1 bit outcome of isUpper/isLower, so there's even less overhead than with the setb instruction used on x86.
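
Since the input is a single byte, the shift-and-XOR versions above can be checked exhaustively. The sketch below compares them against plain branching references over all 256 values. The reference functions `toUpperRef` and `toLowerRef` are hypothetical, written only for this comparison, and `pub` is dropped so the file stands alone.

```zig
const std = @import("std");

// Shift-and-XOR versions as proposed above.
fn toUpper(c: u8) u8 {
    const mask = @as(u8, @intFromBool(std.ascii.isLower(c))) << 5;
    return c ^ mask;
}

fn toLower(c: u8) u8 {
    const mask = @as(u8, @intFromBool(std.ascii.isUpper(c))) << 5;
    return c ^ mask;
}

// Hypothetical branching references, used only for comparison.
fn toUpperRef(c: u8) u8 {
    return if (c >= 'a' and c <= 'z') c - ('a' - 'A') else c;
}

fn toLowerRef(c: u8) u8 {
    return if (c >= 'A' and c <= 'Z') c + ('a' - 'A') else c;
}

test "branchless versions match the branching references for all bytes" {
    var i: usize = 0;
    while (i < 256) : (i += 1) {
        const c: u8 = @intCast(i);
        try std.testing.expectEqual(toUpperRef(c), toUpper(c));
        try std.testing.expectEqual(toLowerRef(c), toLower(c));
    }
}
```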

andrewrk

If it is guaranteed integer overflow cannot occur, then * is the correct operator, not *%. The latter is only appropriate if wrapping integer arithmetic behavior is required.

Rexicon226 · 2024-09-13T03:07:42Z

> If it is guaranteed integer overflow cannot occur, then * is the correct operator, not *%. The latter is only appropriate if wrapping integer arithmetic behavior is required.

Fair enough. Not trying to die on a hill here :). I like the shifting version, I think it makes the most sense.

daurnimator · 2024-09-16T10:45:26Z

lib/std/ascii.zig

-        return c;
-    }
+    const mask = @as(u8, @intFromBool(isLower(c))) << 5;
+    return c ^ mask;


Shouldn't this be &, rather than ^ (xor)? As is, it would toggle the case, which isn't what the function is supposed to do (both by convention and according to the doc comment).

pancelor · 2024-09-18

xor looks correct -- you're right that it toggles the case, but only when isLower(c) is true.
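
The bit arithmetic behind this exchange can be spelled out as a small worked example. This is only an illustrative sketch using the mask expression from the diff above: for a lowercase letter the mask is 0x20 and XOR clears bit 5, while for any other byte the mask is 0 and XOR leaves it unchanged. Using `&` with the same mask would not preserve the character.

```zig
const std = @import("std");

test "xor with a conditional mask only flips bit 5 of lowercase letters" {
    const bit5: u8 = 1 << 5; // 0x20, the bit that differs between 'a' and 'A'

    // Lowercase input: isLower is true, so the mask is 0x20 and the bit is cleared.
    const mask_lower = @as(u8, @intFromBool(std.ascii.isLower('a'))) << 5;
    try std.testing.expectEqual(bit5, mask_lower);
    try std.testing.expectEqual(@as(u8, 'A'), 'a' ^ mask_lower);

    // Uppercase input: isLower is false, so the mask is 0 and xor is a no-op.
    const mask_upper = @as(u8, @intFromBool(std.ascii.isLower('A'))) << 5;
    try std.testing.expectEqual(@as(u8, 0), mask_upper);
    try std.testing.expectEqual(@as(u8, 'A'), 'A' ^ mask_upper);

    // With `&` instead of `^`, the same masks would discard the character.
    try std.testing.expectEqual(@as(u8, 0x20), 'a' & mask_lower);
    try std.testing.expectEqual(@as(u8, 0), 'A' & mask_upper);
}
```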

Co-authored-by: WX\shixi <[email protected]>

chore: make toLower toUpper branchless

30c443a

Rexicon226 approved these changes Sep 10, 2024

chore: use *% to avoid overflow check to optimize debug codegen

b5d3b56

andrewrk requested changes Sep 13, 2024

chore: use shift operation for better readability

f118b5a

CrazyboyQCD requested a review from andrewrk September 13, 2024 06:15

andrewrk merged commit 8ddce90 into ziglang:master Sep 14, 2024
10 checks passed

daurnimator reviewed Sep 16, 2024

DivergentClouds pushed a commit to DivergentClouds/zig that referenced this pull request Sep 24, 2024

std.ascii: make toLower toUpper branchless (ziglang#21369)

2ff4d03

Co-authored-by: WX\shixi <[email protected]>

richerfu pushed a commit to richerfu/zig that referenced this pull request Oct 28, 2024

std.ascii: make toLower toUpper branchless (ziglang#21369)

01048af

Co-authored-by: WX\shixi <[email protected]>
