-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
std.ascii: remove LUT and deprecations #13370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
327403a
to
e4e47d1
Compare
I think I found something: /// Returns whether the character is printable and has some graphical representation.
/// This also returns `true` for the space character.
/// This is the same as `!isControl(c)`.
pub fn isPrint(c: u8) bool {
This was not true in the lookup-table implementation. |
I'm not talking about the spaces. pub fn isPrint(c: u8) bool {
return inTable(c, tIndex.Graph) or c == ' ';
} // ^ Spaces returned true in the old code The problem is non-ascii characters( And I think it makes sense to consider non-ascii characters as not printable. For example in UTF-8 all non-ascii characters consist of multiple bytes, so a single byte of that sequence cannot be printed. Your changes prevent both |
0989cff
to
a0b58b0
Compare
This makes it so that we no longer use a LUT (Look-Up Table): * The code is much simpler and easier to understand now. * Using a LUT means we rely on a warm cache. Relying on the cache like this results in inconsistent performance and in many cases codegen will be worse. Also as @topolarity once pointed out, in some cases while it seems like the code may branch, it actually doesn't: ziglang#11629 (comment) * Other languages' standard libraries don't do this either. JFF I wanted to see what other languages codegen compared to us now: https://rust.godbolt.org/z/Te4ax9Edf, https://zig.godbolt.org/z/nTbYedWKv So we are pretty much on par or better than other languages now.
On x86 interestingly I can see a reduction in codesize by 1 instruction with this. While not necessarily faster, it might still reduce codesize a bit and this ordering is also more logical because it follows ASCII table order. Rust's std uses this ordering too. See https://zig.godbolt.org/z/PqodY8YqY for the difference.
This makes the function consider space to be printable as well (because it is) and simplifies that function.
I wrote this after #12448 got merged but decided to wait for 0.10.0 to be released before pulling it. Now I've decided to pull it a bit earlier since 0.10.0 is just about to be released, anyway (and in case someone else would accidentally start working on the same thing or something).
This should be merged after 0.10.0 and removes all deprecations, updates the codebase, does some other cleanups etc., and also removes the LUT (Look-Up Table): this was already discussed quite a bit in #11629 (see #11629 (comment) etc.) and I've also discussed this with @topolarity in more detail after that, but basically the reason is that the LUT made us depend on cache performance. It also reduces binary size. See also the commit's description for more detail.
Removing both the deprecations and the LUT at the same time is a very convenient timing, by the way. It shouldn't make review that much harder, I hope.