Skip to content

std.ascii: rename functions and other improvements #12448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Oct 15, 2022

Conversation

wooster0
Copy link
Contributor

@wooster0 wooster0 commented Aug 14, 2022

Supersedes #11629

This should be a lot easier to review than #11629 because it does a lot less.
It mainly renames some functions and removes 3. It also adds a lot more tests.
See the commit descriptions for the reasons of all the deprecations and renamings.
They're all just deprecated so it will still work in 0.10 (if this gets into 0.10). After that they'll be removed.

You might also be interested in Rust's naming decisions: https://doc.rust-lang.org/std/primitive.char.html#method.is_ascii

It lowercases all constants and documents them.
0x1F (`control_code.us`) itself is also a control code.
I think `isBlank` and `isWhitespace` are quite confusable.
What `isBlank` does is so simple that you can just do the `c == ' ' or c == '\t'`
check yourself but in a lot of cases you don't even want that.
`std.ascii` can't really know what you think "blank" means.
That's why I think it's better to remove it.
And again, it seems ambiguous considering that we have `isWhitespace`.

Next, it also deprecates `isGraph`.
It's the same as `isPrint(c) and c != ' '`, which I find confusing.
When something is printable, you can say it also has a *graph*ical representation.
Removing `isGraph` solves this possible confusion.
`isAlNum` and `isAlpha`:
1. I think these names are a bit cryptic.
2. `isAlpha` is a bit ambiguous: is it alpha*numeric* or alpha*betic*?
   This is why I renamed `isAlpha` to `isAlphabetic`.
3. For consistency and because `isAlNum` looks weird, I renamed it to `isAlphanumeric`.

`isCntrl`:
1. It's cryptic and hard to find when you look for it.
2. We don't save a lot of space writing it this way.
3. It's closer to the name of the `control_code` struct.

`isSpace`:
1. The name is ambiguous and misleading.

`spaces`:
1. Ditto

`isXDigit`:
1. The name is extremely cryptic.
2. The function is very hard to find by its name.
And improve some existing docs.
@topolarity
Copy link
Contributor

Thanks for the improved doc comments and tidying things up here 👍

Can you remind me which of these deprecations were decided previously, and which are new deprecations you're recommending?

@wooster0
Copy link
Contributor Author

Thanks for the reviews! Missed some stuff there.

Can you remind me which of these deprecations were decided previously, and which are new deprecations you're recommending?

Hmm, none of my suggestions have really been "officially" discussed like that.
You can take it all as new suggestions except for the removal of isPunct which Andrew wanted too.
I tried to go with names that are not cryptic, readable, and yet short enough.
For inspiration I also looked at Rust's naming choices (if you scroll down on https://doc.rust-lang.org/std/primitive.char.html#method.is_ascii).

Obviously this is not set in stone and we're still far from 1.0 but this is a start and should make it less painful to do these deprecations later.

Copy link
Contributor

@topolarity topolarity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I think the new names are much clearer

The only bikeshedding I might encourage would be whether isGraph should be kept instead of isPrint. Since isPrint(c) == !isControl(c), it seems like isGraph would be the more useful one to keep around

I could be convinced either way, though

@wooster0
Copy link
Contributor Author

wooster0 commented Aug 17, 2022

@topolarity I think isGraph's name is confusing because ' ' too has a graphical representation. I think having the two isPrint and isControl makes code easier to read and to me it feels a bit more uniform this way.

In general I feel like just knowing whether a character is printable or not is all that most users need. When do you actually need to know whether it's printable and not a space?

If we look at https://github.com/search?l=zig&p=2&q=isgraph&type=Code I can actually find many misuses of it where people indeed just wanted to know if it's printable or not:

                if (std.ascii.isGraph(char) or char == ' ') {
                            if (!std.ascii.isGraph(c.?) and !std.ascii.isSpace(c.?)) {

And as for the other results it's mostly just them having a copy of the std in their project somehow and the rest are probably more misuses, and finally maybe there are a few legit uses.

I could be wrong though; maybe we can re-add it in the future.

@wooster0
Copy link
Contributor Author

Is there anything blocking this from getting merged? Would be great to have this in before 0.10. Updating the rest of the codebase to use the new functions can come later.

@andrewrk andrewrk added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. standard library This issue involves writing Zig code for the standard library. labels Oct 15, 2022
@andrewrk andrewrk merged commit c0d7f64 into ziglang:master Oct 15, 2022
@andrewrk
Copy link
Member

Thanks @r00ster91!

Is there anything blocking this from getting merged?

No, I just needed to find some time to look at it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants