std.ascii isPunct returns true for several symbols #8419

Closed
jecolon opened this issue Apr 2, 2021 · 11 comments
Labels
proposal: This issue suggests modifications. If it also has the "accepted" label then it is planned.
standard library: This issue involves writing Zig code for the standard library.
Milestone

Comments

@jecolon

jecolon commented Apr 2, 2021

According to https://en.wikipedia.org/wiki/ASCII#Character_set, the following characters are symbols, not punctuation:

'`', '~', '$', '^', '=', '+', '|', '<', '>'

It so happens that this is the case in Unicode too. In std.ascii, there's no isSymbol so these characters are covered by isPunct which differs from both ASCII and Unicode. I have a working version at https://github.com/jecolon/ziglyph/blob/main/src/ascii.zig but don't know if I should make a PR with it or let it be handled via this issue. That version adds isSymbol, with corresponding symbol table and tIndex.Symbol enum value.
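A minimal sketch of the symbol predicate described above, assuming it only needs to recognize the nine characters listed. The name `isSymbol` comes from the comment; the `switch`-based body is an illustration and is not the table-driven version in ziglyph.

```zig
const std = @import("std");

/// Hypothetical helper: returns true only for the nine ASCII characters
/// that the comment above classifies as symbols rather than punctuation.
pub fn isSymbol(c: u8) bool {
    return switch (c) {
        '`', '~', '$', '^', '=', '+', '|', '<', '>' => true,
        else => false,
    };
}

test "isSymbol matches exactly the nine listed characters" {
    // Count matches over the whole 7-bit ASCII range.
    var count: usize = 0;
    var c: u8 = 0;
    while (c < 128) : (c += 1) {
        if (isSymbol(c)) count += 1;
    }
    try std.testing.expectEqual(@as(usize, 9), count);
    try std.testing.expect(isSymbol('$'));
    try std.testing.expect(!isSymbol('!')); // punctuation, not a symbol
}
```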

@g-w1
Contributor

g-w1 commented Apr 2, 2021

Feel free to make a pr, or I can.

@jecolon
Author

jecolon commented Apr 2, 2021

Ok, I'll do the PR!

@daurnimator
Contributor

ispunct is defined by C and POSIX as:

> checks for any printable character which is not a space or an alphanumeric character.

Zig follows this convention, which many formats and protocols inherit.
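For illustration, the quoted definition can be written as a direct composition of its three conditions. This is a sketch under the assumption of 7-bit ASCII and the default "C" locale; `isPunctC` and `isAlnum` are hypothetical names, not functions from `std.ascii`.

```zig
const std = @import("std");

/// Hypothetical helper: ASCII letters and digits.
fn isAlnum(c: u8) bool {
    return switch (c) {
        '0'...'9', 'A'...'Z', 'a'...'z' => true,
        else => false,
    };
}

/// Hypothetical helper mirroring the quoted wording:
/// printable, not a space, and not alphanumeric.
pub fn isPunctC(c: u8) bool {
    const printable = c >= 0x20 and c <= 0x7E;
    return printable and c != ' ' and !isAlnum(c);
}

test "isPunctC follows the quoted definition" {
    try std.testing.expect(isPunctC('!'));
    try std.testing.expect(isPunctC('$')); // a "symbol" in the original report
    try std.testing.expect(!isPunctC(' '));
    try std.testing.expect(!isPunctC('a'));
    try std.testing.expect(!isPunctC('7'));
    try std.testing.expect(!isPunctC('\n')); // control character, not printable
}
```

Under this definition the nine characters from the original report all count as punctuation, which is the mismatch the issue is about.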

@jecolon
Author

jecolon commented Apr 3, 2021

Interesting. So isPunct is C and POSIX compliant, not ASCII compliant, but it's in std.ascii. Maybe there should be an asciiIsPunct or isPunctAscii that's aligned with ASCII then?

@Mouvedia

Mouvedia commented Apr 4, 2021

If #5019 was accepted you could have a std.posix.isPunct and a std.ascii.isPunct.

@jecolon
Author

jecolon commented Apr 4, 2021

> If #5019 was accepted you could have a std.posix.isPunct and a std.ascii.isPunct.

That would definitely be a good thing; compliant, not misleading, less confusing.

@andrewrk andrewrk added the proposal and standard library labels May 19, 2021
@andrewrk
Member

Counter-proposal: delete ascii.isPunct and also don't define it in std.os or std.posix

@andrewrk andrewrk added this to the 0.10.0 milestone May 19, 2021
@jecolon
Author

jecolon commented May 20, 2021

> Counter-proposal: delete ascii.isPunct and also don't define it in std.os or std.posix

Yeah, that could be the best option to avoid the confusion of having multiple isPunct functions in different places.

@wooster0
Contributor

As of #12448 this function is deprecated so I think this issue can be closed:

zig/lib/std/ascii.zig

Lines 355 to 358 in 99c3578

/// DEPRECATED: create your own function based on your needs and what you want to do.
pub fn isPunct(c: u8) bool {
    return inTable(c, tIndex.Punct);
}

It will be removed eventually along with the other stuff.

@jecolon
Author

jecolon commented Oct 16, 2022

As @r00ster91 points out, this is now a non-issue.

@jecolon jecolon closed this as completed Oct 16, 2022
@andrewrk andrewrk modified the milestones: 0.12.0, 0.10.0 Oct 17, 2022
@mnemnion

It seems this issue was decided without reference to the standard; Wikipedia is not authoritative.

According to the 2007 revision of the 1986 standard, there is an unambiguous term for characters which are not control, space, or alphanumeric: special characters. I would eat a hat live on camera if the 2017 revision changed that, but don't care to spend $60 to find out for sure.

The C and POSIX function ispunct identifies every character that the standard considers "special", and none which it does not. Therefore, a std.ascii.isPunct function identifying the same characters would also be referring to a cogent and unambiguous group of symbols within the standard. If it were desirable to avoid conflict with Unicode over the punctuation vs. symbol distinction (I can see a case for that), then std.ascii.isSpecial would be apropos, using the precise terminology referenced in the ASCII standard.

Eliminating a source of ambiguity between Unicode and ASCII was and is a reasonable choice, but it's left the predicates in std.ascii with a hole: there is a collection of characters which cannot be affirmatively classified, something users coming from C would certainly expect. I propose std.ascii.isSpecial, recognizing the exact set of characters as ispunct in the C standard, those referred to as special characters in the ASCII standard, be added to fill in that gap.
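A rough sketch of what the proposed predicate could look like, assuming "special characters" means the 32 printable, non-space, non-alphanumeric ASCII characters that C's ispunct accepts. `isSpecial` is the name proposed above; it is not an existing `std.ascii` function, and the range-based body is only one possible implementation.

```zig
const std = @import("std");

/// Hypothetical implementation of the proposed std.ascii.isSpecial:
/// the four runs of ASCII characters that sit between space, digits,
/// uppercase letters, lowercase letters, and DEL.
pub fn isSpecial(c: u8) bool {
    return switch (c) {
        '!'...'/', // 0x21..0x2F
        ':'...'@', // 0x3A..0x40
        '['...'`', // 0x5B..0x60
        '{'...'~', // 0x7B..0x7E
        => true,
        else => false,
    };
}

test "isSpecial accepts exactly 32 ASCII characters" {
    var count: usize = 0;
    var c: u8 = 0;
    while (c < 128) : (c += 1) {
        if (isSpecial(c)) count += 1;
    }
    try std.testing.expectEqual(@as(usize, 32), count);

    // Spot-check the edges of each range.
    try std.testing.expect(isSpecial('!') and isSpecial('/'));
    try std.testing.expect(isSpecial('@') and isSpecial('`'));
    try std.testing.expect(!isSpecial(' '));
    try std.testing.expect(!isSpecial('0'));
    try std.testing.expect(!isSpecial('A'));
    try std.testing.expect(!isSpecial(0x7F)); // DEL is a control character
}
```

Writing the set as ranges keeps the predicate independent of any lookup table, so it would not be affected by the deprecation of the table-based functions mentioned earlier in the thread.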

7 participants