Normalize all identifiers to NFC #2489
a6273d2
to
6c8bf02
Compare
// Input source wrapper thing.
class InputSource
This class has been moved to a new file, rust-input-source.h, to avoid a recursive #include.
static tl::optional<Utf8String>
make_utf8_string (const std::string &maybe_utf8)
This is a factory function for Utf8String. It returns an optional value, which is non-null only if the given std::string is properly encoded as UTF-8.
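A minimal sketch of the validation such a factory relies on, not the code from this PR. It assumes `std::optional` (C++17) in place of `tl::optional`, and `decode_utf8` is a hypothetical helper name. It returns the decoded code points, or `std::nullopt` for truncated sequences, stray continuation bytes, overlong encodings, surrogates, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper, not part of gccrs: decode a byte string into code
// points, or return std::nullopt if the input is not well-formed UTF-8.
std::optional<std::vector<uint32_t>>
decode_utf8 (const std::string &input)
{
  std::vector<uint32_t> out;
  std::size_t i = 0;
  while (i < input.size ())
    {
      unsigned char lead = static_cast<unsigned char> (input[i]);
      uint32_t cp;
      std::size_t len;
      if (lead < 0x80) { cp = lead; len = 1; }
      else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
      else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
      else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
      else return std::nullopt; // stray continuation byte or invalid lead

      if (input.size () - i < len)
        return std::nullopt; // truncated sequence

      for (std::size_t k = 1; k < len; k++)
        {
          unsigned char c = static_cast<unsigned char> (input[i + k]);
          if ((c & 0xC0) != 0x80)
            return std::nullopt; // expected a continuation byte
          cp = (cp << 6) | (c & 0x3F);
        }

      // Reject overlong encodings, surrogates, and out-of-range values.
      static const uint32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
      if (cp < min_for_len[len] || cp > 0x10FFFF
          || (cp >= 0xD800 && cp <= 0xDFFF))
        return std::nullopt;

      out.push_back (cp);
      i += len;
    }
  return out;
}
```

A caller can then test the result with `has_value ()` before wrapping the code points in a string type, which mirrors the "non-null only if valid" contract described above.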
fn main() {
    // U+304C
    let が = ();
    // U+304B + U+3099
    let _ = が;
}
This code compiles even though the two identifiers have different byte strings, which suggests that identifier normalization works.
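To make the difference concrete, here is a toy sketch rather than the PR's implementation. The two spellings are distinct UTF-8 byte strings (E3 81 8C versus E3 81 8B E3 82 99), and a composition step handling only this one pair maps both to U+304C. The function name `compose_ka_dakuten` is made up for illustration; real NFC needs the full Unicode decomposition and composition tables.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// UTF-8 bytes of the two spellings; they differ before normalization.
const std::string precomposed = "\xE3\x81\x8C";            // U+304C
const std::string decomposed = "\xE3\x81\x8B\xE3\x82\x99"; // U+304B U+3099

// Toy composition step covering only the pair from the test above:
// U+304B followed by U+3099 (combining voiced sound mark) becomes U+304C.
// Every other code point is passed through unchanged.
std::vector<uint32_t>
compose_ka_dakuten (const std::vector<uint32_t> &in)
{
  std::vector<uint32_t> out;
  for (uint32_t cp : in)
    {
      if (cp == 0x3099 && !out.empty () && out.back () == 0x304B)
        out.back () = 0x304C; // merge base + combining mark
      else
        out.push_back (cp);
    }
  return out;
}
```

After this step both identifiers are the single code point U+304C, so a lexer that normalizes tokens would treat them as the same name.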
6c8bf02
to
35b67c3
Compare
gcc/rust/util/rust-unicode.h
Outdated
  return buf;
};

// Returns UTF codepoints when string is valid as UTF-8, returns nullopt
Shouldn't this comment be updated? It looks like the function signature has changed and no longer matches its description.
gcc/rust/util/rust-unicode.cc
Outdated
@@ -309,9 +318,10 @@ is_numeric (uint32_t codepoint)
 namespace selftest {

 void
-assert_normalize (std::vector<uint32_t> origin, std::vector<uint32_t> expected)
+assert_normalize (std::vector<Rust::Codepoint> origin,
+		  std::vector<Rust::Codepoint> expected)
The expected parameter could be const. Also, we should probably take both vectors by reference here.
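A sketch of the signature this comment suggests, under stated assumptions: `Codepoint` is a plain alias standing in for `Rust::Codepoint`, and `normalize_stub` is a placeholder (the identity function) for the normalization under test, so that the example stays self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for Rust::Codepoint, purely for illustration.
using Codepoint = uint32_t;

// Placeholder for the normalization under test; it is the identity here so
// that the sketch compiles on its own.
std::vector<Codepoint>
normalize_stub (const std::vector<Codepoint> &input)
{
  return input;
}

// Both vectors are taken by const reference: no copy is made at the call
// site, and the helper cannot modify its arguments.
void
assert_normalize (const std::vector<Codepoint> &origin,
		  const std::vector<Codepoint> &expected)
{
  assert (normalize_stub (origin) == expected);
}
```

Call sites that pass braced lists, such as `assert_normalize ({0x41}, {0x41})`, keep working because a const reference can bind to a temporary.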
gcc/rust/ChangeLog:

	* lex/rust-lex.cc (assert_source_content): Fix namespace specifier
	(test_buffer_input_source): Likewise.
	(test_file_input_source): Likewise.
	* lex/rust-lex.h: Move InputSource ...
	* lex/rust-input-source.h: ... to here. (New file)
	* lex/rust-token.cc (nfc_normalize_token_string): New function
	* lex/rust-token.h (nfc_normalize_token_string): New function
	* rust-lang.cc (run_rust_tests): Modify order of selftests.
	* rust-session-manager.cc (validate_crate_name): Modify interface of
	Utf8String.
	* util/rust-unicode.cc (lookup_cc): Modify codepoint_t typedef.
	(lookup_recomp): Likewise.
	(recursive_decomp_cano): Likewise.
	(decomp_cano): Likewise.
	(sort_cano): Likewise.
	(compose_hangul): Likewise.
	(assert_normalize): Likewise.
	(Utf8String::nfc_normalize): New function.
	* util/rust-unicode.h: Modify interface of Utf8String.

gcc/testsuite/ChangeLog:

	* rust/compile/unicode_norm1.rs: New test.

Signed-off-by: Raiki Tamura <[email protected]>
35b67c3
to
2fa4f4a
Compare
@P-E-P Thank you for your review. Fixed all.
Addresses #2287
Depends on #2467

Normalize all identifiers (tokens) to their NFC form.
Normalization must be done before any macro expansion.
See https://doc.rust-lang.org/reference/identifiers.html#normalization for details
Changelog