Commit be3d1eb

Rollup merge of #81856 - Smittyvb:utf16-warn, r=matthewjasper
Suggest character encoding is incorrect when encountering random null bytes

This adds a note whenever null bytes are unexpectedly seen at the start of a token, since those tend to come from UTF-16 encoded files without a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark). If a UTF-16 BOM is present, the file won't be valid UTF-8, but if there is no BOM it can be both valid UTF-16 and valid but garbled UTF-8. This approach was suggested in #73979 (comment). Closes #73979.
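
To illustrate the encoding claim above, here is a minimal standalone sketch (not part of this commit). The helper name `utf16le_no_bom` is made up for the example. It encodes an ASCII-only snippet as UTF-16LE without a BOM and checks that the bytes still decode as UTF-8 with interleaved null characters, while the same bytes prefixed with a UTF-16LE BOM do not.

```rust
/// Hypothetical helper: encode `text` as UTF-16LE bytes with no byte order mark.
fn utf16le_no_bom(text: &str) -> Vec<u8> {
    text.encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect()
}

fn main() {
    let bytes = utf16le_no_bom("fn main() {}");

    // Each ASCII character becomes the pair (byte, 0x00), and both bytes are
    // valid single-byte UTF-8, so the whole buffer decodes without error.
    let decoded = std::str::from_utf8(&bytes).expect("ASCII-only UTF-16LE is valid UTF-8");
    let nul_count = decoded.chars().filter(|&c| c == '\0').count();
    println!("{} bytes, {} NUL characters", bytes.len(), nul_count);

    // With a UTF-16LE BOM (FF FE) in front, the buffer is no longer valid UTF-8.
    let mut with_bom = vec![0xFF, 0xFE];
    with_bom.extend_from_slice(&bytes);
    assert!(std::str::from_utf8(&with_bom).is_err());
}
```

This only covers the ASCII-only case; UTF-16 text with non-ASCII characters may or may not happen to be valid UTF-8.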
2 parents 94736c4 + ed8c686 commit be3d1eb

8 files changed

+3 -0 lines changed

compiler/rustc_parse/src/lexer/mod.rs

Lines changed: 3 additions & 0 deletions
@@ -268,6 +268,9 @@ impl<'a> StringReader<'a> {
                 // tokens like `<<` from `rustc_lexer`, and then add fancier error recovery to it,
                 // as there will be less overall work to do this way.
                 let token = unicode_chars::check_for_substitution(self, start, c, &mut err);
+                if c == '\x00' {
+                    err.help("source files must contain UTF-8 encoded text, unexpected null bytes might occur when a different encoding is used");
+                }
                 err.emit();
                 token?
             }
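
The shape of the change can be tried outside the compiler with a small sketch. The `Diagnostic` struct and `unknown_char_error` function below are hypothetical stand-ins invented for this example, not rustc's real diagnostic types, and the main error message wording is an assumption. Only the `c == '\x00'` check and the help text are taken from the diff.

```rust
/// Hypothetical stand-in for a compiler diagnostic; not rustc's real type.
#[derive(Debug)]
struct Diagnostic {
    message: String,
    help: Vec<String>,
}

impl Diagnostic {
    fn new(message: impl Into<String>) -> Self {
        Diagnostic { message: message.into(), help: Vec::new() }
    }

    /// Attach an extra help line to the diagnostic.
    fn help(&mut self, text: &str) -> &mut Self {
        self.help.push(text.to_string());
        self
    }
}

/// Build an error for an unexpected character, adding the encoding hint
/// only when that character is a null byte (same condition as the patch).
fn unknown_char_error(c: char) -> Diagnostic {
    // The message wording here is an assumption made for the sketch.
    let mut err = Diagnostic::new(format!("unknown start of token: {}", c.escape_default()));
    if c == '\x00' {
        err.help("source files must contain UTF-8 encoded text, unexpected null bytes might occur when a different encoding is used");
    }
    err
}

fn main() {
    for c in ['\0', '§'] {
        let err = unknown_char_error(c);
        println!("error: {}", err.message);
        for line in &err.help {
            println!("  = help: {}", line);
        }
    }
}
```

Keeping the check next to the existing error construction means the hint is attached only to diagnostics that were going to be emitted anyway, so well-formed input is unaffected.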

src/test/ui/parser/issue-66473.stderr

2.54 KB
Binary file not shown.

src/test/ui/parser/issue-68629.stderr

390 Bytes
Binary file not shown.

src/test/ui/parser/issue-68730.stderr

260 Bytes
Binary file not shown.
125 Bytes
Binary file not shown.
3.45 KB
Binary file not shown.
126 Bytes
Binary file not shown.
3.42 KB
Binary file not shown.
